Nextdoor's path from pre-trained to fine-tuned embedding models for notifications, feed, and search ranking

Nextdoor needed richer content representations to capture nuanced user signals and improve personalization across products, while managing the high storage and serving costs of large fixed-dimensionality embeddings updated daily at scale.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools used” below.

Stage 1 · Content text extraction

Text is extracted from each Nextdoor post's subject and body, and from the text of its comments.
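As an illustration of this stage, the extraction can be sketched as below. This is a minimal sketch, not Nextdoor's implementation: the `Post` record, its field names (`subject`, `body`, `comments`), and the choice to join subject and body with a newline are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    # Hypothetical record shape; these field names are assumptions.
    subject: str
    body: str
    comments: List[str] = field(default_factory=list)


def extract_texts(post: Post) -> List[str]:
    """Return one text for the post (subject + body), then one per comment.

    Empty or whitespace-only pieces are dropped so they never reach
    the embedding model.
    """
    parts = [post.subject.strip(), post.body.strip()]
    post_text = "\n".join(p for p in parts if p)
    texts = [post_text] if post_text else []
    texts.extend(c.strip() for c in post.comments if c.strip())
    return texts
```

Each returned string would then be passed to the embedding model as a separate input.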

Tools used

Sentence-BERT, SBERT, pytorch, SageMaker, FeatureStore, Airflow, HNSWlib, BERTopic, CLIP

Outcome

Fine-tuned embedding models delivered significant performance lifts in OKR metrics for notifications and feed, reduced null query rates significantly, improved query expansion latencies by more than 10x, and improved user-post cosine similarity by up to 16% while reducing embedding dimensionality by more than 10x.
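For readers unfamiliar with the user-post cosine similarity metric cited above, the standard definition is sketched below. The source does not describe how Nextdoor computes it, so the function name and the zero-vector handling are assumptions for illustration only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)
```

A higher value means the user embedding and the post embedding point in more similar directions, independent of vector length.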

What failed first

Pre-trained off-the-shelf models were trained on public benchmark datasets with semantics different from the Nextdoor domain, and their high fixed dimensionality caused significant storage and serving costs. Earlier word embedding models produced higher rates of null search queries.

Results

Volume: more than 10x

Running since: early 2022

Source

https://engblog.nextdoor.com/from-pre-trained-to-fine-tuned-nextdoors-path-to-effective-embedding-applications-3a13b56d91aa

Grounding & classification

Source type: technical build writeup

36 fields verified against source quotes.

enterprise search, personalization, predictive analytics, recommendation system, knowledge base, social media post, builder submitted, metric backed, named customer, production runtime claimed, source backed, tools described, workflow described, software, accuracy improvement, employee productivity, response time reduction, technical build writeup, back office ops, data sync enrichment, extract classify route