Nextdoor's path from pre-trained to fine-tuned embedding models for notifications, feed, and search ranking
Nextdoor needed richer content representations to capture nuanced user signals and improve personalization across products, while managing the high storage and serving costs of large fixed-dimensionality embeddings updated daily at scale.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases — not tied to any one vendor's stack. Click any stage to read what happens there. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Content text extraction
Text from Nextdoor posts and comments is extracted from each post's subject and body and from comment text.
Fine-tuned embedding models delivered significant performance lifts in OKR metrics for notifications and feed, reduced null query rates significantly, improved query expansion latencies by more than 10x, and improved user-post cosine similarity by up to 16% while reducing embedding dimensionality by more than 10x.
What failed first
Pre-trained off-the-shelf models were trained on public benchmark datasets with semantics different from the Nextdoor domain, and their high fixed dimensionality caused significant storage and serving costs. Earlier word embedding models produced higher rates of null search queries.