Pinterest improves search relevance using an LLM-based teacher-student distillation pipeline
Pinterest Search relied on engagement signals for ranking, but it needed a dedicated relevance model to ensure that displayed content matched the user's query rather than simply reflecting past behaviour. The system also lacked coverage of multilingual queries and of new, seasonal concepts that were absent from its limited human-annotated data.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User search query received
Users submit queries on Pinterest Search to discover content that aligns with their information needs.
The LLM-based relevance pipeline achieved a 2.18% improvement in search feed relevance as measured by nDCG@20, and online A/B experiments showed improvements of more than 1% in search feed relevance and more than 1.5% in search fulfillment rate. The multilingual teacher also generalised to languages not seen during training.
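The offline gain above is reported as nDCG@20 (normalized discounted cumulative gain over the top 20 results). As a rough illustration of how that metric is computed for one ranked list, here is a minimal Python sketch. It assumes graded relevance labels on a hypothetical 0 to 4 scale, the common exponential gain 2^rel - 1, and a log2 position discount, and it normalizes against the ideal ordering of the same labelled results; the source does not say which gain function or label scale Pinterest used.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions.

    Uses the exponential gain (2**rel - 1) and a log2(rank + 1) discount,
    where rank is 1-based, so the top position is divided by log2(2) = 1.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 20) -> float:
    """DCG of the ranking divided by the DCG of the ideal (best-first) ranking.

    `relevances` holds the graded label of each result, in the order the
    system ranked them.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant results: define nDCG as 0 by convention
    return dcg_at_k(relevances, k) / ideal
```

For example, `ndcg_at_k([3, 4, 0, 2])` scores below 1.0 because the most relevant result is not ranked first. Per-query values like this are typically averaged over an evaluation set before comparing two ranking models.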
What failed first
The LLM-based cross-encoder teacher model was effective at predicting relevance but could not be deployed directly for real-time serving due to latency and cost constraints.
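In a typical teacher-student setup, the expensive teacher scores data offline and a cheaper student learns to reproduce its judgements so that only the student runs at serving time. The source does not describe the student's architecture or training loss here, so the following Python sketch is a hypothetical illustration of standard soft-label distillation. It assumes the teacher emits a probability distribution over graded relevance levels for each query-Pin pair, and it trains a hypothetical linear softmax student (`LinearStudent`, fed with made-up feature vectors) by minimizing cross-entropy against those soft labels.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearStudent:
    """Hypothetical lightweight student: a linear softmax classifier over
    relevance levels, cheap enough to score every candidate online."""

    def __init__(self, n_features: int, n_levels: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One weight row and one bias per relevance level.
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_features)]
                  for _ in range(n_levels)]
        self.b = [0.0] * n_levels

    def predict(self, x: Sequence[float]) -> List[float]:
        """Return the student's probability distribution over levels."""
        logits = [sum(wi * xi for wi, xi in zip(row, x)) + bk
                  for row, bk in zip(self.w, self.b)]
        return softmax(logits)

    def distill_step(self, x: Sequence[float],
                     teacher_probs: Sequence[float], lr: float) -> float:
        """One SGD step on cross-entropy against the teacher's soft labels.

        For softmax + cross-entropy with a target distribution that sums to 1,
        dLoss/dlogit_k = p_student_k - p_teacher_k.
        """
        p = self.predict(x)
        loss = -sum(t * math.log(max(pk, 1e-12))
                    for t, pk in zip(teacher_probs, p))
        for k, (pk, tk) in enumerate(zip(p, teacher_probs)):
            g = pk - tk
            self.b[k] -= lr * g
            for j, xj in enumerate(x):
                self.w[k][j] -= lr * g * xj
        return loss


def distill(student: LinearStudent,
            features: Sequence[Sequence[float]],
            teacher_probs: Sequence[Sequence[float]],
            epochs: int = 200, lr: float = 0.1) -> float:
    """Fit the student to teacher-labelled pairs; return the last epoch's mean loss."""
    total = 0.0
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(features, teacher_probs):
            total += student.distill_step(x, t, lr)
    return total / len(features)
```

Calling `distill(student, features, teacher_probs)` on teacher-labelled pairs and then `student.predict(x)` returns the student's distribution over relevance levels. A production student would be a much richer model; the part that makes this distillation rather than ordinary supervised training is that the targets are the teacher's soft probabilities instead of human labels.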