
Pinterest scales recommendation system serving throughput 7x with request-level deduplication

Scaling Pinterest's recommendation models 100x created massive infrastructure pressure: storage, training, and serving costs threatened to grow proportionally, requiring deliberate efficiency techniques to remain economically viable.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User opens feed
A request is triggered when a user opens their feed, kicking off the recommendation funnel.
Tools used
Apache Iceberg, Triton, FlashAttention
Outcome

Request-level deduplication delivered 10–50x storage compression, a 4x retrieval training speedup, a 2.8x ranking training speedup, and a 7x serving throughput increase, enabling deployment of a 100x larger Foundation Model without proportional cost increases.
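The storage effect of request-level deduplication can be illustrated with a minimal sketch. It assumes a flat log in which every candidate row repeats the user features of the request it belongs to; the row layout and the names `example_rows`, `deduplicate_by_request`, `flat_value_count`, and `dedup_value_count` are hypothetical and are not taken from the source writeup.

```python
from collections import OrderedDict


def example_rows():
    """Hypothetical flat log: each row repeats the request's user features."""
    user_a = [0.1, 0.2, 0.3, 0.4]
    user_b = [0.5, 0.6, 0.7, 0.8]
    return [
        {"request_id": "r1", "user_features": user_a, "item_id": "a"},
        {"request_id": "r1", "user_features": user_a, "item_id": "b"},
        {"request_id": "r1", "user_features": user_a, "item_id": "c"},
        {"request_id": "r2", "user_features": user_b, "item_id": "d"},
        {"request_id": "r2", "user_features": user_b, "item_id": "e"},
        {"request_id": "r2", "user_features": user_b, "item_id": "f"},
    ]


def deduplicate_by_request(rows):
    """Group rows by request so user features are kept once per request."""
    grouped = OrderedDict()
    for row in rows:
        entry = grouped.setdefault(
            row["request_id"],
            {
                "request_id": row["request_id"],
                "user_features": row["user_features"],
                "item_ids": [],
            },
        )
        entry["item_ids"].append(row["item_id"])
    return list(grouped.values())


def flat_value_count(rows):
    """Values stored in the flat layout: user features plus one item per row."""
    return sum(len(r["user_features"]) + 1 for r in rows)


def dedup_value_count(groups):
    """Values stored after grouping: user features once, then every item."""
    return sum(len(g["user_features"]) + len(g["item_ids"]) for g in groups)
```

In this toy layout the saving grows with the number of candidates per request and with the size of the user feature block. The same grouping is what would let a model compute the user-side representation once per request instead of once per candidate, though the source does not describe the serving implementation at that level of detail.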

What failed first

Early experiments with request-sorted data caused 1–2% regressions in offline evaluation metrics, attributed to request-sorted batches disrupting the IID assumption that Batch Normalization relies on, and pushed the false negative rate in retrieval training as high as 30%.
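The false negative problem can be sketched under one plausible reading: retrieval training uses in-batch negatives, so each row treats the other rows' positive items as negatives, and a false negative occurs when such an item was in fact engaged by the same user. The source does not give its exact definition, so the function `in_batch_false_negative_rate` and the `(user_id, item_id)` batch layout below are assumptions for illustration only.

```python
def in_batch_false_negative_rate(batch):
    """Fraction of in-batch negatives that are positives for the anchor user.

    batch: list of (user_id, item_id) positive pairs (hypothetical layout).
    """
    # Collect every item each user engaged with inside this batch.
    positives = {}
    for user, item in batch:
        positives.setdefault(user, set()).add(item)

    total = 0
    false_negatives = 0
    for i, (user, _) in enumerate(batch):
        for j, (_, other_item) in enumerate(batch):
            if i == j:
                continue  # a row's own item is its positive, not a negative
            total += 1
            if other_item in positives[user]:
                false_negatives += 1
    return false_negatives / total if total else 0.0


# Request-sorted batch: consecutive rows come from the same user.
sorted_batch = [("u1", "a"), ("u1", "b"), ("u2", "c"), ("u2", "d")]
# Shuffled batch: each row comes from a different user.
mixed_batch = [("u1", "a"), ("u2", "c"), ("u3", "e"), ("u4", "g")]

sorted_rate = in_batch_false_negative_rate(sorted_batch)
mixed_rate = in_batch_false_negative_rate(mixed_batch)
```

In the request-sorted toy batch, 4 of the 12 in-batch negatives belong to the anchor user's own positives, while the shuffled batch has none. This shows why grouping rows by request can inflate the false negative rate, without claiming to reproduce the 30% figure.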

Results
Volume: 100x
Running since: 2025
Source

https://medium.com/pinterest-engineering/scaling-recommendation-systems-with-request-level-deduplication-93bd514142d9


Grounding & classification
Source type: technical build writeup
29 fields verified against source quotes.
personalization, recommendation system, failure mode described, metric backed, named customer, production runtime claimed, source backed, tools described, workflow described, media, software, cost reduction, employee productivity, throughput increase, time saved, technical build writeup