Engineering LinkedIn's Next-Generation Feed with LLMs and Transformer Models
LinkedIn's Feed relied on a heterogeneous retrieval architecture with multiple separate sources—trending content, collaborative filtering, and embedding-based systems—each with its own infrastructure, creating engineering complexity and preventing holistic optimization. The traditional ranking model evaluated each impression independently, missing sequential patterns in how professionals consume content over time.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Member opens Feed
A member opening LinkedIn triggers one of the largest-scale recommendation systems in the industry.
Tools used
LLM, GPUs, H100, PyTorch, MMoE, DCNv2, GRMIS, InfoNCE
Outcome
Percentile-bucketed feature encoding improved recall@10 by 15%. Adding two hard negatives per member improved recall by a further 3.6%. Positives-only training reduced per-sequence memory footprint by 37% and enabled 2.6× faster training iteration. A custom Flash Attention variant delivered an additional 2× serving speedup, with the full system achieving sub-50ms retrieval latency across millions of posts.
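To make the hard-negative result concrete, here is a minimal sketch of an InfoNCE-style contrastive loss for a single member embedding, written in plain Python. It is an illustration under stated assumptions, not LinkedIn's training code: the function names, the cosine-similarity scoring, and the temperature of 0.07 are all hypothetical choices, and a production system would compute this in batched form in PyTorch.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(member, positive, negatives, temperature=0.07):
    """InfoNCE loss for one member: -log softmax score of the positive post.

    `negatives` may mix easy (random or in-batch) negatives with hard
    negatives. The temperature value is an assumption, not taken from
    the source.
    """
    m = normalize(member)
    candidates = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(m, c) / temperature for c in candidates]

    # Numerically stable log-sum-exp over the positive and all negatives.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))

    # The positive sits at index 0.
    return log_denominator - logits[0]


# Toy 2-D embeddings, invented purely for illustration.
member = [1.0, 0.0]
engaged_post = [0.9, 0.1]
easy_negatives = [[-1.0, 0.0], [0.0, -1.0]]
hard_negatives = [[0.8, 0.3], [0.7, -0.2]]  # two hard negatives per member

loss_easy = info_nce_loss(member, engaged_post, easy_negatives)
loss_hard = info_nce_loss(member, engaged_post, easy_negatives + hard_negatives)
print(loss_easy, loss_hard)
```

With only easy negatives the loss is already close to zero, so the model gets little gradient signal. Negatives that sit near the member embedding raise the loss and force finer distinctions, which is one plausible reading of why adding two hard negatives per member helped recall.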
What failed first
Passing raw numerical engagement counts as unprocessed text tokens resulted in near-zero correlation (-0.004) between item popularity and embedding similarity, degrading retrieval quality. Including all impressed posts—both engaged and scrolled-past—in training histories hurt model performance and inflated GPU compute costs.
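The percentile-bucketed encoding reported under Outcome can be read as the remedy for the raw-count failure described here. The sketch below shows one way to map an engagement count to a discrete percentile token before it reaches the model. The token format `<views_pN>`, the function names, and the bucket count are assumptions made for illustration; the source does not specify how LinkedIn formats these tokens.

```python
from bisect import bisect_right


def percentile_boundaries(values, num_buckets=100):
    """Return num_buckets - 1 cut points taken from the empirical distribution."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (i * n) // num_buckets)] for i in range(1, num_buckets)]


def bucket_token(value, boundaries, feature_name="views"):
    """Map a raw count to a token naming its percentile bucket.

    The "<feature_pN>" format is hypothetical.
    """
    bucket = bisect_right(boundaries, value)
    return f"<{feature_name}_p{bucket}>"


# Toy view counts with a heavy tail, invented for illustration.
view_counts = [1, 2, 3, 5, 8, 13, 21, 34, 55, 10000]
boundaries = percentile_boundaries(view_counts, num_buckets=10)

for count in (1, 13, 10000, 12345):
    print(count, bucket_token(count, boundaries))
```

A raw count such as 10000 is split by a tokenizer into digit fragments that carry no ordinal meaning, whereas a bucket token is a single symbol whose rank the model can learn. Bucketing also compresses the heavy tail: in this toy data, 10000 and 12345 land in the same top bucket.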