marketing_ops · media · workflow

Inside the Archive: How Spotify generated 1.4 billion personalized LLM reports for 2025 Wrapped

Spotify wanted to identify meaningful listening moments in each user's 2025 history and turn each one into a personalized narrative. With roughly 350 million eligible users each receiving up to five reports, the total came to approximately 1.4 billion LLM-generated reports, a volume that made high-performance models economically infeasible.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under "Tools used" below.

Stage 1 · Heuristic remarkable day identification

A priority-ordered set of heuristics ranks each user's listening events to identify up to five standout days per year.
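A priority-ordered selection like this can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not Spotify's implementation: the `DayStats` fields, the `biggest_listening_day` heuristic, and the ordering in `HEURISTICS` are all hypothetical. Only the "Biggest Discovery Day" category name and the cap of five days come from the source.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DayStats:
    """Hypothetical per-day aggregate of one user's listening."""
    day: str               # ISO date, e.g. "2025-03-14"
    minutes_listened: int
    new_artists: int       # artists heard for the first time that day


Heuristic = Callable[[list[DayStats]], Optional[DayStats]]


def biggest_discovery_day(days: list[DayStats]) -> Optional[DayStats]:
    # Day with the most first-time artists, if there were any at all.
    best = max(days, key=lambda d: d.new_artists, default=None)
    return best if best is not None and best.new_artists > 0 else None


def biggest_listening_day(days: list[DayStats]) -> Optional[DayStats]:
    # Hypothetical second category: day with the most minutes listened.
    best = max(days, key=lambda d: d.minutes_listened, default=None)
    return best if best is not None and best.minutes_listened > 0 else None


# Assumed priority order: earlier entries claim their day first.
HEURISTICS: list[tuple[str, Heuristic]] = [
    ("biggest_discovery_day", biggest_discovery_day),
    ("biggest_listening_day", biggest_listening_day),
]


def pick_remarkable_days(days: list[DayStats], max_days: int = 5) -> list[tuple[str, str]]:
    """Run heuristics in priority order; each day is used at most once."""
    picked: list[tuple[str, str]] = []
    used: set[str] = set()
    for name, heuristic in HEURISTICS:
        if len(picked) >= max_days:
            break
        candidates = [d for d in days if d.day not in used]
        winner = heuristic(candidates)
        if winner is not None:
            picked.append((name, winner.day))
            used.add(winner.day)
    return picked
```

Because a day is removed from the candidate pool once a higher-priority heuristic claims it, a user never receives two reports about the same date, and a user with little history simply receives fewer than five.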

Tools used

fine-tuned model · frontier model · LLM · Direct Preference Optimization (DPO) · pubsub message queue · evaluation data warehouse · distributed, column-oriented key-value database · AI coding assistants

Outcome

Wrapped Archive reached hundreds of millions of users globally. The generation engine ran for four days straight and produced roughly 1.4 billion reports. The structured evaluation and remediation loop caught a timezone bug (described under "What failed first"), which led to a pipeline fix and a bulk replay of the affected reports.
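The source does not describe how affected reports were selected for replay. One plausible shape, shown here purely as a hypothetical sketch, is to compare the number each stored report was generated from against the recomputed value from the fixed pipeline and collect the mismatches. The `Report` fields and the `find_reports_to_replay` helper are illustrative names, not anything from Spotify's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical record of a generated report and its key input."""
    report_id: str
    user_id: str
    kind: str                  # e.g. "biggest_discovery_day"
    claimed_artist_count: int  # number the story was generated from


def find_reports_to_replay(
    reports: list[Report],
    corrected_counts: dict[tuple[str, str], int],
) -> list[str]:
    """Return IDs of reports whose input disagrees with the corrected data.

    corrected_counts maps (user_id, kind) to the value recomputed after
    the pipeline fix. Reports with no corrected value are left alone.
    """
    affected = []
    for report in reports:
        expected = corrected_counts.get((report.user_id, report.kind))
        if expected is not None and expected != report.claimed_artist_count:
            affected.append(report.report_id)
    return affected
```

The returned IDs could then be republished to the generation queue, so only the affected subset is regenerated rather than the full run.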

What failed first

A timezone bug in the upstream data pipeline caused some Biggest Discovery Day reports to celebrate the wrong number of artists discovered; the LLM faithfully generated a compelling but factually incorrect story from the flawed data.
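The source does not say exactly where the timezone handling went wrong. The sketch below shows one common way such a bug arises, under assumed data: the same first-listen events grouped by UTC date and by the user's local date give different per-day discovery counts. The event tuples, the fixed UTC-8 offset, and the `discoveries_per_day` helper are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone, tzinfo


def discoveries_per_day(
    first_listens: list[tuple[str, datetime]], tz: tzinfo
) -> Counter:
    """Count first-time artists per calendar day, as seen in timezone tz."""
    counts: Counter = Counter()
    for _artist, ts_utc in first_listens:
        local_day = ts_utc.astimezone(tz).date().isoformat()
        counts[local_day] += 1
    return counts


utc = timezone.utc
user_tz = timezone(timedelta(hours=-8))  # assumed fixed offset for the example

# Hypothetical first-listen events, stored as UTC timestamps.
first_listens = [
    ("artist_a", datetime(2025, 6, 2, 1, 30, tzinfo=utc)),  # local: June 1, 17:30
    ("artist_b", datetime(2025, 6, 2, 5, 0, tzinfo=utc)),   # local: June 1, 21:00
    ("artist_c", datetime(2025, 6, 2, 20, 0, tzinfo=utc)),  # local: June 2, 12:00
]

by_utc = discoveries_per_day(first_listens, utc)
by_local = discoveries_per_day(first_listens, user_tz)

print(dict(by_utc))    # {'2025-06-02': 3}
print(dict(by_local))  # {'2025-06-01': 2, '2025-06-02': 1}
```

A story written from the UTC grouping would celebrate three discoveries on June 2, while the user experienced two on June 1 and one on June 2. The model has no way to detect this from the prompt alone, which is why the check has to happen on the data rather than on the prose.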

Results

Time saved: four days

Volume: roughly 1.4 billion reports
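The quoted figures imply a sustained generation rate that is easy to work out. The arithmetic below is a back-of-the-envelope sketch that treats the rounded numbers from this page as exact and assumes the engine ran at a uniform rate for the full four days. Neither assumption is confirmed by the source.

```python
ELIGIBLE_USERS = 350_000_000      # "roughly 350 million eligible users"
TOTAL_REPORTS = 1_400_000_000     # "roughly 1.4 billion reports"
MAX_REPORTS_PER_USER = 5          # "up to five reports"
RUN_DAYS = 4                      # "four days straight"

avg_reports_per_user = TOTAL_REPORTS / ELIGIBLE_USERS
upper_bound_reports = ELIGIBLE_USERS * MAX_REPORTS_PER_USER
reports_per_second = TOTAL_REPORTS / (RUN_DAYS * 24 * 60 * 60)

print(f"average reports per user: {avg_reports_per_user:.1f}")  # 4.0
print(f"upper bound on reports:   {upper_bound_reports:,}")     # 1,750,000,000
print(f"sustained reports/second: {reports_per_second:,.0f}")   # 4,051
```

On these assumptions the engine averaged about four reports per user, below the five-report ceiling, and sustained roughly four thousand generations per second.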

Source

https://engineering.atspotify.com/2026/3/inside-the-archive-2025-wrapped


Grounding & classification

Source type: technical build writeup

34 fields verified against source quotes.

content generation · personalization · summarization · knowledge base · failure mode described · human review described · metric backed · named customer · production verified · source backed · tools described · workflow described · media · software · automation rate · throughput increase · technical build writeup · marketing ops · quality assurance · case to summary · extract classify route