DoorDash uses LLMs to bridge behavioral silos in multi-vertical recommendations
As DoorDash expands into more verticals, most customers have deep behavioral history in only a few categories, especially restaurants, which leaves them effectively cold-start in grocery, retail, and convenience. In those verticals, standard recommenders have little per-SKU signal to learn from, and popularity baselines overexpose head products while crowding out long-tail items, weakening personalization across large, sparse catalogs.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User behavior data as input
Unstructured user behavior — restaurant orders and search queries — serves as input to the LLM feature pipeline.
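A minimal sketch of how restaurant orders and search queries might be serialized into a single LLM prompt. The `Order` record, the `build_profile_prompt` helper, the 20-order truncation limit and the instruction wording are all hypothetical illustrations, not DoorDash's actual prompt or schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    # Hypothetical record of one restaurant order
    restaurant_cuisine: str
    items: List[str]


def build_profile_prompt(orders: List[Order], searches: List[str], max_orders: int = 20) -> str:
    """Serialize recent orders and search queries into one prompt string.

    Field names, the truncation limit and the instruction text are
    illustrative assumptions only.
    """
    lines = ["Recent restaurant orders:"]
    for order in orders[-max_orders:]:  # keep only the most recent orders
        lines.append(f"- {order.restaurant_cuisine}: {', '.join(order.items)}")
    lines.append("Recent search queries:")
    lines.extend(f"- {q}" for q in searches)
    lines.append(
        "Infer fine-grained grocery and retail categories this user is likely "
        "to buy. Return one category per line."
    )
    return "\n".join(lines)


# Hypothetical usage with a single Indian-food order and one search
prompt = build_profile_prompt(
    [Order("Indian", ["Garlic naan", "Paneer tikka"])],
    ["mango lassi"],
)
print(prompt)
```

Truncating to recent orders is one simple way to bound prompt length and cost; a production pipeline could equally summarize older history instead.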
Tools used
LLM · GPT-4o · GPT-4o-mini · H-RAG
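The tool list names H-RAG without defining it. One plausible reading is hierarchical retrieval-augmented generation: the category taxonomy is searched top-down so the LLM only has to choose among a short list of relevant leaf categories. The sketch below assumes that reading; the `TAXONOMY` tree, the keyword lists, and `retrieve_leaves` are hypothetical, and keyword overlap stands in for the embedding similarity a real retriever would likely use.

```python
from typing import Dict, List, Set

# Hypothetical two-level taxonomy. Leaves hold keyword lists that stand in
# for embedding similarity.
TAXONOMY: Dict[str, object] = {
    "Bakery": {
        "Specialty Breads (Naan)": ["naan", "indian", "bread", "curry"],
        "Sandwich Bread": ["sandwich", "toast", "bread"],
    },
    "Beverages": {
        "Yogurt Drinks (Lassi)": ["lassi", "indian", "yogurt", "mango"],
        "Soda": ["soda", "cola", "burger"],
    },
}


def keywords(node: object) -> Set[str]:
    """Collect every keyword under a node (a leaf list or a dict subtree)."""
    if isinstance(node, list):
        return set(node)
    terms: Set[str] = set()
    for child in node.values():
        terms |= keywords(child)
    return terms


def retrieve_leaves(query_terms: Set[str], tree: Dict[str, object], k: int = 1) -> List[str]:
    """Descend level by level, keeping the top-k children by keyword overlap."""
    scored = [(len(query_terms & keywords(child)), name, child) for name, child in tree.items()]
    scored = [s for s in scored if s[0] > 0]  # drop branches with no overlap
    scored.sort(key=lambda s: s[0], reverse=True)
    leaves: List[str] = []
    for _, name, child in scored[:k]:
        if isinstance(child, list):
            leaves.append(name)  # reached a leaf category
        else:
            leaves.extend(retrieve_leaves(query_terms, child, k))
    return leaves


candidates = retrieve_leaves({"indian", "naan", "curry"}, TAXONOMY)
print(candidates)  # → ['Specialty Breads (Naan)']
```

The retrieved leaves would then be placed in the prompt as the allowed answer set, which keeps the LLM's choice small and grounded in the catalog taxonomy.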
Outcome
The LLM-powered framework achieved a 4.4% relative improvement in AUC-ROC and a 4.8% improvement in MRR offline. Online production results confirmed the gains (+4.3% AUC-ROC, +3.2% MRR), while total computation costs fell by ~80%.
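To make the metrics concrete: MRR averages, over queries, the reciprocal rank of the first relevant item, and a relative improvement is (new - old) / old. The sketch below assumes one relevant item per query, and all rankings and the 0.70 AUC-ROC baseline are hypothetical toy values, not reported figures. It shows that a 4.4% relative lift on a 0.70 baseline means about 0.731, not 0.744.

```python
from typing import List


def mean_reciprocal_rank(ranked_lists: List[List[str]], relevant: List[str]) -> float:
    """Average of 1/rank of the relevant item per query (0 if it is absent)."""
    total = 0.0
    for ranking, target in zip(ranked_lists, relevant):
        rr = 0.0
        for rank, item in enumerate(ranking, start=1):
            if item == target:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_lists)


def relative_improvement(baseline: float, treatment: float) -> float:
    """Relative change of treatment over baseline, e.g. 0.044 means +4.4%."""
    return (treatment - baseline) / baseline


# Hypothetical toy rankings: each inner list is one query's ranked items.
baseline_mrr = mean_reciprocal_rank([["a", "b", "c"], ["b", "a", "c"]], ["b", "a"])
treatment_mrr = mean_reciprocal_rank([["b", "a", "c"], ["b", "a", "c"]], ["b", "a"])
print(baseline_mrr, treatment_mrr)  # → 0.5 0.75
print(relative_improvement(baseline_mrr, treatment_mrr))  # → 0.5

# Hypothetical AUC-ROC baseline of 0.70 with a 4.4% relative lift
print(round(relative_improvement(0.70, 0.7308), 3))  # → 0.044
```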
What failed first
Before prompt refinements, the LLM assigned overly generic and incorrect category tags — a user who ordered Indian food was tagged with categories like 'Sandwiches' rather than relevant fine-grained categories like 'Specialty Breads (Naan)'.
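The source attributes the fix to prompt refinements. A complementary guardrail, shown here as an assumption rather than DoorDash's documented method, is to post-validate LLM tags: keep a tag only if it is a known leaf category and at least one of its evidence terms appears in the user's order history. `LEAF_EVIDENCE` and `validate_tags` are hypothetical names with toy data.

```python
from typing import Dict, List, Set

# Hypothetical leaf taxonomy: category -> terms that count as supporting evidence.
LEAF_EVIDENCE: Dict[str, Set[str]] = {
    "Specialty Breads (Naan)": {"naan", "roti", "paratha"},
    "Basmati Rice": {"biryani", "basmati", "pulao"},
    "Sandwiches": {"sandwich", "sub", "panini"},
}


def validate_tags(llm_tags: List[str], order_text: str) -> List[str]:
    """Keep tags that are known leaf categories and are supported by the orders."""
    words = set(order_text.lower().split())
    kept: List[str] = []
    for tag in llm_tags:
        evidence = LEAF_EVIDENCE.get(tag)
        if evidence is None:
            continue  # not a leaf in the taxonomy (too generic or invented)
        if evidence & words:
            kept.append(tag)  # at least one supporting term appears in the history
    return kept


tags = validate_tags(["Sandwiches", "Specialty Breads (Naan)", "Food"], "garlic naan chicken biryani")
print(tags)  # → ['Specialty Breads (Naan)']
```

Here "Sandwiches" is dropped for lack of evidence and "Food" is dropped as a non-leaf tag, mirroring the failure described above.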