
Building DoorDash's Product Knowledge Graph with Large Language Models

DoorDash's SKU enrichment process was manual and led by contract operators. It produced long turnaround times, high costs, and so many inaccuracies that a second human had to audit the results. Brand ingestion was reactive and purely manual, which limited volume, left coverage gaps open, and generated duplicate brands. Building an in-house extraction model was blocked by the NLP cold-start problem, because training such a model requires large labeled datasets.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Merchant SKU data ingested
When a merchant comes onboard at DoorDash, their internal SKU data is added to the retail catalog.
Tools used
GPT-4, OpenAI embeddings, RAG, OCR
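The tools above are listed without detail on how they fit together. One plausible arrangement, shown here only as a minimal sketch, is retrieval-augmented prompting: embed the incoming SKU name, retrieve the most similar already-labeled SKUs, and place them in the prompt as examples. Everything in the block is an assumption for illustration: the function names, the prompt wording, the sample products, and the three-dimensional vectors, which stand in for real embedding vectors. No model or API call is made.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_examples(query_vec, labeled_skus, k=2):
    """Return the k labeled SKUs whose embeddings are closest to query_vec."""
    ranked = sorted(
        labeled_skus,
        key=lambda sku: cosine_similarity(query_vec, sku["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(sku_name, examples):
    """Assemble a few-shot prompt from retrieved examples (hypothetical wording)."""
    parts = ["Extract the brand for each product name."]
    for ex in examples:
        parts.append(f"Product: {ex['name']}\nBrand: {ex['brand']}")
    parts.append(f"Product: {sku_name}\nBrand:")
    return "\n\n".join(parts)


# Hypothetical labeled SKUs; the toy vectors stand in for real embeddings.
LABELED = [
    {"name": "Acme Cola 12oz can", "brand": "Acme", "embedding": [0.9, 0.1, 0.0]},
    {"name": "Acme Diet Cola 2L", "brand": "Acme", "embedding": [0.8, 0.2, 0.1]},
    {"name": "Zed Dish Soap 500ml", "brand": "Zed", "embedding": [0.0, 0.1, 0.9]},
]

# Pretend this vector is the embedding of the new, unlabeled SKU name.
examples = retrieve_examples([0.85, 0.15, 0.05], LABELED, k=2)
prompt = build_prompt("Acme Cherry Cola 6pk", examples)
print(prompt)
```

In a real pipeline the resulting prompt would be sent to a language model and the embeddings would come from an embedding model; both steps are left out here so the retrieval logic stays self-contained.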
Outcome

LLM-powered pipelines enabled proactive brand identification at scale with improved efficiency and accuracy, organic label coverage sufficient to launch personalized item carousels that improved top-line engagement metrics, and attribute annotation generation within a week that would otherwise require months to collect.

What failed first

Manual SKU enrichment by contract operators was inaccurate enough to require a second human audit pass. For an in-house NLP model, the cold-start problem meant that data collection slowed model development and delayed adding new items to the active catalog.

Results
Time saved: long turnaround times
Cost replaced: high costs
Source

https://doordash.engineering/2024/04/23/building-doordashs-product-knowledge-graph-with-large-language-models/


Grounding & classification
Source type: technical build writeup
33 fields verified against source quotes.
data extraction, document classification, knowledge search, personalization, rag, product catalog, builder submitted, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, ecommerce, logistics, accuracy improvement, employee productivity, time saved, technical build writeup, data entry ops, ecommerce ops, quality assurance, data sync enrichment, extract classify route, rag answering