customer_support · workflow
DoorDash builds simulation and evaluation flywheel to develop LLM support chatbots at scale
LLMs' non-determinism made safe testing of support chatbot changes impossible: deploying to production risked degrading customer and Dasher experience, while manual testing was too slow and likely to miss problems.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Identify customer problem
Human review of cases from simulation runs or live user traffic identifies issues to address and real transcripts to seed the simulator.
Tools used
LLMs, S3, gRPC
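As a rough illustration of Stage 1, the sketch below shows one way human-reviewed cases could be turned into seed transcripts for a simulator. It is a minimal sketch under stated assumptions, not the documented implementation: the `ReviewedCase` type, its fields, and the `select_seeds` helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReviewedCase:
    """Hypothetical record of one human-reviewed conversation."""
    case_id: str
    source: str               # assumed values: "simulation" or "live"
    transcript: List[str]     # conversation turns, in order
    issue: Optional[str]      # reviewer's issue label, or None if no problem was found


def select_seeds(cases: List[ReviewedCase]) -> Dict[str, List[List[str]]]:
    """Group the transcripts of flagged cases by the issue the reviewer assigned."""
    seeds: Dict[str, List[List[str]]] = {}
    for case in cases:
        if case.issue is not None:
            seeds.setdefault(case.issue, []).append(case.transcript)
    return seeds
```

Grouping by issue label keeps each problem identified in review tied to the real transcripts that exposed it, so a later simulation run can target one issue at a time.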
Outcome
The flywheel reduced hallucinations by 90% in simulation with the improvement carrying over into production, cut each iteration cycle from days to hours, and enabled more than 200 simulated conversations to run in under five minutes.
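To make the outcome figures concrete, the sketch below runs a batch of simulated conversations concurrently and computes a hallucination rate and its relative reduction. It is only an illustration: `simulate_conversation` is a hypothetical stand-in that flags every tenth conversation, where a real setup would call the chatbot and an evaluator, and the thread pool is an assumed way to run many conversations in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def simulate_conversation(seed_id: int) -> bool:
    """Hypothetical stand-in for one simulated conversation.

    Returns True if the evaluator flagged a hallucination. Here every
    tenth seed is flagged so the example is deterministic.
    """
    return seed_id % 10 == 0


def hallucination_rate(seed_ids: Iterable[int], max_workers: int = 16) -> float:
    """Run all simulated conversations concurrently and return the flagged fraction."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        flags = list(pool.map(simulate_conversation, seed_ids))
    if not flags:
        raise ValueError("no conversations were simulated")
    return sum(flags) / len(flags)


def relative_reduction(before: float, after: float) -> float:
    """Relative drop between two rates, e.g. 0.20 -> 0.02 is a 0.9 (90%) reduction."""
    return (before - after) / before
```

Comparing the rate from a baseline run with the rate after a prompt or context change, using the same seeds, is what lets a single iteration be judged in minutes without touching production traffic.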
What failed first
The early LLM implementation suffered from hallucinations because the context window was overwhelmed with raw events and logs, which caused the model to misinterpret fields or suggest non-existent policies. Iterative attempts at summarization either lost important details or remained too noisy.
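The failure mode above comes down to how much raw data reaches the prompt. The sketch below shows one generic way to shrink it: keep only an allowlist of fields from the most recent events. This is an assumption for illustration, not the fix the source describes; the field names in `ALLOWED_FIELDS`, the `max_events` cutoff, and the `build_context` helper are all hypothetical.

```python
import json
from typing import Dict, List, Sequence

# Hypothetical allowlist: only fields the model is expected to reason about.
ALLOWED_FIELDS = ("event_type", "timestamp", "status")


def build_context(
    raw_events: List[Dict],
    allowed: Sequence[str] = ALLOWED_FIELDS,
    max_events: int = 20,
) -> str:
    """Return a compact, one-event-per-line context string.

    Keeps the `max_events` most recent events (by "timestamp") and drops
    every field that is not in the allowlist.
    """
    recent = sorted(raw_events, key=lambda e: e["timestamp"])[-max_events:]
    lines = []
    for event in recent:
        kept = {key: event[key] for key in allowed if key in event}
        lines.append(json.dumps(kept, sort_keys=True))
    return "\n".join(lines)
```

An allowlist trades recall for precision: anything not listed never reaches the model, so the list itself has to be reviewed whenever important details turn out to be missing from responses.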
Results
Time saved: reduced each iteration cycle from days to hours
Volume: 90% (reduction in hallucinations in simulation)
Grounding & classification
Source type: technical build writeup
27 fields verified against source quotes.
agentic workflow, chatbot, conversational ai, summarization, chat transcript, builder submitted, failure mode described, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, logistics, cycle time reduction, error reduction, throughput increase, technical build writeup, customer support, quality assurance, autonomous resolution, human review queue