DoorDash builds LLM guardrail system to automate restaurant menu transcription from photos
DoorDash previously relied on humans to manually transcribe restaurant menus from photos, a process described as costly and time-consuming. LLMs alone could not achieve the required high accuracy due to diverse menu structures, incomplete menus, and low-quality photos.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Menu photo submitted
Restaurant partners submit menu photos to initiate the transcription workflow.
Tools used
OCR, LightGBM, ResNet, DiT, CNN
Outcome
DoorDash deployed a partially automated pipeline that pairs LLM transcription with an ML guardrail model. The guardrail routes high-confidence transcriptions to production automatically and sends low-confidence ones to human review. This improved efficiency without sacrificing quality and enabled rapid adoption of new AI models.
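The routing step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DoorDash's implementation: the names `Transcription`, `route_transcriptions`, and `toy_guardrail_score`, as well as the 0.9 threshold, are hypothetical. The scoring function is a trivial placeholder standing in for a trained guardrail model, whose features and internals the source does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Transcription:
    """Hypothetical container for one LLM-transcribed menu."""
    menu_id: str
    # Each item is (name, price); price is None when the LLM could not read it.
    items: List[Tuple[str, Optional[float]]]


def toy_guardrail_score(t: Transcription) -> float:
    """Placeholder confidence score: the fraction of items with a price.

    A real guardrail would be a trained model; this stand-in only makes
    the routing logic runnable.
    """
    if not t.items:
        return 0.0
    priced = sum(1 for _, price in t.items if price is not None)
    return priced / len(t.items)


def route_transcriptions(
    transcriptions: List[Transcription],
    score_fn: Callable[[Transcription], float],
    threshold: float = 0.9,  # assumed value, chosen only for illustration
) -> Tuple[List[Transcription], List[Transcription]]:
    """Split transcriptions into (auto_publish, human_review) by confidence."""
    auto_publish: List[Transcription] = []
    human_review: List[Transcription] = []
    for t in transcriptions:
        if score_fn(t) >= threshold:
            auto_publish.append(t)
        else:
            human_review.append(t)
    return auto_publish, human_review


batch = [
    Transcription("m1", [("Pad Thai", 12.5), ("Spring Rolls", 6.0)]),
    Transcription("m2", [("Ramen", 14.0), ("Gyoza", None)]),
]
auto, review = route_transcriptions(batch, toy_guardrail_score)
print([t.menu_id for t in auto], [t.menu_id for t in review])
```

Because the scoring function is passed in as a parameter, the transcription model and the guardrail can be swapped independently, which is one plausible reading of how such a design supports adopting new models quickly.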
What failed first
Used as standalone transcription tools, LLMs produced errors caused by inconsistent menu structures, incomplete menus, and low photo quality. Even with intensive effort, raising LLM accuracy to production standards required too much time and investment.