
Improving Taralli food calorie tracking accuracy from 17% to 76% with DSPy evals

The initial zero-shot calorie-tracking system produced wildly inaccurate outputs, including totals of tens of thousands of kcal for common foods, because the model misinterpreted quantity fields. As a result, it reached only 17% accuracy.
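One way to expose this failure mode is to keep the quantity and the per-unit calories as separate fields, compute the total in code, and flag implausible totals. The sketch below assumes a hypothetical `FoodItem` schema and an assumed 3,000 kcal per-item ceiling; neither comes from the source. It uses a stdlib dataclass, where a Pydantic model would likely play the same role.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical per-item schema: the model reports a quantity and the calories
# for ONE unit, and the total is computed in code rather than generated.
@dataclass
class FoodItem:
    name: str
    quantity: float       # number of units, e.g. 150 (grams) or 2 (slices)
    unit: str             # e.g. "g", "slice", "cup"
    kcal_per_unit: float  # calories for a single unit

    @property
    def total_kcal(self) -> float:
        return self.quantity * self.kcal_per_unit


# Assumed plausibility ceiling for one logged item; not specified in the source.
MAX_PLAUSIBLE_KCAL_PER_ITEM = 3000.0


def implausible_items(items: list[FoodItem]) -> list[str]:
    """Return the names of items whose computed total exceeds the assumed ceiling."""
    return [item.name for item in items if item.total_kcal > MAX_PLAUSIBLE_KCAL_PER_ITEM]
```

Reading "150 g of apple" as 150 whole apples at 95 kcal each yields 14,250 kcal, the kind of total the baseline produced; the check flags it, while the correct reading (150 g at 0.52 kcal/g, about 78 kcal) passes.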

How it works
Common implementation structure
Stage 1 · User submits food description
A user's food description string is submitted as the input to the food tracking system.
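As a minimal sketch of this entry point, assume the description is passed to a predictor callable that returns a JSON string. `track_food`, `REQUIRED_KEYS` and the key names are hypothetical; the source does not give the app's actual schema or request handler.

```python
import json
from typing import Callable

# Hypothetical output keys; the real app's schema is not given in the source.
REQUIRED_KEYS = {"food_items", "total_calories", "food_groups"}


def track_food(description: str, predict: Callable[[str], str]) -> dict:
    """Send a free-text food description to a predictor and validate its JSON reply.

    `predict` stands in for the LLM call (zero-shot or optimized) and is assumed
    to return a JSON object encoded as a string.
    """
    text = description.strip()
    if not text:
        raise ValueError("food description must not be empty")
    result = json.loads(predict(text))
    if not isinstance(result, dict):
        raise ValueError("model output must be a JSON object")
    missing = REQUIRED_KEYS - set(result)
    if missing:
        raise ValueError(f"model output is missing keys: {sorted(missing)}")
    return result
```

Keeping the predictor as a parameter means the zero-shot baseline and an optimized program can be swapped behind the same handler.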
Tools used
gpt-4o-mini · DSPy · W&B Weave · Pydantic · FastAPI · Gemini 2.5 Flash · o3 · Gemini 2.5 Pro · openrouter
Outcome

After applying DSPy's BootstrapFewShotWithRandomSearch optimization with Gemini 2.5 Flash, tracking accuracy improved from 17% to 76%, and the optimized model was integrated into the production app for all users.
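DSPy's BootstrapFewShotWithRandomSearch bootstraps few-shot demonstrations by running a teacher program on training examples and keeping the outputs that pass the metric, then evaluates several candidate demo sets (including a zero-shot candidate) and keeps the best. The stdlib sketch below mimics that loop in miniature. It is not DSPy's implementation, and `teacher`, `metric` and `evaluate` are hypothetical callables.

```python
from __future__ import annotations

import random
from typing import Callable

# Hypothetical shapes: an example is a dict with at least a "description" key;
# a demo pairs that description with a teacher output the metric accepted.
Example = dict
Demo = dict


def bootstrap_demos(trainset: list[Example],
                    teacher: Callable[[str], dict],
                    metric: Callable[[Example, dict], bool]) -> list[Demo]:
    """Run the teacher on each training example and keep only outputs that pass the metric."""
    demos = []
    for example in trainset:
        output = teacher(example["description"])
        if metric(example, output):
            demos.append({"description": example["description"], "output": output})
    return demos


def random_search(demos: list[Demo],
                  evaluate: Callable[[list[Demo]], float],
                  num_candidates: int = 8,
                  k: int = 4,
                  seed: int = 0) -> tuple[list[Demo], float]:
    """Score random k-demo subsets on a dev set; the zero-shot (empty) set is the starting candidate."""
    rng = random.Random(seed)
    best_demos: list[Demo] = []
    best_score = evaluate(best_demos)  # zero-shot baseline
    for _ in range(num_candidates):
        candidate = rng.sample(demos, min(k, len(demos)))
        score = evaluate(candidate)
        if score > best_score:
            best_demos, best_score = candidate, score
    return best_demos, best_score
```

Because the zero-shot set is scored first and only strictly better candidates replace it, the search never returns something worse than the baseline on the dev set.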

What failed first

The naive zero-shot approach, GPT-4o mini with structured outputs, achieved only 17% accuracy. Common failure modes included wrong calorie totals and missing food groups.
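These two failure modes suggest a metric that checks both the calorie total and the food groups. The sketch below is an assumed version: the 10% tolerance, the `Labeled` type and the subset rule for food groups are illustrative choices, not the metric documented in the source. Accuracy is the fraction of examples that pass.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical record for both labels and predictions.
@dataclass
class Labeled:
    total_calories: float
    food_groups: set[str]


# Assumed relative tolerance on the calorie total; not taken from the source.
CALORIE_TOLERANCE = 0.10


def is_correct(expected: Labeled, predicted: Labeled, tol: float = CALORIE_TOLERANCE) -> bool:
    """Pass only if the total is within tolerance AND no expected food group is missing."""
    calories_ok = abs(predicted.total_calories - expected.total_calories) <= tol * expected.total_calories
    groups_ok = expected.food_groups <= predicted.food_groups  # subset check
    return calories_ok and groups_ok


def accuracy(pairs: list[tuple[Labeled, Labeled]]) -> float:
    """Fraction of (expected, predicted) pairs that pass is_correct."""
    results = [is_correct(e, p) for e, p in pairs]
    return sum(results) / len(results) if results else 0.0
```

The subset check penalizes missing food groups but tolerates extra ones; strict equality would be an equally plausible choice.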

Results
Baseline (zero-shot) accuracy: 17.24%
Source

https://duarteocarmo.com/blog/evals-are-all-you-need


Grounding & classification
Source type: technical build writeup
26 fields verified against source quotes.
data extraction · form submission · builder submitted · failure mode described · metric backed · production runtime claimed · tools described · workflow described · software · accuracy improvement · technical build writeup · quality assurance · extract classify route · human review queue