quality_assurance · saas · workflow

Improving Taralli food calorie tracking accuracy from 17% to 76% with DSPy evals

The initial zero-shot calorie tracking system reached only 17% accuracy. Because the model misinterpreted quantity fields, it produced wildly inaccurate outputs, including tens of thousands of kcal for common foods.
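To make the quantity failure concrete, here is a minimal sketch of how one misread field can inflate a total by two orders of magnitude. The schema, the field names, the nutrition values, and the 5,000 kcal ceiling are all assumptions for illustration; the source does not show the original data model.

```python
from dataclasses import dataclass


# Hypothetical schema: every field name and number here is an assumption.
@dataclass
class FoodItem:
    name: str
    quantity: float           # amount eaten, expressed in `unit`
    unit: str                 # "g" or "serving"
    kcal_per_100g: float
    grams_per_serving: float


def total_kcal(item: FoodItem) -> float:
    """Convert the quantity to grams first, then scale by energy density."""
    if item.unit == "g":
        grams = item.quantity
    elif item.unit == "serving":
        grams = item.quantity * item.grams_per_serving
    else:
        raise ValueError(f"unknown unit: {item.unit!r}")
    return grams * item.kcal_per_100g / 100.0


MAX_PLAUSIBLE_KCAL = 5000.0  # assumed sanity ceiling for one logged item


def is_plausible(kcal: float) -> bool:
    """Flag totals that no single food entry could reasonably reach."""
    return 0.0 <= kcal <= MAX_PLAUSIBLE_KCAL


# Same food and number, but the unit is read two different ways.
rice_in_grams = FoodItem("cooked rice", 150, "g", 130.0, 150.0)
rice_misread = FoodItem("cooked rice", 150, "serving", 130.0, 150.0)

correct = total_kcal(rice_in_grams)   # 150 g at 130 kcal per 100 g
inflated = total_kcal(rice_misread)   # 150 servings of 150 g each

print(correct, is_plausible(correct))
print(inflated, is_plausible(inflated))
```

Keeping the unit explicit next to the quantity, and rejecting implausible totals, is one way such errors could be caught before they reach a user.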

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.

Stage 1 · User submits food description

A user's food description string is submitted as the input to the food tracking system.

Tools used

gpt-4o-mini · DSPy · W&B Weave · Pydantic · FastAPI · Gemini 2.5 Flash · o3 · Gemini 2.5 Pro · openrouter

Outcome

After applying DSPy's BootstrapFewShotWithRandomSearch optimization with Gemini 2.5 Flash, tracking accuracy improved from 17% to 76%, and the optimized model was integrated into the production app for all users.
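The idea behind random search over few-shot demonstrations can be sketched without any model calls. The toy below is not DSPy's implementation and does not use the DSPy API: `stub_predict` stands in for a language model, the calorie values are made up, and every function name is hypothetical. DSPy's optimizer also bootstraps demonstrations by running the program itself, which this sketch omits.

```python
import random

# Labeled pool from which few-shot demonstrations are drawn (made-up values).
TRAIN_POOL = [
    ("banana", 105.0),
    ("boiled egg", 78.0),
    ("slice of bread", 80.0),
    ("apple", 95.0),
    ("cup of rice", 205.0),
    ("glass of milk", 150.0),
]

# Examples used to score each candidate demo set. They reuse pool foods only
# because the stub below is a lookup; a real dev set would be kept separate.
DEV_SET = [
    ("banana", 105.0),
    ("apple", 95.0),
    ("cup of rice", 205.0),
    ("boiled egg", 78.0),
]


def stub_predict(description, demos):
    """Stand-in for an LM call: answers only when a demo covers the food."""
    return dict(demos).get(description, 0.0)


def score(demos, dev_set, rel_tol=0.10):
    """Fraction of dev examples predicted within rel_tol of the true kcal."""
    hits = sum(
        abs(stub_predict(desc, demos) - kcal) <= rel_tol * kcal
        for desc, kcal in dev_set
    )
    return hits / len(dev_set)


def random_search(pool, dev_set, demos_per_prompt=3, num_candidates=8, seed=0):
    """Sample candidate demo sets and keep the one scoring best on dev_set."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    best_demos, best_score = [], -1.0
    for _ in range(num_candidates):
        candidate = rng.sample(pool, demos_per_prompt)
        candidate_score = score(candidate, dev_set)
        if candidate_score > best_score:
            best_demos, best_score = candidate, candidate_score
    return best_demos, best_score


best_demos, best_score = random_search(TRAIN_POOL, DEV_SET)
print(best_demos, best_score)
```

The point of the sketch is the selection loop: the metric decides which demonstrations end up in the prompt, so the quality of the evaluation bounds the quality of the optimized program.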

What failed first

The naive zero-shot GPT-4o mini approach with structured outputs achieved only 17% accuracy, with common failure modes including wrong calorie totals and missing food groups.
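The two failure modes named here suggest a simple pass/fail metric, sketched below. The 10% calorie tolerance, the rule that expected food groups must all appear in the prediction, and the sample meals are assumptions for illustration; the source does not state the exact scoring rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meal:
    total_kcal: float
    food_groups: frozenset


def is_correct(expected: Meal, predicted: Meal, rel_tol: float = 0.10) -> bool:
    """Pass only if calories are close AND no expected food group is missing."""
    kcal_ok = (
        abs(predicted.total_kcal - expected.total_kcal)
        <= rel_tol * expected.total_kcal
    )
    groups_ok = expected.food_groups <= predicted.food_groups  # subset check
    return kcal_ok and groups_ok


def accuracy(pairs) -> float:
    """Share of (expected, predicted) pairs that pass the metric."""
    if not pairs:
        return 0.0
    return sum(is_correct(e, p) for e, p in pairs) / len(pairs)


# Made-up (expected, predicted) pairs covering each outcome.
pairs = [
    # within tolerance, all groups present: pass
    (Meal(500.0, frozenset({"grain", "protein"})),
     Meal(520.0, frozenset({"grain", "protein"}))),
    # calorie total off by two orders of magnitude: fail
    (Meal(300.0, frozenset({"fruit"})),
     Meal(30000.0, frozenset({"fruit"}))),
    # calories fine but a food group is missing: fail
    (Meal(400.0, frozenset({"dairy", "grain"})),
     Meal(410.0, frozenset({"grain"}))),
    # calories outside the 10% band: fail
    (Meal(200.0, frozenset({"vegetable"})),
     Meal(260.0, frozenset({"vegetable"}))),
]

result = accuracy(pairs)
print(result)
```

A binary metric like this is easy to aggregate into a single accuracy figure, which is what makes before-and-after comparisons such as 17% versus 76% possible.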

Results

Volume: 17.24%

Source

https://duarteocarmo.com/blog/evals-are-all-you-need


Grounding & classification

Source type: technical build writeup

26 fields verified against source quotes.

data extraction · form submission · builder submitted · failure mode described · metric backed · production runtime claimed · tools described · workflow described · software · accuracy improvement · technical build writeup · quality assurance · extract classify route · human review queue