quality_assurance · saas · workflow

Anaconda's Evaluations Driven Development improves Anaconda Assistant error-handling accuracy from 0–13% to 87–100%

Data scientists frequently hit code errors without clear, reliable guidance. Before prompt engineering, the Anaconda Assistant's underlying language models correctly identified and fixed errors in at most 13% of test cases.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.
Stage 1 · User describes error to Assistant
A data scientist describes their error to the Anaconda Assistant.
Tools used
Anaconda Assistant, llm-eval, GPT-3.5-Turbo, Mistral 7B Instruct v0.2
Outcome

After applying few-shot learning, chain-of-thought prompting, and Agentic Feedback Iteration, success rates rose to 87% for GPT-3.5-Turbo at temperature 0 and 100% for Mistral 7B at temperature 1.
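
To make the first two techniques concrete, here is a minimal sketch of how a few-shot, chain-of-thought prompt for error fixing could be assembled. It is not Anaconda's actual prompt: the `Example` class, the `build_prompt` function, the instruction wording, and the sample data are all hypothetical. Agentic Feedback Iteration and the temperature setting are not shown.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked demonstration (hypothetical structure, not from the source)."""
    buggy_code: str
    explanation: str
    fixed_code: str


def build_prompt(examples, buggy_code):
    """Assemble a few-shot prompt that asks for reasoning before the fix."""
    parts = [
        # Chain-of-thought: ask the model to explain the cause before answering.
        "You help fix Python errors. Explain the cause step by step, "
        "then give the corrected code."
    ]
    # Few-shot: prepend worked examples in the same layout as the final query.
    for ex in examples:
        parts.append(
            f"Code:\n{ex.buggy_code}\n"
            f"Reasoning:\n{ex.explanation}\n"
            f"Fix:\n{ex.fixed_code}"
        )
    # The query ends on "Reasoning:" so the model continues with its explanation.
    parts.append(f"Code:\n{buggy_code}\nReasoning:")
    return "\n\n".join(parts)


# Illustrative data only.
demo_examples = [
    Example(
        buggy_code="pritn('hello')",
        explanation="The name 'pritn' is undefined; it is a typo for 'print'.",
        fixed_code="print('hello')",
    )
]
demo_prompt = build_prompt(demo_examples, "x = 1 / 0")
```

Ending the prompt on the reasoning label is one simple way to nudge a model toward explaining the error before proposing code; other layouts would work equally well.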

What failed first

Initial evaluations found success rates as low as 0% for Mistral 7B and 12% for GPT-3.5-Turbo when diagnosing and fixing Python errors, before any prompt engineering was applied.
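
A success rate like the ones above comes from running a model over a fixed set of buggy snippets and counting how many fixes pass a check. The sketch below illustrates that loop under stated assumptions. It does not use the llm-eval API: `runs_cleanly`, `success_rate`, the two stand-in "models", and the test cases are all hypothetical. The pass criterion here (the fixed code executes without raising) is an assumption and may differ from the one used in the source.

```python
from typing import Callable, Iterable


def runs_cleanly(code: str) -> bool:
    """Return True if the code compiles and executes without raising."""
    try:
        exec(compile(code, "<candidate>", "exec"), {})
        return True
    except Exception:
        return False


def success_rate(cases: Iterable[str], propose_fix: Callable[[str], str]) -> float:
    """Fraction of buggy snippets whose proposed fix executes without error."""
    cases = list(cases)
    if not cases:
        return 0.0
    passed = sum(runs_cleanly(propose_fix(buggy)) for buggy in cases)
    return passed / len(cases)


# Hypothetical stand-ins for a real model call.
def no_op_model(buggy: str) -> str:
    """Returns the code unchanged, so no error is ever fixed."""
    return buggy


def toy_model(buggy: str) -> str:
    """Fixes only one specific typo."""
    return buggy.replace("pritn", "print")


# Illustrative test cases: two typos and one division by zero.
CASES = ["pritn('hello')", "pritn(1 + 1)", "x = 1 / 0"]

baseline = success_rate(CASES, no_op_model)
improved = success_rate(CASES, toy_model)
```

In a real harness, `propose_fix` would wrap a call to the language model. Executing model-generated code with `exec` is unsafe outside a sandbox.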

Results
Volume: 60%
Source

https://www.anaconda.com/blog/introducing-evaluations-driven-development?utm_source=chatgpt.com


Grounding & classification
Source type: technical build writeup
25 fields verified against source quotes.
agentic workflow, code generation, content generation, failure mode described, metric backed, production runtime claimed, tools described, workflow described, software, accuracy improvement, employee productivity, technical build writeup, quality assurance