quality_assurance · saas · workflow
Anaconda's Evaluations Driven Development improves Anaconda Assistant error-handling accuracy from 0–13% to 87–100%
Data scientists frequently hit code errors without clear, reliable guidance. Before any prompt engineering, the language models underlying the Anaconda Assistant correctly identified and fixed errors in at most 13% of test cases.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User describes error to Assistant
A data scientist describes their error to the Anaconda Assistant.
Tools used
Anaconda Assistant · llm-eval · GPT-3.5-Turbo · Mistral 7B Instruct v0.2
Outcome
After applying few-shot learning, chain-of-thought prompting, and Agentic Feedback Iteration, success rates rose to 87% for GPT-3.5-Turbo at temperature 0 and 100% for Mistral 7B at temperature 1.
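As a rough illustration of how few-shot learning and chain-of-thought prompting can be combined in one debugging prompt, here is a minimal sketch. It is not Anaconda's actual prompt: the `FewShotExample` class, the `build_prompt` function, and all wording inside the prompt are hypothetical, and no model is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FewShotExample:
    """One worked debugging example shown to the model (hypothetical structure)."""
    broken_code: str
    explanation: str
    fixed_code: str


def build_prompt(examples: List[FewShotExample], user_code: str, error_message: str) -> str:
    """Assemble a prompt with worked examples (few-shot) and an explicit
    'explain first, then fix' instruction (chain-of-thought)."""
    parts = [
        "You are helping a data scientist debug Python code.",
        "Think step by step: first explain the cause of the error, "
        "then give the corrected code.",
    ]
    # Few-shot section: each example demonstrates the explain-then-fix pattern.
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}\n"
            f"Code:\n{ex.broken_code}\n"
            f"Explanation: {ex.explanation}\n"
            f"Fixed code:\n{ex.fixed_code}"
        )
    # The user's actual problem, ending where the model should start reasoning.
    parts.append(
        f"Now debug this\nCode:\n{user_code}\nError: {error_message}\nExplanation:"
    )
    return "\n\n".join(parts)


# Illustrative usage with made-up example data.
demo_examples = [
    FewShotExample(
        broken_code="total = '1' + 2",
        explanation="A str and an int cannot be added; convert the string first.",
        fixed_code="total = int('1') + 2",
    )
]
demo_prompt = build_prompt(
    demo_examples,
    user_code="x = int('three')",
    error_message="ValueError: invalid literal for int() with base 10: 'three'",
)
```

The resulting string would then be sent to the model at the chosen temperature. The source also names Agentic Feedback Iteration, which is not sketched here.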
What failed first
Initial evaluations found success rates as low as 0% for Mistral 7B and 12% for GPT-3.5-Turbo when diagnosing and fixing Python errors, before any prompt engineering was applied.
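A success rate like those above can be measured by running the same test case many times and counting how often the model's answer works. The sketch below assumes one simple pass criterion (the returned code runs without raising) and uses stub functions in place of real models. The source does not describe the harness at this level of detail, so `runs_without_error`, `success_rate`, and both stubs are hypothetical.

```python
from typing import Callable


def runs_without_error(code: str) -> bool:
    """Return True if the candidate code executes without raising.
    Assumption: 'runs cleanly' is a proxy for 'fixed'; a real harness would
    sandbox execution and also check that the fix is semantically correct."""
    try:
        exec(compile(code, "<candidate>", "exec"), {})
        return True
    except Exception:
        return False


def success_rate(model: Callable[[str], str], broken_code: str, n_runs: int) -> float:
    """Ask the model for a fix n_runs times and return the fraction that pass."""
    successes = sum(runs_without_error(model(broken_code)) for _ in range(n_runs))
    return successes / n_runs


# Stub "models" standing in for real LLM calls (illustrative only).
def always_fixes(broken_code: str) -> str:
    return "x = int('3')"


def never_fixes(broken_code: str) -> str:
    return broken_code  # echoes the bug back unchanged


broken = "x = int('three')"  # raises ValueError when executed
rate_good = success_rate(always_fixes, broken, n_runs=10)  # 1.0
rate_bad = success_rate(never_fixes, broken, n_runs=10)    # 0.0
```

With a real model sampled at temperature above 0, repeated runs can differ, which is why the rate is computed over many runs rather than a single attempt.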
Results
Volume: 60%
Source
https://www.anaconda.com/blog/introducing-evaluations-driven-development?utm_source=chatgpt.com
Grounding & classification
Source type: technical build writeup
25 fields verified against source quotes.
agentic workflow · code generation · content generation · failure mode described · metric backed · production runtime claimed · tools described · workflow described · software · accuracy improvement · employee productivity · technical build writeup · quality assurance