quality_assurance · saas · workflow

Anaconda's Evaluations Driven Development improves Anaconda Assistant error-handling accuracy from 0–13% to 87–100%

Data scientists frequently encountered code errors without clear, reliable guidance, and the Anaconda Assistant's underlying language models correctly identified and fixed errors in at most 13% of test cases before prompt engineering.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases — not tied to any one vendor's stack. Click any stage to read what happens there. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · User describes error to Assistant

A data scientist describes their error to the Anaconda Assistant.

Tools used

Anaconda Assistantllm-evalGPT-3.5-TurboMistral 7B Instruct v0.2

Outcome

After applying few-shot learning, chain-of-thought prompting, and Agentic Feedback Iteration, success rates rose to 87% for GPT-3.5-Turbo at temperature 0 and 100% for Mistral 7B at temperature 1.

What failed first

Initial evaluations found success rates as low as 0% for Mistral 7B and 12% for GPT-3.5-Turbo when diagnosing and fixing Python errors, before any prompt engineering was applied.

Results

Volume60%

Source

https://www.anaconda.com/blog/introducing-evaluations-driven-development?utm_source=chatgpt.com

How we source this →

Grounding & classification

Source type: technical build writeup

25 fields verified against source quotes.

agentic workflowcode generationcontent generationfailure mode describedmetric backedproduction runtime claimedtools describedworkflow describedsoftwareaccuracy improvementemployee productivitytechnical build writeupquality assurance