quality_assurance · saas · workflow
Neon improves MCP server tool-call eval pass rate from 60% to 100% with prompt tuning
LLMs struggle to select the correct tool from a large list, and may misuse a generic SQL-execution tool instead of following the required two-step stateful migration workflow. Without automated testing, correct tool use is hard to guarantee.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Eval task submitted
A database migration task is submitted as the eval input to the LLM under test.
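As a rough illustration of how such an eval might be scored, the sketch below checks the sequence of tool names the model called for one migration task. The tool names (`run_sql`, `prepare_database_migration`, `complete_database_migration`) and the strict pass rule are assumptions made for this example, not details taken from the source.

```python
from typing import List

# Hypothetical tool names, chosen for illustration only.
GENERIC_SQL_TOOL = "run_sql"
MIGRATION_STEPS = ["prepare_database_migration", "complete_database_migration"]


def passes_migration_eval(tool_calls: List[str]) -> bool:
    """Return True if the trace follows the two-step workflow in order.

    Assumption of this sketch: any use of the generic SQL tool counts as a
    failure, and each migration step must appear exactly once.
    """
    if GENERIC_SQL_TOOL in tool_calls:
        return False
    # Keep only the migration-related calls and compare to the expected order.
    migration_calls = [name for name in tool_calls if name in MIGRATION_STEPS]
    return migration_calls == MIGRATION_STEPS


def pass_rate(traces: List[List[str]]) -> float:
    """Fraction of traces that pass; 0.0 when there are no traces."""
    if not traces:
        return 0.0
    return sum(passes_migration_eval(t) for t in traces) / len(traces)
```

A real harness would collect the tool-call traces from the model under test and might relax the rule, for example by allowing the generic SQL tool for read-only verification.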
Tools used
MCP server · Braintrust · Claude
Outcome
By tweaking tool description prompts without any code changes, the eval pass rate rose from 60% to 100%.
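To make "prompt changes only" concrete, here is a minimal sketch in which the tool names stay fixed and only the description strings differ. The tool names and all description wording are hypothetical, written for this example rather than quoted from the source, and the tool spec is simplified to a name and a description.

```python
from typing import Dict, List

# Hypothetical "basic" descriptions: short, and silent about when to use which tool.
BASIC_DESCRIPTIONS: Dict[str, str] = {
    "run_sql": "Execute a SQL statement.",
    "prepare_database_migration": "Prepare a database migration.",
    "complete_database_migration": "Complete a database migration.",
}

# Hypothetical tuned descriptions: same tools, with explicit routing guidance.
TUNED_DESCRIPTIONS: Dict[str, str] = {
    "run_sql": (
        "Execute a single SQL statement. Do NOT use this for schema changes; "
        "use prepare_database_migration instead."
    ),
    "prepare_database_migration": (
        "Step 1 of 2 for any schema change. Stages the migration so it can be "
        "checked before it is applied."
    ),
    "complete_database_migration": (
        "Step 2 of 2. Call only after prepare_database_migration has succeeded."
    ),
}


def build_tool_specs(descriptions: Dict[str, str]) -> List[Dict[str, str]]:
    """Build simplified tool specs (name and description only) for the model."""
    return [{"name": name, "description": text} for name, text in descriptions.items()]
```

Because both dictionaries share the same keys, switching from one set to the other changes what the model reads without touching any tool implementation.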
What failed first
With the initial basic tool descriptions, the eval pass rate was only around 60%.
Results
Volume: 60%
Grounding & classification
Source type: technical build writeup
17 fields verified against source quotes.
agentic workflow · ai agent · code diff pr · failure mode described · metric backed · production runtime claimed · tools described · workflow described · software · accuracy improvement · technical build writeup · quality assurance · agentic task execution