quality_assurance · saas · workflow

Neon improves MCP server tool-call eval pass rate from 60% to 100% with prompt tuning

LLMs struggle to select the correct tool from a large list and may misuse a generic SQL-execution tool instead of following the required two-step stateful migration workflow, which makes correctness hard to guarantee without automated testing.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · Eval task submitted
A database migration task is submitted as the eval input to the LLM under test.
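A minimal sketch of how such an eval might score a single run, assuming the run's tool calls are recorded as an ordered list of tool names. The tool names (`prepare_migration`, `complete_migration`, `run_sql`) and the helpers `score_run` and `pass_rate` are hypothetical; they are not taken from Neon's MCP server or from Braintrust's API. Treating any call to the generic SQL tool as a failure is a simplifying assumption.

```python
from typing import Sequence

# Hypothetical tool names, for illustration only.
EXPECTED_WORKFLOW = ("prepare_migration", "complete_migration")
GENERIC_SQL_TOOL = "run_sql"


def score_run(tool_calls: Sequence[str]) -> bool:
    """Return True if the run used the two-step workflow in order
    and never called the generic SQL tool (simplifying assumption)."""
    if GENERIC_SQL_TOOL in tool_calls:
        return False
    # Keep only the workflow tools, preserving call order.
    workflow_calls = [t for t in tool_calls if t in EXPECTED_WORKFLOW]
    return tuple(workflow_calls) == EXPECTED_WORKFLOW


def pass_rate(runs: Sequence[Sequence[str]]) -> float:
    """Fraction of runs that pass, in the range 0.0 to 1.0."""
    if not runs:
        return 0.0
    return sum(score_run(r) for r in runs) / len(runs)
```

Running the same task several times and reporting `pass_rate` over the recorded runs gives a percentage comparable to the figures quoted in this case.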
Tools used
MCP server, Braintrust, Claude
Outcome

By tweaking tool description prompts without any code changes, the eval pass rate rose from 60% to 100%.
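One way to see why a description change needs no code change: in a typical tool-calling setup, the description is a string shown to the model alongside the tool's name, kept separate from the handler that runs when the tool is called. The sketch below assumes that layout; the `Tool` type, the `run_sql` name, and both description texts are hypothetical and are not Neon's actual tool definitions.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str               # text the model reads when choosing a tool
    handler: Callable[[str], str]  # code that runs when the tool is called


def run_sql_handler(sql: str) -> str:
    # Placeholder; a real handler would execute the statement.
    return f"executed: {sql}"


# Basic variant with a short, generic description.
basic = Tool(
    name="run_sql",
    description="Execute a SQL statement.",
    handler=run_sql_handler,
)

# Tuned variant: only the description changes; the handler is reused as-is.
tuned = replace(
    basic,
    description=(
        "Execute a single ad hoc SQL statement. "
        "Do not use this for schema migrations; use the migration tools instead."
    ),
)
```

Because `tuned.handler` is the same function object as `basic.handler`, any difference in eval results between the two variants comes from the wording the model sees.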

What failed first

With the initial basic tool descriptions, the eval pass rate was only around 60%.

Results
Volume: 60%
Source

https://neon.com/blog/test-evals-for-mcp


Grounding & classification
Source type: technical build writeup
17 fields verified against source quotes.
agentic workflow · ai agent · code diff pr · failure mode described · metric backed · production runtime claimed · tools described · workflow described · software · accuracy improvement · technical build writeup · quality assurance · agentic task execution