quality_assurance · saas · workflow

Eval-driven development: Vercel's approach to building AI products like v0

Traditional software testing assumes deterministic behavior, so it breaks down on the probabilistic, non-deterministic outputs of AI systems. Existing eval management approaches are ad hoc, hard to scale, and too coarse to guide targeted improvement.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · PR triggers eval suite

Every GitHub pull request that impacts the output pipeline includes eval results.
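
As a rough illustration of this stage, the sketch below runs a small eval suite with a code-based grader and produces a plain-text summary that a CI job could attach to a pull request. Every name here (`EvalCase`, `runEvals`, `summarize`, `fakeModel`) is hypothetical and chosen for the example; it does not reflect the Braintrust or AI SDK APIs, and the model call is replaced by a stand-in so the code runs offline.

```typescript
// Hypothetical types and names for illustration; not Vercel's or Braintrust's API.
type EvalCase = { name: string; input: string; expected: string };
type Grader = (output: string, expected: string) => boolean;
type EvalResult = { name: string; passed: boolean };

// Code-based grader: a deterministic check that the output contains the expected text.
const containsExpected: Grader = (output, expected) => output.includes(expected);

// Run each case through the system under test and grade its output.
function runEvals(
  cases: EvalCase[],
  generate: (input: string) => string,
  grade: Grader,
): EvalResult[] {
  return cases.map((c) => ({
    name: c.name,
    passed: grade(generate(c.input), c.expected),
  }));
}

// Build a plain-text summary suitable for posting on a pull request.
function summarize(results: EvalResult[]): string {
  const passed = results.filter((r) => r.passed).length;
  const lines = results.map((r) => `${r.passed ? "PASS" : "FAIL"} ${r.name}`);
  return [`Evals: ${passed}/${results.length} passed`, ...lines].join("\n");
}

// Stand-in for a real model call (assumption: the real pipeline would call an LLM here).
const fakeModel = (input: string): string =>
  input.includes("button") ? "<button>Click</button>" : "I can't help with that.";

const cases: EvalCase[] = [
  { name: "renders a button", input: "make a button", expected: "<button>" },
  { name: "renders a form", input: "make a form", expected: "<form>" },
];

const results = runEvals(cases, fakeModel, containsExpected);
console.log(summarize(results));
```

In a real setup the `generate` function would wrap the model pipeline under test, and the summary would be posted by whatever CI integration the team uses.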

Tools used

Braintrust, AI SDK, v0

Outcome

Vercel's v0 AI product uses eval-driven development to catch errors early, speed up iteration, and maintain a 100% pass rate on refusal and safety evaluations, with prompts iterated on almost daily.
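
One way to enforce a 100% pass rate on specific eval categories is a per-category gate, sketched below. The result shape, the category names, and the 0.5 threshold for the non-safety category are assumptions made for the example; only the 100% requirement on refusal and safety evaluations comes from the source.

```typescript
// Hypothetical result shape; category names are illustrative.
type CategorizedResult = { category: string; passed: boolean };

// Compute the pass rate (0 to 1) for each category.
function passRates(results: CategorizedResult[]): Record<string, number> {
  const passed: Record<string, number> = {};
  const total: Record<string, number> = {};
  for (const r of results) {
    total[r.category] = (total[r.category] ?? 0) + 1;
    passed[r.category] = (passed[r.category] ?? 0) + (r.passed ? 1 : 0);
  }
  const rates: Record<string, number> = {};
  for (const category of Object.keys(total)) {
    rates[category] = passed[category] / total[category];
  }
  return rates;
}

// Return the categories whose pass rate is below the required minimum.
// A required category with no results counts as a rate of 0 and therefore fails.
function failingGates(
  rates: Record<string, number>,
  required: Record<string, number>,
): string[] {
  return Object.keys(required).filter((c) => (rates[c] ?? 0) < required[c]);
}

const sample: CategorizedResult[] = [
  { category: "safety", passed: true },
  { category: "safety", passed: true },
  { category: "code-quality", passed: true },
  { category: "code-quality", passed: false },
];

// Safety must pass every case; the 0.5 threshold is an arbitrary example value.
const gates: Record<string, number> = { safety: 1.0, "code-quality": 0.5 };

const failures = failingGates(passRates(sample), gates);
console.log(failures.length === 0 ? "gates passed" : `gates failed: ${failures.join(", ")}`);
```

Treating a missing category as a failure is a deliberate choice in this sketch: if the safety evals silently stop running, the gate should not pass by default.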

Results

Volume: 100%

Cost replaced: 1.5x to 2x more than code-based grading

Source

https://vercel.com/blog/eval-driven-development-build-better-ai-faster?utm_source=chatgpt.com

Grounding & classification

Source type: technical build writeup

20 fields verified against source quotes.

code generation, rag, code diff pr, metric backed, named customer, production runtime claimed, tools described, vendor confirmed, workflow described, software, accuracy improvement, error reduction, technical build writeup, quality assurance, ai draft human approval