quality_assurance · saas · workflow
Eval-driven development: Vercel's approach to building AI products like v0
Traditional software testing methods do not work for AI's probabilistic, non-deterministic outputs, and existing eval management approaches are ad-hoc, unscalable, and lack the specificity needed to guide targeted improvement.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases — not tied to any one vendor's stack. Click any stage to read what happens there. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · PR triggers eval suite
Every GitHub pull request that impacts the output pipeline includes eval results.
Tools used
BraintrustAI SDKv0
Outcome
Vercel's v0 AI product uses eval-driven development to catch errors early, speed up iteration, and maintain a 100% pass rate on refusal and safety evaluations, with prompts iterated on almost daily.
Results
Volume100%
Cost replaced1.5x to 2x more than code-based grading
Grounding & classification
Source type: technical build writeup
20 fields verified against source quotes.
code generationragcode diff prmetric backednamed customerproduction runtime claimedtools describedvendor confirmedworkflow describedsoftwareaccuracy improvementerror reductiontechnical build writeupquality assuranceai draft human approval