quality_assurance · education · workflow
Coursera builds a structured AI evaluation framework with Braintrust to ship reliable features
Coursera lacked a formal evaluation framework for AI features. It relied instead on fragmented offline jobs in spreadsheets, siloed per-team scripts, and manual data reviews, which made it hard to validate AI features quickly and push them to production with confidence.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Define success criteria
Before development begins, teams establish exactly what 'good enough' looks like and identify specific output characteristics that matter most to users and business goals.
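As an illustration of this stage, success criteria can be written down as explicit, machine-checkable thresholds before any feature work starts. The sketch below is a minimal example under stated assumptions: the metric names (`answer_relevance`, `feedback_clarity`), the threshold values, and the sample scores are all hypothetical, and the code does not use or represent Braintrust's API or Coursera's actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One success criterion: a metric name and the minimum acceptable mean score."""
    name: str          # hypothetical metric name, not taken from the source
    threshold: float   # minimum acceptable mean score, assumed to lie in [0, 1]


def evaluate(criteria, scores):
    """Return {criterion name: True/False} given per-example scores for each metric.

    A metric with no recorded scores is treated as failing.
    """
    results = {}
    for criterion in criteria:
        values = scores.get(criterion.name, [])
        mean = sum(values) / len(values) if values else 0.0
        results[criterion.name] = mean >= criterion.threshold
    return results


# Hypothetical criteria agreed on before development begins.
CRITERIA = [
    Criterion("answer_relevance", 0.8),
    Criterion("feedback_clarity", 0.9),
]

# Hypothetical per-example scores from an offline evaluation run.
scores = {
    "answer_relevance": [0.9, 0.8, 1.0],
    "feedback_clarity": [0.7, 0.9],
}

report = evaluate(CRITERIA, scores)
for name, passed in report.items():
    print(f"{name}: {'pass' if passed else 'fail'}")
```

Keeping the criteria in one declarative list gives teams a shared definition of 'good enough' that can be reviewed and versioned independently of the scoring code.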
Tools used
Braintrust · LLMs
Outcome
Coursera's structured evaluation framework transformed their AI development process, enabling objective validation, faster iteration, and a common quality language across teams. Coursera Coach achieved a 90% learner satisfaction rating, and automated grading delivers grades within 1 minute of submission with approximately 45× more feedback, driving a 16.7% increase in course completions.
Results
Time saved: within 1 minute
Volume: 90%
Grounding & classification
Source type: vendor customer story
28 fields verified against source quotes.
chatbot · content generation · quality inspection · chat transcript · form submission · failure mode described · human review described · metric backed · named customer · production runtime claimed · tools described · workflow described · education · conversion increase · customer satisfaction · cycle time reduction · employee productivity · throughput increase · vendor customer story · back office ops · quality assurance · ai draft human approval · monitor detect alert