customer_support · workflow

Assembled empirically compares LLM evaluation metrics for Assembled Assist, its AI customer support agent tool

As Assembled Assist grew, the team needed scalable automated methods to evaluate response quality, prevent regressions when updating the AI system, and measure improvements across hundreds of evaluation cases without relying solely on manual scoring.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Conversation context arrives
Given some context of a conversation, the assistant is invoked.
Tools used
Assembled Assist
Outcome

LLM-based and embedding evaluations outperformed n-gram-based metrics, with Assembled's custom LLM Eval dominating quantitative approaches. Automating evaluations is projected to save thousands of hours.
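
To illustrate why n-gram-based metrics can lag behind LLM-based and embedding evaluations, here is a minimal sketch of a unigram-overlap F1 score (a simplified, ROUGE-like measure). It is not Assembled's evaluation code: the function names and the sample support replies are hypothetical, chosen only to show that a correct paraphrase can score poorly when it shares few exact words with the reference answer.

```python
from collections import Counter


def ngrams(text, n):
    """Count the n-grams of a lowercased, whitespace-tokenized string."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f1(candidate, reference, n=1):
    """F1 of clipped n-gram overlap between a candidate and a reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example data, not taken from the source.
reference = "You can reset your password from the account settings page."
verbatim_copy = "You can reset your password from the account settings page."
paraphrase = "Head to settings in your profile and choose a new login credential."

copy_score = ngram_f1(verbatim_copy, reference)
paraphrase_score = ngram_f1(paraphrase, reference)
print(f"verbatim copy: {copy_score:.2f}")
print(f"paraphrase:    {paraphrase_score:.2f}")
```

The paraphrase conveys the same instruction as the reference but shares only two tokens with it, so its overlap score is far below the verbatim copy's. An embedding similarity or an LLM judge scores meaning rather than surface wording, which is one plausible reason such metrics did better in the comparison.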

Results
Time saved: thousands of hours
Volume: 40
Source

https://www.assembled.com/blog/ai-at-assembled-an-empirical-comparison-of-llm-evaluation-metrics-in-a-customer-support-setting


Grounding & classification
Source type: technical build writeup
17 fields verified against source quotes, 4 dropped as unverifiable.
Tags: agent assist, content generation, rag, chat transcript, knowledge base, human review described, metric backed, named customer, production runtime claimed, workflow described, software, accuracy improvement, time saved, technical build writeup, customer support, ai draft human approval