customer_support · workflow

Assembled empirically compares LLM evaluation metrics for Assembled Assist, its AI customer support agent tool

As Assembled Assist grew, the team needed scalable automated methods to evaluate response quality, prevent regressions when updating the AI system, and measure improvements across hundreds of evaluation cases without relying solely on manual scoring.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.

Stage 1 · Conversation context arrives

The assistant is invoked with the context of an ongoing conversation.

Tools used

Assembled Assist

Outcome

LLM-based and embedding evaluations outperformed n-gram-based metrics, with Assembled's custom LLM Eval dominating quantitative approaches. Automating evaluations is projected to save thousands of hours.
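To make the contrast between n-gram and embedding-style metrics concrete, here is a minimal sketch in Python. It is not Assembled's implementation: the function names `ngram_f1` and `cosine_similarity` are hypothetical, and the two embedding vectors are made-up placeholders standing in for the output of a real embedding model.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f1(candidate, reference, n=2):
    """F1 score over overlapping n-grams: a simple surface-overlap metric."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped count of shared n-grams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


reference = "You can reset your password from the account settings page"
candidate = "Go to account settings to change your password"

# Surface overlap is low because the candidate is a paraphrase.
print("bigram F1:", ngram_f1(candidate, reference))

# Placeholder vectors (assumption): a real pipeline would obtain these
# from an embedding model rather than hard-coding them.
reference_vec = [0.8, 0.1, 0.6]
candidate_vec = [0.7, 0.2, 0.6]
print("embedding cosine:", cosine_similarity(candidate_vec, reference_vec))
```

A paraphrased answer shares few exact word sequences with the reference, so an n-gram score penalizes it even when the meaning matches. A similarity computed over embeddings is less sensitive to wording, which is one plausible reason such metrics compared favorably here.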

Results

Time saved: thousands of hours

Volume: 40

Source

https://www.assembled.com/blog/ai-at-assembled-an-empirical-comparison-of-llm-evaluation-metrics-in-a-customer-support-setting


Grounding & classification

Source type: technical build writeup

17 fields verified against source quotes, 4 dropped as unverifiable.

Tags: agent assist, content generation, rag, chat transcript, knowledge base, human review described, metric backed, named customer, production runtime claimed, workflow described, software, accuracy improvement, time saved, technical build writeup, customer support, ai draft human approval