customer_support · ecommerce · workflow
Instacart builds LACE, an LLM-based automated evaluation framework for its customer support chatbot
Instacart needed a reliable, scalable way to evaluate whether its AI-powered customer support chatbot was actually helping customers in real conversations, because human evaluation alone could not scale and the chatbot's quality was difficult to measure objectively.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Chat session submitted to LACE
Each full multi-turn conversation between a customer and the support chatbot is submitted to LACE for evaluation.
Tools used
o1-preview
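A minimal sketch of what Stage 1 might look like in code, assuming the session is flattened into one transcript and judged once per criterion. The criterion names, the PASS/FAIL answer format, and every function name here are hypothetical and not taken from Instacart's writeup. The judge is a local stub so the sketch runs offline; a real setup would call an LLM at that point.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Turn:
    role: str  # "customer" or "chatbot"
    text: str


# Hypothetical criterion names, chosen for illustration only.
CRITERIA = ["query_understanding", "answer_correctness"]


def render_transcript(turns: List[Turn]) -> str:
    """Flatten the whole session so the judge sees every turn in order."""
    return "\n".join(f"{t.role}: {t.text}" for t in turns)


def build_prompt(turns: List[Turn], criterion: str) -> str:
    """Build one evaluation prompt per criterion for the full conversation."""
    return (
        f"Criterion: {criterion}\n"
        "Answer PASS or FAIL for the conversation below.\n\n"
        f"{render_transcript(turns)}"
    )


def evaluate_session(
    turns: List[Turn], judge: Callable[[str], str]
) -> Dict[str, bool]:
    """Submit the session once per criterion and collect the verdicts."""
    results: Dict[str, bool] = {}
    for criterion in CRITERIA:
        verdict = judge(build_prompt(turns, criterion))
        results[criterion] = verdict.strip().upper() == "PASS"
    return results


def stub_judge(prompt: str) -> str:
    """Stand-in for an LLM call (assumption); always answers PASS."""
    return "PASS"


session = [
    Turn("customer", "My order arrived without the milk."),
    Turn("chatbot", "Sorry about that. I have issued a refund for the milk."),
]
scores = evaluate_session(session, stub_judge)
```

Passing the judge in as a callable keeps the submission logic independent of any particular model, so the stub can be swapped for a real client without touching the rest.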
Outcome
LACE provides automated evaluation that aligns closely with human judgment, achieving over 90% accuracy on context-dependent criteria. Its results feed dashboard-driven feedback loops that enable continuous chatbot improvement and have reduced inefficient interactions.
Results
Volume: over 90%
Grounding & classification
Source type: technical build writeup
22 fields verified against source quotes.
agentic workflow · chatbot · conversational ai · multi agent workflow · chat transcript · human review described · metric backed · named customer · production runtime claimed · tools described · workflow described · ecommerce · accuracy improvement · customer satisfaction · technical build writeup · customer support · quality assurance · ai draft human approval · monitor detect alert