quality_assurance · ecommerce · workflow
Building the LLM Platform at Whatnot: Velocity, Trust, and Reliability
Whatnot needed an LLM platform that could support real product and operational workflows. In these workflows inputs are harder to constrain, outputs are non-deterministic, and users can more easily push the system in unintended directions, so teams have to iterate quickly and still be able to trust the results.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.
Stage 1 · Self-serve prompt experimentation
Anyone with a better idea for system behavior can test it directly in the platform without writing code or waiting on a deployment.
Tools used
Python
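One way to picture "testing an idea without writing code or waiting on a deployment" is to treat prompts as data instead of source code. The sketch below is only an illustration under that assumption: the in-memory `PROMPT_VARIANTS` store, the `register_variant` and `render` helpers, and the example prompt texts are all hypothetical and are not taken from Whatnot's platform.

```python
from string import Template

# Hypothetical in-memory store of prompt variants. A real platform would
# presumably persist these somewhere editable through a UI.
PROMPT_VARIANTS = {}


def register_variant(name, template_text):
    """Store a prompt variant as data, so adding one needs no code deploy."""
    PROMPT_VARIANTS[name] = Template(template_text)


def render(name, **fields):
    """Fill a stored variant's placeholders with the fields for one request."""
    return PROMPT_VARIANTS[name].substitute(**fields)


# Two made-up variants of the same task, registered side by side.
register_variant("baseline", "Summarize this buyer message: $message")
register_variant(
    "candidate",
    "Summarize this buyer message in one sentence, noting any order issue: $message",
)

# Render both variants against the same input so they can be compared.
rendered = {
    name: render(name, message="My package arrived damaged.")
    for name in PROMPT_VARIANTS
}
```

Because each variant is just a stored string, someone proposing a new behavior only has to add an entry and compare the rendered prompts (and, in a real system, the model outputs) against the baseline.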
Outcome
Whatnot built an LLM platform that made prompt iteration more than 10x faster, let trust reviewers process harassment reports in minutes instead of hours, and helped support agents resolve buyer issues on the first try.
What failed first
Standard A/B frameworks diluted the signal from prompt experiments because they counted every exposure, whether or not the variants' outputs actually differed. Brittle rules, similarity metrics, and manual spot-checking either failed to scale or failed to capture what 'good' actually meant.
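The dilution effect can be illustrated with a small calculation. The sketch below uses entirely synthetic numbers and a hypothetical `outputs_differ` flag (it assumes the system can tell, per exposure, whether the two prompts would have produced different outputs). It is not Whatnot's analysis method, only a demonstration of why counting every exposure shrinks the measured lift.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    arm: str              # "control" or "treatment"
    outputs_differ: bool  # hypothetical flag: did the prompt change alter the output?
    success: bool         # whether the outcome was judged good


def success_rate(exposures, arm):
    """Fraction of successful exposures within one arm."""
    rows = [e for e in exposures if e.arm == arm]
    return sum(e.success for e in rows) / len(rows) if rows else 0.0


def lift(exposures):
    """Treatment success rate minus control success rate."""
    return success_rate(exposures, "treatment") - success_rate(exposures, "control")


def lift_where_outputs_differ(exposures):
    """Same lift, restricted to exposures where the variants actually diverged."""
    return lift([e for e in exposures if e.outputs_differ])


def make_arm(arm, n_same, same_ok, n_diff, diff_ok):
    """Build synthetic exposures: n_same with identical outputs, n_diff with differing ones."""
    same = [Exposure(arm, False, i < same_ok) for i in range(n_same)]
    diff = [Exposure(arm, True, i < diff_ok) for i in range(n_diff)]
    return same + diff


# Synthetic data: 80% of exposures produce identical outputs in both arms,
# so they succeed at the same rate and contribute no signal.
data = make_arm("control", 80, 40, 20, 10) + make_arm("treatment", 80, 40, 20, 16)

diluted = lift(data)                       # all exposures counted
focused = lift_where_outputs_differ(data)  # only exposures where outputs changed
```

With these made-up numbers the lift over all exposures is about 0.06, while the lift among exposures whose outputs actually differed is about 0.30. The underlying effect is the same, but it is spread across many exposures that could never have shown a difference.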
Results
Time saved: minutes instead of hours
Volume: 10x+ faster
Grounding & classification
Source type: technical build writeup
22 fields verified against source quotes.
agent assist · agentic workflow · support ticket · builder submitted · failure mode described · human review described · metric backed · named customer · production runtime claimed · tools described · workflow described · ecommerce · cycle time reduction · employee productivity · resolution time reduction · technical build writeup · customer support · quality assurance · agentic task execution · human review queue