
Building the LLM Platform at Whatnot: Velocity, Trust, and Reliability

Whatnot needed an LLM platform that could support real product and operational workflows. In these workflows inputs are hard to constrain, outputs are non-deterministic, and users can easily push the system in unintended directions, so teams need to iterate quickly and still trust the results.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Self-serve prompt experimentation

Anyone with a better idea for system behavior can test it directly in the platform without writing code or waiting on a deployment.
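
One common way to make this kind of self-serve iteration possible is to treat prompts as versioned data rather than as code, so a new variant can be registered and tried without a deployment. The sketch below is a minimal illustration under that assumption; `PromptRegistry`, its methods, and the `report_summary` prompt are hypothetical names, not Whatnot's actual implementation.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class PromptRegistry:
    """Hypothetical in-memory store; a real platform would persist versions."""

    _versions: dict = field(default_factory=dict)

    def register(self, name: str, template: str) -> int:
        """Store a new version of a prompt and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(template)
        return len(versions)

    def render(self, name: str, version: int, **variables) -> str:
        """Fill the stored template for the given version with request-time variables."""
        template = self._versions[name][version - 1]
        return Template(template).substitute(**variables)


registry = PromptRegistry()

# The current production prompt (invented for illustration).
v1 = registry.register("report_summary", "Summarize this report: $report")

# A candidate variant, added as data: no code change, no redeploy.
v2 = registry.register(
    "report_summary",
    "Summarize this report in two sentences, citing the key evidence: $report",
)

# Render both versions against the same input to compare them side by side.
sample = "Buyer says item arrived damaged."
baseline = registry.render("report_summary", v1, report=sample)
candidate = registry.render("report_summary", v2, report=sample)
```

Because every version stays addressable by number, an experiment can compare `v1` and `v2` on the same inputs and roll back by pointing at the earlier version.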

Tools used

Python

Outcome

Whatnot built an LLM platform that makes prompt iteration more than 10x faster, lets trust reviewers process harassment reports in minutes instead of hours, and helps support agents resolve buyer issues on the first try.

What failed first

Standard A/B frameworks diluted the signal from prompt experiments because they counted every exposure, whether or not the control and treatment outputs actually differed. Brittle rules, similarity metrics, and manual spot-checking either failed to scale or failed to capture what 'good' actually meant.
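
The dilution problem can be seen in a small worked example. The sketch below assumes a hypothetical exposure log in which each record carries a flag saying whether the two prompts would have produced different outputs for that input; the field names and all numbers are invented for illustration and do not come from Whatnot's data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Exposure:
    arm: str              # "control" or "treatment"
    outputs_differ: bool  # hypothetical flag: do the two prompts disagree on this input?
    outcome: float        # hypothetical success metric, 1.0 = good result


def lift(exposures):
    """Difference in mean outcome, treatment minus control."""
    control = [e.outcome for e in exposures if e.arm == "control"]
    treatment = [e.outcome for e in exposures if e.arm == "treatment"]
    return mean(treatment) - mean(control)


def make_arm(arm, outcome_when_different):
    # 8 inputs where both prompts give the same output: outcomes are identical across arms.
    same = [Exposure(arm, False, float(i % 2 == 0)) for i in range(8)]
    # 2 inputs where the prompts disagree: only here can the arms diverge.
    different = [Exposure(arm, True, outcome_when_different) for _ in range(2)]
    return same + different


log = make_arm("control", 0.0) + make_arm("treatment", 1.0)

# Counting every exposure mixes in the 80% of traffic where nothing changed.
diluted = lift(log)

# Restricting to exposures where the outputs differ isolates the real effect.
focused = lift([e for e in log if e.outputs_differ])

print(f"all exposures: {diluted:.2f}")   # all exposures: 0.20
print(f"differing only: {focused:.2f}")  # differing only: 1.00
```

In this invented data the prompt change only matters on 2 of every 10 inputs, so the all-exposure comparison reports a fifth of the effect that shows up on the inputs where the change actually took hold.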

Results

Time saved: minutes instead of hours

Volume: 10x+ faster

Source

https://medium.com/whatnot-engineering/the-model-is-the-easy-part-building-the-llm-platform-at-whatnot-ec8730fa9bdf


Grounding & classification

Source type: technical build writeup

22 fields verified against source quotes.

agent assist · agentic workflow · support ticket · builder submitted · failure mode described · human review described · metric backed · named customer · production runtime claimed · tools described · workflow described · ecommerce · cycle time reduction · employee productivity · resolution time reduction · technical build writeup · customer support · quality assurance · agentic task execution · human review queue