
Zalando builds LLM-as-a-judge search quality assurance framework for multi-market launches

Zalando's pre-launch search quality assurance relied entirely on human experts who manually sampled and translated queries, annotated errors, and diagnosed root causes. The process did not scale and was reactive by nature: issues were caught only after launch, once real-user signals such as click-through rate (CTR) existed. For entirely new markets, those signals did not exist at all.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Historical query clustering by NER

Past search queries from existing markets are processed by a Named Entity Recognition (NER) engine to extract attributes and cluster queries by semantic intent.
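The clustering step can be illustrated with a minimal sketch. It is not Zalando's implementation: the source does not describe the NER engine's internals, so a hypothetical dictionary lookup (`GAZETTEER`) stands in for the NER model, and "semantic intent" is approximated by the set of extracted attributes. The names `extract_entities` and `cluster_by_intent` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical attribute dictionary standing in for a trained NER model.
GAZETTEER = {
    "brand": {"nike", "adidas"},
    "category": {"dress", "sneakers", "jacket"},
    "color": {"black", "red", "white"},
}


def extract_entities(query: str) -> dict[str, str]:
    """Toy NER: tag each token that matches a known attribute value."""
    entities = {}
    for token in query.lower().split():
        for attribute, values in GAZETTEER.items():
            if token in values:
                entities[attribute] = token
    return entities


def cluster_by_intent(queries: list[str]) -> dict[tuple, list[str]]:
    """Group queries whose extracted attributes are identical."""
    clusters = defaultdict(list)
    for query in queries:
        entities = extract_entities(query)
        # Sort the attributes so that word order does not change the cluster key.
        key = tuple(sorted(entities.items()))
        clusters[key].append(query)
    return dict(clusters)


queries = ["black dress", "Dress black", "nike sneakers", "red jacket", "sneakers Nike"]
clusters = cluster_by_intent(queries)
for key, members in clusters.items():
    print(key, members)
```

Keying each cluster on the sorted attribute pairs means "black dress" and "Dress black" land in the same group, which is the property the stage relies on: one representative per intent can then be sampled for evaluation instead of every raw query.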

Tools used

GPT-4o, Apache Airflow, Kubernetes, Nakadi, Elasticache, NER

Outcome

The LLM-as-a-judge evaluation framework identified multiple NER and search quality issues in the Portuguese and Greek markets before go-live, so engineers could fix them pre-launch. A full run covers 1,500 search segments with 25 results each, completes in 3-5 hours, and costs around 250 USD, compared with days of human evaluation.
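The run figures above imply some rough unit economics, worked through in the short calculation below. It assumes every one of the 25 results per segment is judged exactly once and that the 250 USD covers the whole run. The source states neither assumption explicitly, so the derived numbers are estimates only.

```python
# Figures quoted in the source for one full run.
segments = 1_500
results_per_segment = 25
run_cost_usd = 250.0              # "around 250 USD"
hours_fast, hours_slow = 3, 5     # "3-5 hours"

# Assumption: each result is judged exactly once.
judged_results = segments * results_per_segment

cost_per_result = run_cost_usd / judged_results
cost_per_segment = run_cost_usd / segments

# Throughput range implied by the 3-5 hour runtime.
results_per_hour_slow = judged_results / hours_slow
results_per_hour_fast = judged_results / hours_fast

print(f"judged results:   {judged_results}")
print(f"cost per result:  {cost_per_result:.4f} USD")
print(f"cost per segment: {cost_per_segment:.2f} USD")
print(f"throughput:       {results_per_hour_slow:.0f} to {results_per_hour_fast:.0f} results/hour")
```

Under these assumptions a run judges 37,500 results at well under one US cent each.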

Results

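Time saved: a full run takes 3-5 hours, versus days of human evaluation

Volume: 1,500 search segments per run, 25 results each

Cost replaced: around 250 USD per full run

Running since: 2025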

Source

https://engineering.zalando.com/posts/2026/03/search-quality-assurance-with-llm-judge.html


Grounding & classification

Source type: technical build writeup

32 fields verified against source quotes.

data extraction, quality inspection, translation, product catalog, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, ecommerce, accuracy improvement, cost reduction, error reduction, time saved, technical build writeup, ecommerce ops, quality assurance, extract classify route, monitor detect alert