Zalando builds AI-powered multi-stage LLM pipeline to transform two years of postmortems into actionable infrastructure insights
Zalando accumulated thousands of postmortem documents but could not extract strategic patterns at scale. Each postmortem takes 15–20 minutes to read, making company-wide retrospective analysis of years of incidents cognitively and practically impossible.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools used” below.
Stage 1 · Postmortem corpus ingested
Thousands of postmortem documents are fed into the pipeline as input.
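As a rough illustration of what an ingestion stage like this could look like, here is a minimal sketch. It assumes the postmortems are exported as UTF-8 text files under a single directory. That layout, the `Postmortem` record, and the `ingest_corpus` helper are hypothetical names chosen for the example, not details reported for Zalando's pipeline.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class Postmortem:
    doc_id: str  # hypothetical identifier, derived here from the file name
    text: str    # raw document body, whitespace-trimmed


def ingest_corpus(root: Path, pattern: str = "*.md") -> Iterator[Postmortem]:
    """Yield one Postmortem per non-empty file under root, in a stable order."""
    # Sorting makes runs reproducible, which helps when comparing model outputs.
    for path in sorted(root.rglob(pattern)):
        text = path.read_text(encoding="utf-8").strip()
        if text:  # skip empty files so later stages never receive blank input
            yield Postmortem(doc_id=path.stem, text=text)
```

Yielding documents one at a time keeps memory use flat for a corpus of thousands of files and lets later stages process each postmortem independently.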
Tools used
Claude Sonnet 4, AWS Bedrock, NotebookLM, LM Studio
Outcome
The multi-stage LLM pipeline cut postmortem analysis time from days to hours and tripled productivity. It surfaced hidden patterns, including a finding that automated change validation could prevent 25% of subsequent datastore incidents. Surface attribution error remains at approximately 10% even with the latest model, while hallucinations became negligible.
What failed first
An initial attempt using Google's NotebookLM produced severe hallucinations and lost incident context when generating summaries, reducing effective productivity rather than improving it. Small open-source models showed up to 40% hallucination probability, and a no-code agentic approach was ruled out due to performance limitations and inaccuracies.
Results
Time saved: analysis time reduced from days to hours