incident_management · workflow

Mercari builds IBIS, an LLM-powered SRE incident handling buddy using RAG over historical incident reports

Mercari's on-call SRE team was burdened by frequent alerts escalating into incidents, increasing MTTR and reducing time available for feature development.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · Scheduled incident data export
Google Cloud Scheduler regularly exports incident reports from Blameless's external API into a Google Cloud Storage bucket.
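The export step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Mercari's implementation. The function name `export_incidents`, the `id` field, the `incidents/` object prefix, and the sample record are all hypothetical. The Blameless API call and the Cloud Storage upload are passed in as plain callables, so the sketch runs without credentials. In a real job, `upload` might wrap the google-cloud-storage client's `Blob.upload_from_string`, and the job itself would be triggered on a schedule.

```python
import json
from typing import Any, Callable, Iterable, List, Mapping


def export_incidents(
    fetch_incidents: Callable[[], Iterable[Mapping[str, Any]]],
    upload: Callable[[str, str], None],
    prefix: str = "incidents",
) -> List[str]:
    """Write each fetched incident report as one JSON object.

    fetch_incidents: hypothetical stand-in for a call to the incident
        tool's external API; it yields one mapping per incident report.
    upload: hypothetical stand-in for a bucket write, called as
        upload(object_name, json_text).
    Returns the object names that were written, in order.
    """
    written: List[str] = []
    for incident in fetch_incidents():
        # Assumption: every report carries a unique "id" field.
        name = f"{prefix}/{incident['id']}.json"
        upload(name, json.dumps(incident, ensure_ascii=False, sort_keys=True))
        written.append(name)
    return written


if __name__ == "__main__":
    # Made-up sample data and an in-memory dict standing in for the bucket.
    fake_bucket = {}
    sample = [{"id": 101, "title": "Example incident (illustrative only)"}]
    names = export_incidents(lambda: sample, fake_bucket.__setitem__)
    print(names)
```

Writing one object per report keeps a rerun idempotent: a second export with the same `id` overwrites the same object name rather than creating a duplicate.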
Tools used
Blameless (partner), Google Cloud Scheduler, Google Cloud Storage, Google Cloud Run Jobs, Google Cloud Run Functions, Google Cloud Workflow, Eventarc, LangChain, SpaCy, GPT-4o, OpenAI, BigQuery, Slack (partner)
Outcome

IBIS was deployed into several key incident-handling Slack channels at Mercari by the end of December 2024. User adoption continues to grow, and the impact on MTTR is being monitored.

Results
Time saved: reducing the Mean Time to Recovery (MTTR)
Cost replaced: reducing on-call handling costs
Running since: end of December 2024
Source

https://engineering.mercari.com/en/blog/entry/20250206-llm-sre-incident-handling-buddy/

Grounding & classification
Source type: technical build writeup
33 fields verified against source quotes.
Tags: conversational ai, knowledge search, rag, summarization, translation, chat transcript, knowledge base, named customer, production runtime claimed, tools described, workflow described, ecommerce, cycle time reduction, employee productivity, technical build writeup, incident management, it support, rag answering