Assembled builds automated LLM fallback system achieving 99.97% effective AI uptime

LLM provider outages caused multiple customer-impacting incidents at Assembled, and manual model switchovers were slow, stressful, and unable to handle partial degradations — leaving the team reactive rather than resilient.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Provider failure detected

The system detects that an LLM provider has become unavailable and triggers automated failover.
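
The detect-and-fail-over step can be illustrated with a minimal sketch. It is not Assembled's implementation: the `ProviderUnavailable` exception, the `complete_with_fallback` helper, and the stub `primary` and `secondary` clients are all hypothetical names introduced here, and a real system would wrap actual provider SDK calls and likely add timeouts and health tracking.

```python
from typing import Callable, Sequence, Tuple


class ProviderUnavailable(Exception):
    """Hypothetical error a provider client raises when it cannot serve a request."""


# A provider is a (name, call) pair; `call` takes a prompt and returns text.
Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            # Record the failure and move on to the next provider in the chain.
            errors.append(f"{name}: {exc}")
    raise ProviderUnavailable("all providers failed: " + "; ".join(errors))


# Stub clients standing in for real SDK calls (assumptions for illustration).
def primary(prompt: str) -> str:
    raise ProviderUnavailable("simulated outage")


def secondary(prompt: str) -> str:
    return f"echo: {prompt}"
```

Calling `complete_with_fallback("hi", [("primary", primary), ("secondary", secondary)])` skips the failing stub and returns the secondary's response, with no human in the loop.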

Tools used

OpenAI, Anthropic, GPT-4.1-Mini, Claude 3.5 Haiku, Gemini 2.5 Flash

Outcome

Automated fallbacks achieved 99.97% effective uptime on AI model responses, reduced average failover time from 5+ minutes to hundreds of milliseconds, and eliminated manual interventions during provider outages — with request failure rates below 0.001% during a recent multi-hour outage.
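
To put the 99.97% figure in perspective, the short calculation below converts an uptime percentage into a downtime budget. It assumes a time-based reading over a 30-day window; the source describes "effective uptime on AI model responses", which may instead be measured per request, so treat this as a rough illustration. The function name `allowed_downtime_minutes` is hypothetical.

```python
def allowed_downtime_minutes(uptime_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted in `days` days at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return (1 - uptime_pct / 100) * total_minutes


# At 99.97% over 30 days this comes to roughly 13 minutes of downtime.
budget = allowed_downtime_minutes(99.97)
```

Under that assumption, a manual switchover taking 5+ minutes would use a large share of a month's budget in a single incident, which is consistent with the case for automating failover.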

What failed first

A first attempt using on-call engineers with an accessible configuration switch failed because blanket switches broke nuanced per-model routing, took too long, and could not reliably classify partial versus full outages.
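
The contrast between a blanket switch and per-model routing can be sketched as follows. The fallback order shown is an assumption for illustration, not Assembled's actual configuration, and the lowercase model labels, `FALLBACK_CHAINS`, and `route` are hypothetical names rather than real provider API identifiers.

```python
from typing import AbstractSet, Dict, List

# Hypothetical per-model fallback chains: each model has its own ordered
# alternatives, so one degraded model does not force every workload to move.
FALLBACK_CHAINS: Dict[str, List[str]] = {
    "gpt-4.1-mini": ["claude-3.5-haiku", "gemini-2.5-flash"],
    "claude-3.5-haiku": ["gpt-4.1-mini", "gemini-2.5-flash"],
}


def route(model: str, unhealthy: AbstractSet[str]) -> str:
    """Return the preferred model if healthy, else its first healthy fallback."""
    for candidate in [model] + FALLBACK_CHAINS.get(model, []):
        if candidate not in unhealthy:
            return candidate
    raise LookupError(f"no healthy model available for {model}")
```

With only `gpt-4.1-mini` marked unhealthy, requests configured for it move to its first fallback while requests configured for `claude-3.5-haiku` stay where they are, which a single global switch cannot express.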

Results

Time saved: 99.97% (effective uptime on AI model responses)

Volume: zero (manual interventions during provider outages)

Source

https://www.assembled.com/blog/your-llm-provider-will-go-down-but-you-dont-have-to

Grounding & classification

Source type: technical build writeup

27 fields verified against source quotes.

failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, automation rate, cycle time reduction, employee productivity, error reduction, technical build writeup, back office ops, escalation workflow