
incident.io reduces chatbot latency by 50% with speculative tool calling

incident.io's incident-management chatbot suffered multi-second delays because LLM tool calls require sequential round-trips: the model decides to call a tool, the tool runs, the result returns, and only then does the model continue. For a simple "pause incident" action, these steps added up to nearly 5 seconds of wait time.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under "Tools used" below.

Stage 1 · User sends incident request

A user types a natural language command to the chatbot, such as asking it to pause an incident.

Tools used

LLM, Go

Outcome

Speculative tool calling saves 2–3 seconds per interaction and reduces latency by about 50% for most users, making the chatbot feel responsive rather than sluggish.

What failed first

Prompt optimization alone was insufficient to meet latency targets; all available LLM-level and database-level performance improvements had already been applied before the speculative approach was developed.

Results

Time saved: 2–3 s per interaction

Latency reduction: about 50%

Source

https://incident.io/building-with-ai/speculative-tool-calling


Grounding & classification

Source type: technical build writeup

21 fields verified against source quotes.

Tags: agentic workflow, chatbot, conversational ai, chat transcript, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, cycle time reduction, response time reduction, technical build writeup, it support, agentic task execution