
incident.io reduces Investigations agent LLM latency 4x through prompt format optimization

incident.io's Investigations agent LLM prompt calls were slow, taking up to 11 seconds to respond, driven by verbose JSON output with reasoning fields and uncompressed Grafana dashboard definitions that inflated input tokens to about 15k.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Incident alert received

When an incident is declared, the system receives the incoming alert.

Tools used

Grafana, Go

Outcome

Through three sequential optimizations — removing reasoning fields, compressing input format, and compressing output format — the Investigations agent prompt went from 11 seconds to reliably under 2.3 seconds, a 4x improvement overall.
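
As a rough illustration of the first and third optimizations (dropping reasoning fields and compressing the output format), the sketch below contrasts a verbose JSON response with a compact line-based response carrying the same selections. The field names, the `dashboard_id:panel_id` line format, and the helpers `parse_compact` and `strip_reasoning` are all hypothetical; the source summary does not state the exact formats incident.io used. Character counts stand in for token counts here, which is only an approximation.

```python
import json

# Hypothetical verbose output: JSON with a free-text reasoning field per selection.
verbose_output = json.dumps(
    {
        "selections": [
            {
                "dashboard_id": "db-1",
                "panel_id": 4,
                "reasoning": "Error rate panel matches the alerting service.",
            },
            {
                "dashboard_id": "db-2",
                "panel_id": 7,
                "reasoning": "Latency panel covers the affected endpoint.",
            },
        ]
    },
    indent=2,
)

# Hypothetical compact output: one "dashboard_id:panel_id" pair per line, no reasoning.
compact_output = "db-1:4\ndb-2:7"


def parse_compact(text: str) -> list:
    """Parse the compact line format back into structured selections."""
    selections = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the model output
        dashboard_id, panel_id = line.rsplit(":", 1)
        selections.append({"dashboard_id": dashboard_id, "panel_id": int(panel_id)})
    return selections


def strip_reasoning(payload: str) -> list:
    """Load the verbose JSON and drop the reasoning field from each selection."""
    data = json.loads(payload)
    return [
        {key: value for key, value in sel.items() if key != "reasoning"}
        for sel in data["selections"]
    ]


if __name__ == "__main__":
    # Both formats carry the same decisions; only the size differs.
    assert parse_compact(compact_output) == strip_reasoning(verbose_output)
    print(len(verbose_output), "chars verbose vs", len(compact_output), "chars compact")
```

Because the model generates output one token at a time, a shorter response format of this kind would be expected to reduce latency more directly than an equivalent cut in input size.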

What failed first

The initial prompt included reasoning fields that inflated output tokens to 315 and represented Grafana dashboards as verbose JSON, inflating input tokens to about 15k — together driving latency to 11 seconds per call.
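
The input-side problem can be sketched the same way. The example below flattens a dashboard definition into a few lines of text that keep only the fields a model would plausibly need to choose a panel. The dashboard dictionary is a hypothetical, heavily simplified stand-in for a real Grafana definition, and `compress_dashboard` and its pipe-separated layout are assumptions for illustration, not the format described in the source.

```python
import json

# Hypothetical, simplified dashboard definition. Real Grafana dashboard JSON
# carries many more layout and styling fields per panel.
dashboard = {
    "uid": "db-1",
    "title": "API latency",
    "panels": [
        {
            "id": 4,
            "title": "p99 latency",
            "type": "timeseries",
            "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
            "targets": [
                {
                    "expr": "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))",
                    "refId": "A",
                }
            ],
        },
        {
            "id": 7,
            "title": "Error rate",
            "type": "timeseries",
            "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
            "targets": [
                {"expr": "sum(rate(http_requests_total{status=~\"5..\"}[5m]))", "refId": "A"}
            ],
        },
    ],
}


def compress_dashboard(definition: dict) -> str:
    """Render a dashboard as compact text: a header line, then one line per panel."""
    lines = [f"{definition['uid']} | {definition['title']}"]
    for panel in definition.get("panels", []):
        # Keep only the query expressions; drop layout and styling fields.
        queries = "; ".join(
            target["expr"] for target in panel.get("targets", []) if "expr" in target
        )
        lines.append(f"  {panel['id']} | {panel['title']} | {queries}")
    return "\n".join(lines)


if __name__ == "__main__":
    compact = compress_dashboard(dashboard)
    print(compact)
    print(len(json.dumps(dashboard, indent=2)), "chars as JSON vs", len(compact), "chars compact")
```

The design choice in this sketch is to discard anything that describes how a panel is drawn and keep what describes what it measures; which fields are safe to drop in practice depends on what the prompt is asked to decide.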

Results

Time saved: latency cut from 11s to reliably <2.3s

Volume: 40%

Source

https://incident.io/building-with-ai/optimizing-llm-prompts


Grounding & classification

Source type: technical build writeup

23 fields verified against source quotes.

agentic workflow, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, cost reduction, cycle time reduction, technical build writeup, incident management, it support, agentic task execution