
incident.io builds Workbench, an internal AI evaluation suite for their incident investigation agent

As incident.io moved from tightly focused first-generation AI features to a complex AI agent for incident investigation, triage, and resolution, their existing lightweight tooling became insufficient: it lacked the eval suites, graders, and scorecards needed to ensure quality at that scale.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · @incident interaction trigger
A user interacts with @incident, which triggers LLM prompts that classify and score the interaction.
Tools used
Workbench, LLM, Grafana, Sonnet 3.7
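The classify-and-score step in Stage 1 can be sketched roughly as follows. This is a minimal illustration, not incident.io's implementation: the label set, the prompt wording, the JSON reply shape, and the names `Grade`, `build_prompt`, `grade_interaction`, and `fake_llm` are all assumptions made for the example, and the model call is replaced by a stub so the block runs without network access.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical label set; the source does not list the real categories.
LABELS = ("question", "action_request", "other")


@dataclass
class Grade:
    label: str    # one of LABELS
    score: float  # clamped to the range 0.0 to 1.0


def build_prompt(interaction: str) -> str:
    """Build a grading prompt (the wording is an assumption for illustration)."""
    return (
        f"Classify the interaction as one of: {', '.join(LABELS)}.\n"
        "Rate the quality of the response from 0 to 1.\n"
        'Reply with JSON only: {"label": "...", "score": 0.0}\n\n'
        f"Interaction:\n{interaction}"
    )


def grade_interaction(interaction: str, call_llm: Callable[[str], str]) -> Grade:
    """Send the prompt to an LLM and parse its reply into a Grade.

    `call_llm` is any function that maps a prompt string to the model's
    text reply; passing it in keeps the grader independent of a vendor SDK.
    """
    data = json.loads(call_llm(build_prompt(interaction)))
    # Fall back to "other" if the model returns a label outside the known set.
    label = data.get("label") if data.get("label") in LABELS else "other"
    # Clamp the score so a misbehaving reply cannot skew aggregates.
    score = min(1.0, max(0.0, float(data["score"])))
    return Grade(label=label, score=score)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same grade."""
    return json.dumps({"label": "question", "score": 0.8})
```

Calling `grade_interaction("@incident what changed before the alert fired?", fake_llm)` returns a `Grade` with the stub's label and score. In a real setup, `fake_llm` would be swapped for a call to the chosen model, and the resulting grades could be aggregated into a scorecard or exported to a dashboard.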
Outcome

incident.io built Workbench, a bespoke internal AI evaluation suite that enabled rapid iteration, a single pane of glass for debugging LLM interactions, and privacy-preserving performance analysis of their Investigations agent without exposing customer data to staff.

What failed first

Off-the-shelf AI tooling existed, but the team rejected it. Choosing a product on the strength of vendor marketing rather than first-hand experience risked adopting a tool built for a different team's context, and it would have let the team skip learning AI engineering from first principles.

Results
Time saved: about 2s
Source

https://incident.io/building-with-ai/built-our-own-ai-tooling

Grounding & classification
Source type: technical build writeup
28 fields verified against source quotes.
Tags: agentic workflow, ai agent, summarization, chat transcript, code diff pr, failure mode described, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, cycle time reduction, employee productivity, time saved, technical build writeup, it support, ticket triage, agentic task execution, human review queue