ticket_triage · workflow

incident.io builds Workbench, an internal AI evaluation suite for their incident investigation agent

As incident.io moved from tightly focused first-generation AI features to a complex AI agent for incident investigation, triage, and resolution, its existing lightweight tooling proved insufficient: it lacked the eval suites, graders, and scorecards needed to ensure quality at that scale.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · @incident interaction trigger

A user interacts with the product via @incident, which triggers LLM prompts that classify and score the interaction.
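The classify-and-score step could look roughly like the sketch below. The source does not publish its prompts, label set, or scoring scale, so `LABELS`, the `<label>|<score>` reply format, the 0 to 1 scale, and the `grade_interaction` and `fake_llm` names are all illustrative assumptions. The `llm` argument is any function that takes a prompt and returns text; it stands in for a real model client.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical label set; the source does not list the real categories.
LABELS = ("question", "action_request", "other")


@dataclass
class Grade:
    label: str
    score: float  # assumed scale: 0.0 (poor) to 1.0 (good)


def grade_interaction(text: str, llm: Callable[[str], str]) -> Grade:
    """Ask an LLM to classify and score one interaction.

    `llm` is a placeholder for a real model client: it takes a prompt
    string and returns the model's reply as a string.
    """
    prompt = (
        f"Classify the interaction as one of {', '.join(LABELS)} "
        "and rate the response quality from 0 to 1.\n"
        "Reply exactly as: <label>|<score>\n\n"
        f"Interaction:\n{text}"
    )
    reply = llm(prompt).strip()

    # Split "<label>|<score>"; a reply without "|" leaves raw_score empty.
    label, _, raw_score = reply.partition("|")
    label = label.strip()
    if label not in LABELS:
        label = "other"  # fall back rather than trust a malformed reply

    try:
        score = float(raw_score)
    except ValueError:
        score = 0.0  # unparseable score is treated as the lowest grade
    score = min(1.0, max(0.0, score))  # clamp to the assumed 0..1 range

    return Grade(label=label, score=score)


def fake_llm(prompt: str) -> str:
    """Stub model used for local testing; always returns the same reply."""
    return "question|0.8"
```

Calling `grade_interaction("@incident who is on call?", fake_llm)` returns a `Grade` with the label `question`. Passing the model in as a plain callable keeps the grader testable without network access, which suits the rapid iteration the writeup describes.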

Tools used

Workbench, LLM, Grafana, Sonnet 3.7

Outcome

incident.io built Workbench, a bespoke internal AI evaluation suite that enabled rapid iteration, a single pane of glass for debugging LLM interactions, and privacy-preserving performance analysis of their Investigations agent without exposing customer data to staff.

What failed first

Off-the-shelf AI tooling existed, but the team rejected it. Choosing on vendor marketing rather than first-hand experience risked adopting a product built for a different team's context, and buying would have meant skipping the chance to learn AI engineering from first principles.

Results

Time saved: about 2s

Source

https://incident.io/building-with-ai/built-our-own-ai-tooling


Grounding & classification

Source type: technical build writeup

28 fields verified against source quotes.

agentic workflow, ai agent, summarization, chat transcript, code diff pr, failure mode described, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, cycle time reduction, employee productivity, time saved, technical build writeup, it support, ticket triage, agentic task execution, human review queue