
You can't vibe code a prompt: incident.io's AI agent for Slack-based incident investigation

incident.io's AI agent for scanning Slack during incidents was misclassifying messages, confidently surfacing irrelevant discussions to responders. Letting an LLM optimize the prompt autonomously did not fix this: it produced an overfitted prompt that memorized eval examples instead of learning to generalize.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Incident alert triggers agent
An incoming alert about a system issue triggers the incident investigation agent.
Tools used
Claude Code, Claude, 4o-mini
Outcome

incident.io recommends human-controlled prompt engineering: build eval suites from historical cases, make intentional refinements based on human understanding, and use LLMs only for specific subtasks like eval generation, prompt health checks, and interaction scoring.
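
The recommendation to build eval suites from historical cases can be sketched roughly as follows. This is a minimal illustration, not incident.io's implementation: `EvalCase`, `run_eval`, and `keyword_classifier` are hypothetical names, the sample messages are invented, and the keyword classifier is only a stand-in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One historical Slack message with a human-assigned label (hypothetical schema)."""
    message: str
    relevant: bool


def run_eval(classify: Callable[[str], bool], cases: List[EvalCase]) -> dict:
    """Run the classifier over every case and collect the ones it gets wrong."""
    failures = [c for c in cases if classify(c.message) != c.relevant]
    return {
        "total": len(cases),
        "passed": len(cases) - len(failures),
        "failures": failures,
    }


def keyword_classifier(message: str) -> bool:
    """Stand-in for the prompt under test; a real version would call an LLM."""
    return any(word in message.lower() for word in ("error", "outage", "deploy"))


# Invented examples standing in for labeled messages from past incidents.
cases = [
    EvalCase("Deploy of api-gateway just rolled out, seeing 500 errors", True),
    EvalCase("Anyone up for lunch?", False),
    EvalCase("Database latency is climbing since 10:02", True),
]

report = run_eval(keyword_classifier, cases)
for failure in report["failures"]:
    print("misclassified:", failure.message)
```

Keeping the failures as full cases, not just a count, lets a human read each miss and decide what to change in the prompt, which is the human-controlled loop the writeup argues for.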

What failed first

Autonomous prompt optimization by Claude Code overfitted to the eval suite. Every test passed, but only because specific examples had been hardcoded into the prompt, which had ballooned to 7× its original size. Deleting those hardcoded examples restored the original failures, showing that the prompt had not genuinely generalized.
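
The source detected the overfitting by deleting the hardcoded examples and watching the failures return. A different, commonly used check is to score the prompt on cases it was never tuned against and to track how much the prompt has grown. The sketch below assumes that approach; `overfit_report`, the `max_gap` and `max_growth` thresholds, and all sample data are hypothetical and not taken from the writeup.

```python
from typing import Callable, List, Tuple

Case = Tuple[str, bool]  # (message, human label)


def pass_rate(classify: Callable[[str], bool], cases: List[Case]) -> float:
    """Fraction of cases where the classifier agrees with the human label."""
    if not cases:
        raise ValueError("cases must not be empty")
    return sum(classify(msg) == label for msg, label in cases) / len(cases)


def overfit_report(
    classify: Callable[[str], bool],
    tuning: List[Case],
    held_out: List[Case],
    original_prompt: str,
    optimized_prompt: str,
    max_gap: float = 0.2,      # assumed threshold, not from the source
    max_growth: float = 3.0,   # assumed threshold, not from the source
) -> dict:
    """Flag a prompt whose tuning score far exceeds its held-out score, or that has ballooned."""
    tuned = pass_rate(classify, tuning)
    unseen = pass_rate(classify, held_out)
    growth = len(optimized_prompt) / len(original_prompt)
    return {
        "tuning_pass_rate": tuned,
        "held_out_pass_rate": unseen,
        "prompt_growth": growth,
        "suspect": (tuned - unseen) > max_gap or growth > max_growth,
    }


# A classifier that has memorized the tuning examples, mimicking hardcoded prompt examples.
MEMORIZED = {
    "Rollback of checkout-service started": True,
    "Standup moved to 11am": False,
}


def memorizing_classifier(message: str) -> bool:
    return MEMORIZED.get(message, False)


tuning = list(MEMORIZED.items())
held_out = [
    ("Payment errors spiking in eu-west", True),
    ("Reminder: expense reports due Friday", False),
]

result = overfit_report(
    memorizing_classifier, tuning, held_out,
    original_prompt="x" * 100, optimized_prompt="x" * 700,
)
print(result)
```

In this toy run the memorizing classifier is perfect on the tuning cases and no better than a constant answer on the held-out ones, and the prompt has grown sevenfold, so both signals flag it.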

Results
Volume: 7× its original size
Cost replaced: four iterations
Source

https://incident.io/building-with-ai/you-cant-vibe-code-a-prompt


Grounding & classification
Source type: technical build writeup
23 fields verified against source quotes.
ai agent, document classification, chat transcript, builder submitted, failure mode described, human review described, metric backed, production runtime claimed, tools described, workflow described, software, technical build writeup, incident management, quality assurance, extract classify route, monitor detect alert