You can't vibe code a prompt: incident.io's AI agent for Slack-based incident investigation
incident.io's AI agent for scanning Slack during incidents was misclassifying messages — confidently surfacing irrelevant discussions to responders. Attempts to fix the prompt by letting an LLM optimize it autonomously produced an overfitted prompt that memorized eval examples instead of learning to generalize.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases — not tied to any one vendor's stack. Click any stage to read what happens there. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Incident alert triggers agent
An incoming alert about a system issue triggers the incident investigation agent.
Tools used
Claude CodeClaude4o-mini
Outcome
incident.io recommends human-controlled prompt engineering: build eval suites from historical cases, make intentional refinements based on human understanding, and use LLMs only for specific subtasks like eval generation, prompt health checks, and interaction scoring.
What failed first
Autonomous prompt optimization by Claude Code overfitted to the eval suite: all tests passed by hardcoding specific examples in the prompt, but the prompt had ballooned to 7× its original size and deleting those hardcoded examples restored the original failures, demonstrating no genuine generalization.