quality_assurance · saas · workflow
Trunk engineering lessons for building reliable AI agents for CI root cause analysis
LLM nondeterminism makes it difficult to build reliable AI agents for DevOps/CI tasks: the same inputs can produce different outputs, which complicates testing and makes a consistent user experience hard to deliver.
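One common way to keep agent tests repeatable despite nondeterministic model output is to replace the live model with a canned stand-in during tests. The sketch below illustrates that idea only; it is not taken from the Trunk writeup, and every name in it (analyze_failure, make_fake_model, RcaResult) is hypothetical. The source lists MSW among the tools used, but this example does not use it and instead relies on a hand-written fake.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: a "model" is any callable that maps a prompt to text.
Model = Callable[[str], str]


@dataclass
class RcaResult:
    summary: str
    prompt: str


def analyze_failure(test_name: str, log_excerpt: str, model: Model) -> RcaResult:
    """Build a prompt from a failing test and ask the model for a root cause."""
    prompt = f"Test: {test_name}\nLog:\n{log_excerpt}\nExplain the root cause."
    return RcaResult(summary=model(prompt).strip(), prompt=prompt)


def make_fake_model(canned: dict[str, str], default: str = "unknown") -> Model:
    """Return a stand-in model that answers from a fixed table (no network, no sampling)."""
    def fake(prompt: str) -> str:
        for needle, reply in canned.items():
            if needle in prompt:
                return reply
        return default
    return fake


# Same input, same output on every run, because the fake never samples.
fake_model = make_fake_model({"ECONNREFUSED": "  Database container was not ready.  "})
result = analyze_failure("test_login", "Error: ECONNREFUSED 127.0.0.1:5432", fake_model)
```

Because the model is passed in as a parameter, the surrounding logic (prompt construction, output cleanup) can be asserted on exactly, while the live model is evaluated separately.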
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.
Stage 1 · CI failure triggers RCA
The workflow begins with root cause analysis (RCA) for test failures in CI.
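As an illustration of this first stage, the sketch below collects failed tests from a CI report so they can be handed to an RCA step. It assumes the CI system emits a JUnit-style XML report, which the source does not state; the sample report and the failed_tests helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit-style report; real reports vary by test runner.
SAMPLE_REPORT = """<testsuite name="api" tests="2" failures="1">
  <testcase classname="api.auth" name="test_login">
    <failure message="expected 200, got 500">Traceback ...</failure>
  </testcase>
  <testcase classname="api.auth" name="test_logout"/>
</testsuite>"""


def failed_tests(report_xml: str) -> list[dict[str, str]]:
    """Return one record per test case that contains a <failure> or <error> child."""
    root = ET.fromstring(report_xml)
    failures = []
    for case in root.iter("testcase"):
        # Compare against None explicitly: an Element without children is falsy.
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            failures.append({
                "test": f'{case.get("classname")}.{case.get("name")}',
                "message": problem.get("message", ""),
            })
    return failures


# Each entry is a candidate input for root cause analysis.
rca_queue = failed_tests(SAMPLE_REPORT)
```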
Tools used
LangSmith · MSW · Vercel's AI SDK · Claude · Gemini
Outcome
Trunk built an AI agent for CI root cause analysis that produces better output and has more reliable tests. The approach enabled incremental improvements and was described as a massive speed boost for previously manual tasks.
What failed first
Extensive prompt engineering failed to make Claude call tools deterministically; switching to Gemini resolved the issue at the cost of some LLM reasoning quality.
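One way to quantify how deterministically an agent calls tools is to run the same task many times and compare the resulting tool-call sequences. The sketch below shows that measurement under stated assumptions: the source does not describe how Trunk measured this, the stub agent only simulates a model with a seeded random generator, and the tool names fetch_logs and summarize are hypothetical.

```python
import random
from collections import Counter
from typing import Callable

ToolTrace = tuple[str, ...]


def tool_call_consistency(run_agent: Callable[[str], ToolTrace], task: str, runs: int = 20):
    """Run the same task repeatedly.

    Returns (most common tool sequence, share of runs that produced it,
    number of distinct sequences observed).
    """
    traces = Counter(run_agent(task) for _ in range(runs))
    most_common_trace, count = traces.most_common(1)[0]
    return most_common_trace, count / runs, len(traces)


def make_stub_agent(skip_probability: float, seed: int) -> Callable[[str], ToolTrace]:
    """Hypothetical agent stand-in that sometimes skips the log-fetching tool."""
    rng = random.Random(seed)

    def run(task: str) -> ToolTrace:
        if rng.random() < skip_probability:
            return ("summarize",)
        return ("fetch_logs", "summarize")

    return run


steady = tool_call_consistency(make_stub_agent(0.0, seed=1), "why did test_login fail?")
erratic = tool_call_consistency(make_stub_agent(0.5, seed=1), "why did test_login fail?")
```

A share of 1.0 with a single distinct sequence indicates fully consistent tool use for that task; a lower share gives a number to track when comparing prompts or models.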
Grounding & classification
Source type: technical build writeup
26 fields verified against source quotes.
agentic workflow · data extraction · summarization · code diff pr · builder submitted · failure mode described · named customer · production runtime claimed · tools described · workflow described · software · employee productivity · time saved · technical build writeup · incident management · quality assurance · case to summary · monitor detect alert