quality_assurance · saas · workflow

Trunk engineering lessons for building reliable AI agents for CI root cause analysis

LLM nondeterminism makes it difficult to build reliable AI agents for DevOps/CI tasks: the same inputs can produce different outputs, which makes the agent hard to test and the user experience hard to keep consistent.
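One common way to keep tests stable despite this is to swap the live model for a canned stand-in, so the surrounding agent logic can be asserted on exactly. The sketch below illustrates that idea only; it is not Trunk's implementation, and `analyze_failure`, `RcaResult`, `canned_model`, and the sample test name and log line are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any callable that maps a prompt string to reply text.
Model = Callable[[str], str]


@dataclass
class RcaResult:
    failing_test: str
    summary: str


def analyze_failure(failing_test: str, log_excerpt: str, model: Model) -> RcaResult:
    """Build a prompt from CI data and wrap the model's reply in a result."""
    prompt = (
        f"Test {failing_test} failed.\n"
        f"Log:\n{log_excerpt}\n"
        "Explain the most likely root cause."
    )
    reply = model(prompt).strip()  # normalize surrounding whitespace
    return RcaResult(failing_test=failing_test, summary=reply)


def canned_model(prompt: str) -> str:
    """Test double: always returns the same text, so assertions stay stable."""
    return "  Timeout waiting for the database container to start.\n"


# The model is injected, so a test can pass the canned version
# while production code passes a real LLM client.
result = analyze_failure(
    "test_checkout",
    "ConnectionError: db:5432 refused",
    canned_model,
)
```

Because the model is passed in as a parameter, the same function can be tested without network access or sampling variance. What the stub cannot cover is the quality of real model output, which still needs separate evaluation.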

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · CI failure triggers RCA

The workflow begins with root cause analysis (RCA) for test failures in CI.

Tools used

LangSmith · MSW · Vercel's AI SDK · Claude · Gemini

Outcome

Trunk built an AI agent for CI root cause analysis with better output and more reliable tests. This enabled incremental improvements, and the result was described as a massive speed boost for previously manual tasks.

What failed first

Extensive prompt engineering failed to make Claude call tools deterministically; switching to Gemini resolved the issue at the cost of some LLM reasoning quality.
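To make "deterministic tool calling" measurable, one option is to run the agent several times on the same task and check how often the sequence of tool calls matches the most common one. The sketch below assumes a hypothetical `run_agent` callable that returns the ordered tool names used in one run. The tool names and both stand-in agents are invented for illustration and do not represent Claude, Gemini, or Trunk's agent.

```python
import itertools
from collections import Counter
from typing import Callable, Sequence

# Assumption: one agent run yields the ordered names of the tools it called.
AgentRun = Callable[[str], Sequence[str]]


def tool_call_consistency(run_agent: AgentRun, task: str, trials: int = 5) -> float:
    """Fraction of trials whose tool-call sequence equals the most common one."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    sequences = Counter(tuple(run_agent(task)) for _ in range(trials))
    most_common_count = sequences.most_common(1)[0][1]
    return most_common_count / trials


def stable_agent(task: str) -> Sequence[str]:
    """Stand-in agent that always calls the same tools in the same order."""
    return ["fetch_logs", "read_diff"]


# Stand-in agent that alternates between two tool-call sequences.
_variants = itertools.cycle([["fetch_logs", "read_diff"], ["read_diff"]])


def flaky_agent(task: str) -> Sequence[str]:
    return next(_variants)


task = "Why did test_checkout fail?"
stable_score = tool_call_consistency(stable_agent, task, trials=4)
flaky_score = tool_call_consistency(flaky_agent, task, trials=4)
```

A score of 1.0 means every trial used the same tool sequence, and lower scores indicate drift between runs. A metric like this could be tracked while comparing prompts or models, though it says nothing about whether the chosen tools were the right ones.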

Source

https://trunk.io/blog/attempting-to-engineer-the-chaos-out-of-ai-agents


Grounding & classification

Source type: technical build writeup

26 fields verified against source quotes.

agentic workflow · data extraction · summarization · code diff pr · builder submitted · failure mode described · named customer · production runtime claimed · tools described · workflow described · software · employee productivity · time saved · technical build writeup · incident management · quality assurance · case to summary · monitor detect alert