Cloudflare builds CI-native multi-agent AI code review system across 48,095 merge requests
Code review was a consistent bottleneck for Cloudflare's engineering teams, with the median wait for a first review measured in hours. Off-the-shelf AI code review tools lacked the flexibility and customisation required at Cloudflare's scale, and a naive approach of stuffing diffs into a large language model produced a flood of vague, noisy output.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Merge request opens review
When an engineer at Cloudflare opens a merge request, it gets an initial pass from a coordinated set of AI agents.
In its first month, the system completed 131,246 review runs across 48,095 merge requests in 5,169 repositories. The median review time was 3 minutes and 39 seconds, with an average cost of $1.19 and an 85.7% prompt cache hit rate, and engineers needed to break glass on only 0.6% of merge requests.
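The headline figures imply a few ratios that the text does not state directly. The short calculation below derives them from the quoted numbers only. The variable names are illustrative, and the break-glass count is approximate because the 0.6% figure is itself rounded.

```python
# Figures quoted in the first-month summary.
review_runs = 131_246
merge_requests = 48_095
repositories = 5_169
break_glass_rate = 0.006  # 0.6% of merge requests (rounded in the source)

# Derived ratios; none of these appear in the source itself.
runs_per_mr = review_runs / merge_requests
mrs_per_repo = merge_requests / repositories
break_glass_mrs = merge_requests * break_glass_rate

if __name__ == "__main__":
    print(f"{runs_per_mr:.2f} review runs per merge request")
    print(f"{mrs_per_repo:.1f} merge requests per repository")
    print(f"about {round(break_glass_mrs)} merge requests needed break glass")
```

On these numbers, each merge request was reviewed between two and three times on average, which would be consistent with reviews re-running as a merge request is updated, though the source does not say what triggers a new run.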
What failed first
Commercial AI code review tools were insufficiently configurable for a large engineering organization. A naive single-prompt approach, grabbing a git diff and asking a model to find bugs, produced a flood of vague suggestions, hallucinated syntax errors, and redundant advice.
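To make the contrast with the single-prompt approach concrete, here is a minimal sketch of one way a coordinator could split a diff by file, hand each piece to narrowly scoped reviewers, and drop repeated findings. It is not Cloudflare's implementation. Every name here (`Finding`, `split_diff_by_file`, `secrets_reviewer`, `todo_reviewer`, `review`) is hypothetical, and the reviewers are simple string checks standing in for model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    reviewer: str
    path: str
    message: str


# A reviewer takes (file path, that file's diff chunk) and returns findings.
Reviewer = Callable[[str, str], list[Finding]]


def split_diff_by_file(diff: str) -> dict[str, str]:
    """Split a unified git diff into per-file chunks keyed by the new path.

    Simplification: assumes paths do not themselves contain " b/".
    """
    chunks: dict[str, list[str]] = {}
    current = None
    for line in diff.splitlines():
        if line.startswith("diff --git "):
            current = line.split(" b/", 1)[1]
            chunks[current] = []
        if current is not None:
            chunks[current].append(line)
    return {path: "\n".join(lines) for path, lines in chunks.items()}


def added_lines(chunk: str) -> list[str]:
    """Return added lines only, skipping the '+++' file header."""
    return [l[1:] for l in chunk.splitlines()
            if l.startswith("+") and not l.startswith("+++")]


def secrets_reviewer(path: str, chunk: str) -> list[Finding]:
    # Stand-in for a model call scoped to credential handling.
    return [Finding("secrets", path, "possible hard-coded credential")
            for l in added_lines(chunk) if "password" in l.lower()]


def todo_reviewer(path: str, chunk: str) -> list[Finding]:
    # Stand-in for a model call scoped to leftover work markers.
    return [Finding("todo", path, "unresolved TODO added")
            for l in added_lines(chunk) if "TODO" in l]


def review(diff: str, reviewers: list[Reviewer]) -> list[Finding]:
    """Route each file chunk to every reviewer and drop repeated findings."""
    findings: list[Finding] = []
    seen: set[tuple[str, str]] = set()
    for path, chunk in split_diff_by_file(diff).items():
        for reviewer in reviewers:
            for finding in reviewer(path, chunk):
                key = (finding.path, finding.message)
                if key not in seen:  # suppress redundant advice
                    seen.add(key)
                    findings.append(finding)
    return findings


SAMPLE_DIFF = """\
diff --git a/app/config.py b/app/config.py
--- a/app/config.py
+++ b/app/config.py
@@ -1,2 +1,3 @@
 DEBUG = False
+PASSWORD = "hunter2"
+DB_PASSWORD = "hunter2"
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -1 +1,2 @@
 # App
+TODO: document setup
"""

if __name__ == "__main__":
    for f in review(SAMPLE_DIFF, [secrets_reviewer, todo_reviewer]):
        print(f"[{f.reviewer}] {f.path}: {f.message}")
```

The design point is that each reviewer sees a bounded input and has a single concern, and the coordinator owns deduplication, so the same advice is not reported twice for one file. In a real system the stub functions would be replaced by model-backed agents, and the source does not describe how those agents are divided.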