quality_assurance · saas · workflow

uReview: Uber's multi-stage GenAI platform autonomously reviews over 90% of 65,000 weekly code diffs, saving approximately 1,500 developer hours per week

As code volume grew, amplified by AI-assisted development, Uber's reviewers became overloaded and struggled to consistently catch subtle bugs, security vulnerabilities, and best-practice violations. The result was missed errors, slower feedback, production incidents, and delayed releases.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Developer submits change
When a developer submits a change on Uber's code review platform, uReview begins automated review.
Tools used
Claude-4-Sonnet, o4-mini-high, Apache Hive, Apache Kafka, Phabricator · partner
Outcome

uReview analyzes over 90% of Uber's roughly 65,000 weekly diffs. Developers rate 75% of its comments as useful and address over 65% of them in the same changeset, compared with 51% for comments from human reviewers. This saves approximately 1,500 developer hours per week, equivalent to nearly 39 developer years annually.
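The headline figures can be cross-checked with a little arithmetic. The sketch below is illustrative only: the 52-week year and the 2,000-hour developer year are assumptions, not values stated in the source, and the function names are hypothetical.

```python
# Back-of-the-envelope check of the reported outcome figures.
# Assumptions (not from the source): 52 working weeks per year and
# 2,000 working hours per developer year.

HOURS_SAVED_PER_WEEK = 1_500      # reported
WEEKLY_DIFFS = 65_000             # reported (approximate)
MIN_COVERAGE = 0.90               # reported as "over 90%"


def developer_years_saved(hours_per_week: float,
                          weeks_per_year: int = 52,
                          hours_per_developer_year: int = 2_000) -> float:
    """Convert weekly hours saved into developer years per year."""
    return hours_per_week * weeks_per_year / hours_per_developer_year


def min_diffs_reviewed(weekly_diffs: int, coverage: float) -> int:
    """Lower bound on the number of diffs reviewed per week."""
    return int(weekly_diffs * coverage)


if __name__ == "__main__":
    print(developer_years_saved(HOURS_SAVED_PER_WEEK))   # 39.0
    print(min_diffs_reviewed(WEEKLY_DIFFS, MIN_COVERAGE))  # 58500
```

Under these assumptions, 1,500 hours per week works out to 78,000 hours per year, or 39 developer years, which matches the reported "nearly 39" figure. A different definition of a developer year would shift the result accordingly.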

What failed first

Third-party AI code review tools required GitHub hosting, but Uber uses Phabricator; those tools also produced many false positives and low-value true positives, and could not integrate with Uber's internal systems. Simple standalone LLM prompts generated too many false-positive comments that eroded developer trust.

Results
Time saved: approximately 1,500 developer hours per week (over 90% of diffs reviewed)
Volume: ~65,000 diffs per week
Cost replaced: order of magnitude less
Source

https://www.uber.com/en-GB/blog/ureview/?uclick_id=0a73d271-32e7-4b77-9697-a587a4c8d9fe


Grounding & classification
Source type: technical build writeup
40 fields verified against source quotes.
agentic workflow · code generation · multi agent workflow · quality inspection · code diff pr · failure mode described · human review described · metric backed · named customer · production runtime claimed · source backed · tools described · workflow described · software · accuracy improvement · cost reduction · employee productivity · error reduction · time saved · technical build writeup · quality assurance · ai draft human approval · extract classify route