quality_assurance · saas · workflow

A Practical Approach to Verifying Code at Scale

As autonomous coding systems generate increasing volumes of code, thorough human review becomes impractical, raising the risk that AI-written code introduces severe bugs and vulnerabilities — whether accidentally or intentionally.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · PR triggers automated review
Every PR at OpenAI is automatically reviewed by the code review agent.
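The trigger step can be pictured as a small filter that turns a PR event into a review job. The sketch below is illustrative only: the source does not describe how the trigger is implemented, and the event shape, the action names, and the names ReviewJob and job_for_event are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event actions that should start a review (assumption, not from the source).
REVIEWABLE_ACTIONS = {"opened", "reopened", "synchronize"}


@dataclass(frozen=True)
class ReviewJob:
    """One unit of work for the review agent (hypothetical shape)."""
    repo: str
    pr_number: int
    head_sha: str


def job_for_event(event: dict) -> Optional[ReviewJob]:
    """Return a ReviewJob when the event should trigger a review, else None."""
    if event.get("action") not in REVIEWABLE_ACTIONS:
        return None
    pr = event.get("pull_request") or {}
    try:
        return ReviewJob(
            repo=event["repository"],
            pr_number=int(pr["number"]),
            head_sha=pr["head_sha"],
        )
    except KeyError:
        # Malformed payload: skip it rather than crash the trigger.
        return None
```

Keying the job on the head commit means a new push to the same PR yields a distinct job, so each revision can be reviewed once.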
Tools used
gpt-5-codex · gpt-5.1-codex-max · Codex CLI · Codex · CriticGPT
Outcome

The agentic code reviewer is now a core part of OpenAI's engineering workflow, handling over 100k external PRs per day as of October 2025, with authors making code changes in response to 52.7% of comments and over 80% of comment reactions being positive.

What failed first

Earlier code review approaches provided only a diff with limited surrounding context, causing them to miss important codebase-wide interactions. CriticGPT was designed for simpler tasks and was not suitable for production deployment.
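To make the diff-only limitation concrete, here is a minimal sketch of one way a reviewer's input could be widened beyond the diff, by attaching the full text of each touched file within a size budget. This is not the method described in the source, which does not say how context is gathered. The function names, the max_chars budget, and the in-memory repo_files mapping are all assumptions for illustration.

```python
import re
from typing import Dict, List

# Matches the "new file" header of a git-style unified diff, e.g. "+++ b/src/app.py".
_TOUCHED = re.compile(r"^\+\+\+ b/(.+)$", re.MULTILINE)


def touched_paths(diff_text: str) -> List[str]:
    """Paths modified in a git-style unified diff, in order, without duplicates."""
    seen: List[str] = []
    for path in _TOUCHED.findall(diff_text):
        if path not in seen:
            seen.append(path)
    return seen


def build_review_context(diff_text: str, repo_files: Dict[str, str],
                         max_chars: int = 20000) -> str:
    """Combine the diff with the full text of touched files, within a size budget.

    repo_files maps path -> file contents (a hypothetical stand-in for a checkout).
    """
    parts = ["## Diff", diff_text]
    budget = max_chars - len(diff_text)
    for path in touched_paths(diff_text):
        body = repo_files.get(path)
        if body is None or len(body) > budget:
            continue  # file missing from the checkout, or too large for the budget
        parts += [f"## Full file: {path}", body]
        budget -= len(body)
    return "\n".join(parts)
```

With max_chars equal to the diff length, the output degenerates to the diff-only input described above. Even the widened version covers only the files the PR touches, so interactions with untouched callers elsewhere in the codebase would still need some further lookup step.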

Results
Time saved: more than 100k
Volume: 36%
Running since: October 2025
Source

https://alignment.openai.com/scaling-code-verification/


Grounding & classification
Source type: technical build writeup
32 fields verified against source quotes.
agentic workflow · ai agent · quality inspection · code diff pr · failure mode described · metric backed · named customer · production runtime claimed · tools described · vendor confirmed · workflow described · software · automation rate · employee productivity · error reduction · technical build writeup · quality assurance · ai draft human approval · monitor detect alert