
A Practical Approach to Verifying Code at Scale

As autonomous coding systems generate increasing volumes of code, thorough human review becomes impractical, raising the risk that AI-written code introduces severe bugs and vulnerabilities — whether accidentally or intentionally.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · PR triggers automated review

Every PR at OpenAI is automatically reviewed by the code review agent.
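As a rough illustration of this trigger stage, the sketch below routes pull request events to a reviewer callback. It is not OpenAI's implementation: the `PullRequestEvent` shape, the `REVIEWABLE_ACTIONS` set, and the `handle_event` function are hypothetical names chosen for this example, and real code hosts deliver richer webhook payloads.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PullRequestEvent:
    """Hypothetical, simplified event; real webhook payloads differ per code host."""
    action: str    # e.g. "opened", "synchronize", "closed"
    repo: str
    number: int
    head_sha: str


# Assumption for this sketch: only these actions should start a review.
REVIEWABLE_ACTIONS = {"opened", "reopened", "synchronize"}


def handle_event(
    event: PullRequestEvent,
    run_review: Callable[[PullRequestEvent], str],
) -> Optional[str]:
    """Start a review for every PR event that changes what reviewers would see.

    Returns the reviewer's result, or None when the event needs no review.
    """
    if event.action not in REVIEWABLE_ACTIONS:
        return None
    return run_review(event)
```

Passing the reviewer in as a callback keeps the trigger logic independent of whichever model or agent performs the review.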

Tools used

gpt-5-codex, gpt-5.1-codex-max, Codex CLI, Codex, CriticGPT

Outcome

The agentic code reviewer is now a core part of OpenAI's engineering workflow, handling over 100k external PRs per day as of October 2025. Authors make code changes in response to 52.7% of its comments, and over 80% of reactions to its comments are positive.

What failed first

Earlier code review approaches provided only a diff with limited surrounding context, causing them to miss important codebase-wide interactions. CriticGPT was designed for simpler tasks and was not suitable for production deployment.
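To make the diff-only limitation concrete, here is a minimal sketch of one way to widen a reviewer's context: alongside the diff, include the full text of the touched files and any other files that mention the changed modules by name. This is an illustrative heuristic under stated assumptions, not the method described in the source. The `changed_files` and `build_context` helpers, the in-memory `repo` mapping of path to file text, and the `max_related` cap are all hypothetical.

```python
import re
from typing import Dict, List


def changed_files(diff: str) -> List[str]:
    """Extract paths from the '+++ b/<path>' headers of a unified diff."""
    return [m.group(1) for m in re.finditer(r"^\+\+\+ b/(.+)$", diff, re.MULTILINE)]


def build_context(diff: str, repo: Dict[str, str], max_related: int = 5) -> dict:
    """Bundle the diff with touched files and files that reference them.

    `repo` maps path -> file text (an assumption; a real system would read
    from a checkout). Relatedness here is a crude name match on the changed
    module's file stem, used only to illustrate the idea.
    """
    touched = changed_files(diff)
    stems = {path.rsplit("/", 1)[-1].rsplit(".", 1)[0] for path in touched}

    related: Dict[str, str] = {}
    for path, text in sorted(repo.items()):
        if len(related) >= max_related:
            break
        if path in touched:
            continue
        if any(re.search(rf"\b{re.escape(stem)}\b", text) for stem in stems):
            related[path] = text

    return {
        "diff": diff,
        "touched": {p: repo[p] for p in touched if p in repo},
        "related": related,
    }
```

A diff-only reviewer would see just the `diff` entry; the `touched` and `related` entries are what let a reviewer notice that a change in one module affects a caller elsewhere in the codebase.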

Results

Time saved: more than 100k

Volume: 36%

Running since: October 2025

Source

https://alignment.openai.com/scaling-code-verification/


Grounding & classification

Source type: technical build writeup

32 fields verified against source quotes.

Tags: agentic workflow, ai agent, quality inspection, code diff pr, failure mode described, metric backed, named customer, production runtime claimed, tools described, vendor confirmed, workflow described, software, automation rate, employee productivity, error reduction, technical build writeup, quality assurance, ai draft human approval, monitor detect alert