quality_assurance · finance · workflow

Coinbase's QA AI agent detects 300% more bugs at 86% lower cost than manual testing

Coinbase's manual QA process was slow and expensive, and its traditional end-to-end integration tests were flaky: minor layout changes caused failures that took hours to debug, and there was no scalable path to expanding test coverage.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages are listed under “Tools used” below.

Stage 1 · Natural language test request

A test is initiated with a natural-language prompt that describes the scenario to execute.
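To make the first stage concrete, here is a minimal sketch of how a natural-language test request might be packaged before it is handed to a browser-driving agent. The source does not describe Coinbase's request format; `ScenarioRequest`, its fields, the prompt wording, and the example URL are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ScenarioRequest:
    """Hypothetical container for one natural-language test scenario.

    Field names are illustrative assumptions, not Coinbase's schema.
    """
    scenario: str                  # the natural-language prompt to execute
    start_url: str                 # where the browser session should begin
    tags: list[str] = field(default_factory=list)  # optional labels for grouping runs

    def to_agent_prompt(self) -> str:
        # Wrap the scenario in instructions a browser-driving agent could follow,
        # including an explicit pass/fail reporting format.
        return (
            f"Start at {self.start_url}.\n"
            f"Task: {self.scenario}\n"
            "When finished, reply PASS if the task succeeded, "
            "otherwise FAIL followed by a one-line reason."
        )


request = ScenarioRequest(
    scenario="Sign in, open the portfolio page, and check that the balance is displayed",
    start_url="https://example.com/login",
    tags=["smoke"],
)
print(request.to_agent_prompt())
```

Asking for a fixed PASS/FAIL reply format is one way to make the agent's verdict machine-readable so results can be aggregated across many scenarios.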

Tools used

browser-use · MongoDB · BrowserStack · gRPC · WebSocket · LLM
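The tools list names browser-use (an open-source library for letting LLMs control a browser), BrowserStack (cloud-hosted browsers), gRPC and WebSocket transports, MongoDB, and an LLM, but the summary does not show how they are wired together. A common pattern for LLM-driven browser agents is an observe-decide-act loop, sketched below under that assumption. `Action`, `Browser`, `run_scenario`, `FakeBrowser`, and `scripted_decide` are hypothetical; in a real system `decide` would call an LLM and `Browser` would wrap an actual browser session.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Action:
    kind: str          # e.g. "click", "type", or "done"
    target: str = ""   # element description chosen by the model
    text: str = ""     # text to type, or the final verdict when kind == "done"


class Browser(Protocol):
    def observe(self) -> str: ...
    def perform(self, action: Action) -> None: ...


def run_scenario(prompt: str, browser: Browser,
                 decide: Callable[[str, str, list[Action]], Action],
                 max_steps: int = 20) -> str:
    """Observe-decide-act loop: one model-chosen action per step until a verdict."""
    history: list[Action] = []
    for _ in range(max_steps):
        observation = browser.observe()            # e.g. a page summary or screenshot description
        action = decide(prompt, observation, history)
        history.append(action)
        if action.kind == "done":
            return action.text                     # "PASS" or "FAIL: <reason>"
        browser.perform(action)
    return "FAIL: step budget exhausted"           # stop a confused agent from looping forever


class FakeBrowser:
    """Stand-in browser with two pages, used only to exercise the loop."""

    def __init__(self) -> None:
        self.page = "login"

    def observe(self) -> str:
        return f"page={self.page}"

    def perform(self, action: Action) -> None:
        if action.kind == "click" and action.target == "Log in button":
            self.page = "home"


def scripted_decide(prompt: str, observation: str, history: list[Action]) -> Action:
    # Scripted stand-in for an LLM call: click "Log in", then judge the result.
    if observation == "page=login":
        return Action("click", target="Log in button")
    return Action("done", text="PASS" if observation == "page=home" else "FAIL")


print(run_scenario("Sign in", FakeBrowser(), scripted_decide))  # PASS
```

Passing `decide` in as a function keeps the loop independent of any particular model provider, and the step budget turns a stuck run into an explicit failure instead of a hang.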

Outcome

The qa-ai-agent detects 300% more bugs than manual testing in the same timeframe, at 86% lower cost. It currently runs 40 test scenarios, identifies 10 issues per week, and is on track to replace at least 75% of current manual testing.
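The two headline figures can be combined into a rough cost-per-bug comparison. Assuming both refer to the same timeframe, as the outcome states, "300% more bugs" means 4 times as many bugs and "86% lower cost" means 0.14 times the spend, so each bug costs about 0.14 / 4 = 0.035 of the manual cost, roughly 96.5% lower. This is a derived estimate, not a figure reported by the source, and `relative_cost_per_bug` is a hypothetical helper.

```python
def relative_cost_per_bug(bug_increase_pct: float, cost_reduction_pct: float) -> float:
    """Cost per bug found, relative to manual testing (1.0 = same as manual).

    Assumes both percentages are measured over the same timeframe.
    """
    bug_multiplier = 1 + bug_increase_pct / 100     # 300% more -> 4x as many bugs
    cost_multiplier = 1 - cost_reduction_pct / 100  # 86% lower -> 0.14x the cost
    return cost_multiplier / bug_multiplier


ratio = relative_cost_per_bug(300, 86)
print(f"cost per bug: {ratio:.3f}x manual ({(1 - ratio) * 100:.1f}% lower)")
```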

What failed first

Traditional end-to-end integration tests were prone to flakiness; minor layout changes caused test failures that required hours of debugging.
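A toy model shows why layout changes break scripted tests. The source does not say how the old tests or the agent locate page elements; this sketch simply contrasts a hypothetical position-bound locator with one that matches an element by role and visible label, which is closer to how an agent reasoning about the page could find its target. `Element`, `find_by_position`, and `find_by_intent` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    role: str   # e.g. "button" or "link"
    label: str  # visible text


def find_by_position(page: list[Element], index: int) -> Element | None:
    """Brittle locator: depends on where the element sits in the layout."""
    return page[index] if 0 <= index < len(page) else None


def find_by_intent(page: list[Element], role: str, label: str) -> Element | None:
    """Intent-based locator: matches what the element is, not where it is."""
    for element in page:
        if element.role == role and element.label.lower() == label.lower():
            return element
    return None


before = [Element("link", "Home"), Element("button", "Buy")]
# A minor layout change: a promo banner is inserted above the button.
after = [Element("link", "Home"), Element("banner", "New feature!"), Element("button", "Buy")]

print(find_by_position(before, 1))             # the Buy button
print(find_by_position(after, 1))              # now the banner, so the scripted test breaks
print(find_by_intent(after, "button", "buy"))  # still finds the Buy button
```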

Results

Time saved: work that took a week to complete can now be done in minutes

Volume: 75% (AI) vs. 80% (manual)

Cost reduction: 86%

Source

https://www.coinbase.com/en-it/blog/How-We-are-Improving-Product-Quality-at-Coinbase-with-AI-agents


Grounding & classification

Source type: technical build writeup

37 fields verified against source quotes.

agentic workflow · ai agent · computer vision · quality inspection · human review described · metric backed · named customer · production runtime claimed · tools described · workflow described · financial services · software · automation rate · cost reduction · cycle time reduction · employee productivity · throughput increase · technical build writeup · quality assurance · agentic task execution · autonomous resolution