quality_assurance · finance · workflow
Coinbase's QA AI agent detects 300% more bugs at 86% lower cost than manual testing
Coinbase's manual QA process was slow and expensive, and traditional end-to-end integration tests were flaky — minor layout changes caused failures requiring hours of debugging, with no scalable path to expanding coverage.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Natural language test request
A test is initiated with a natural language prompt describing the scenario to execute.
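As an illustration of what a natural language test request might look like as data, here is a minimal sketch in Python. The source does not describe Coinbase's request format, so the class name `ScenarioRequest`, its fields (`platform`, `request_id`, `created_at`), and the JSON serialization are all assumptions made for this example, not the agent's real interface.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ScenarioRequest:
    """Hypothetical container for one natural-language test scenario."""

    prompt: str                      # plain-English description of the scenario
    platform: str = "web"            # assumed field: where the scenario should run
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Normalize whitespace and reject empty prompts early,
        # before anything is handed to an agent.
        self.prompt = self.prompt.strip()
        if not self.prompt:
            raise ValueError("prompt must describe a scenario to execute")

    def to_json(self) -> str:
        """Serialize the request, e.g. for storage or for sending to a runner."""
        return json.dumps(asdict(self))


if __name__ == "__main__":
    request = ScenarioRequest(
        prompt="Log in, open the portfolio page, and confirm the balance is shown."
    )
    print(request.to_json())
```

Keeping the prompt as free text and attaching only a small amount of metadata mirrors the idea of this stage: the scenario itself stays in natural language, and the structured fields exist only to track and route the request.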
Tools used
browser-use, MongoDB, BrowserStack, gRPC, WebSocket, LLM
Outcome
The qa-ai-agent detects 300% more bugs in the same timeframe at 86% lower cost than manual testing, currently runs 40 test scenarios, identifies 10 issues weekly, and is on track to eventually replace at least 75% of current manual testing.
What failed first
Traditional end-to-end integration tests were prone to flakiness; minor layout changes caused test failures that required hours of debugging.
Results
Time saved: from a week to complete down to minutes
Volume: 75% (AI) vs. 80% (Manual)
Cost replaced: 86%
Source
https://www.coinbase.com/en-it/blog/How-We-are-Improving-Product-Quality-at-Coinbase-with-AI-agents
Grounding & classification
Source type: technical build writeup
37 fields verified against source quotes.
agentic workflow, ai agent, computer vision, quality inspection, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, financial services, software, automation rate, cost reduction, cycle time reduction, employee productivity, throughput increase, technical build writeup, quality assurance, agentic task execution, autonomous resolution