quality_assurance · saas · workflow
Mozilla hardens Firefox by fixing 271 latent security bugs with Claude Mythos Preview
Firefox contained latent security bugs that were notoriously difficult to find with traditional fuzzing, particularly sandbox escapes in the multiprocess browser engine that required complex reasoning to discover.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Prompt harness to find bug
The harness is prompted to find a bug in a specific part of the code and to build a test case for it.
Tools used
Claude Mythos Preview · Claude Opus 4.6 · GPT 4 · Sonnet 3.5 · AddressSanitizer
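The stage above can be sketched roughly as follows. This is a minimal illustration, not Mozilla's harness: the prompt wording, the `Finding` type, and the `build_prompt` and `triage` helpers are all hypothetical, and the idea of keeping a candidate only when its test case produces an AddressSanitizer error is an assumption based on AddressSanitizer appearing in the tool list. The only detail taken from AddressSanitizer itself is the shape of its report header line.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical prompt template; the source does not publish the actual wording.
PROMPT_TEMPLATE = (
    "Find a memory-safety bug in {target}. "
    "Build a minimal test case that triggers it and report the sanitizer output."
)

# AddressSanitizer reports start with a header line such as
# "==12345==ERROR: AddressSanitizer: heap-use-after-free on address ..."
ASAN_ERROR = re.compile(r"==\d+==ERROR: AddressSanitizer: ([A-Za-z0-9_-]+)")


@dataclass
class Finding:
    """A candidate bug whose test case produced a sanitizer error (hypothetical type)."""
    target: str
    bug_class: str


def build_prompt(target: str) -> str:
    """Fill the hypothetical template with the code area to audit."""
    return PROMPT_TEMPLATE.format(target=target)


def triage(target: str, sanitizer_log: str) -> Optional[Finding]:
    """Assumed filter: keep a candidate only if its log contains an ASan error header."""
    match = ASAN_ERROR.search(sanitizer_log)
    if match is None:
        return None  # no reproducible crash, so treat the report as unconfirmed
    return Finding(target=target, bug_class=match.group(1))


if __name__ == "__main__":
    # "path/to/component" is a placeholder, not a real Firefox path.
    print(build_prompt("path/to/component"))
    log = "==4242==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010"
    print(triage("path/to/component", log))
```

Requiring a test case that reproduces under a sanitizer is one plausible way to screen out reports that cannot be confirmed; whether the documented workflow filters in exactly this way is not stated in the source.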
Outcome
Using Claude Mythos Preview, Mozilla identified and fixed 271 previously unknown vulnerabilities in Firefox 150, including 180 rated sec-high and 80 rated sec-moderate. The April releases fixed 423 security bugs in total.
What failed first
Early LLM code-audit experiments that used GPT 4 and Sonnet 3.5 for static analysis of high-risk code showed some promise, but produced so many false positives that scaling was impractical. More broadly, AI-generated security reports sent to open source projects were regarded as unwanted noise.
Results
Volume: 423
Cost replaced: 271
Running since: February
Grounding & classification
Source type: technical build writeup
30 fields verified against source quotes.
agentic workflow · ai agent · code generation · code diff pr · failure mode described · human review described · metric backed · named customer · production runtime claimed · tools described · workflow described · software · error reduction · throughput increase · technical build writeup · quality assurance · agentic task execution · monitor detect alert