quality_assurance · saas · workflow

Code Droid: Factory.ai autonomous agent achieves 19.27% on SWE-bench Full

Software engineering teams face productivity bottlenecks from rote, tedious programming tasks that consume capacity and slow engineering velocity at scale.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools used” below.
Stage 1 · Ticket assigned to Code Droid
Code Droid is set up to autonomously complete tickets assigned to it.
Tools used
Code Droid, HyperCode, ByteRank, Crucible, Anthropic, OpenAI
Outcome

Code Droid achieved 19.27% on SWE-bench Full and 31.67% on SWE-bench Lite (pass@1), improving to 42.67% at pass@6, while outperforming Devin and AiderGPT4o on a comparative subset.
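The pass@1 and pass@6 figures above differ only in how many attempts per task are counted. As a minimal sketch, assuming pass@k means "a task counts as solved if at least one of its first k attempts passes" (the source does not spell out its exact scoring procedure), the metric could be computed as below. The function name `pass_at_k` and the toy data are hypothetical and are not taken from the report.

```python
from typing import Sequence


def pass_at_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Return the fraction of tasks solved by at least one of the first k attempts.

    `results` holds one inner sequence per task; each entry says whether
    that attempt passed the task's tests.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not results:
        raise ValueError("results must contain at least one task")
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return solved / len(results)


# Hypothetical outcomes for four tasks, three attempts each (not benchmark data).
toy_results = [
    [True, False, True],
    [False, False, True],
    [False, False, False],
    [False, True, False],
]

print(pass_at_k(toy_results, 1))
print(pass_at_k(toy_results, 3))
```

Because extra attempts can only add solved tasks under this definition, pass@k never decreases as k grows, which matches the direction of the reported move from pass@1 to pass@6.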

Results
Time saved: 5 to 20 minutes
Volume: 19.27%
Source

https://www.factory.ai/news/code-droid-technical-report


Grounding & classification
Source type: technical build writeup
40 fields verified against source quotes, 1 dropped as unverifiable.
agentic workflow · ai agent · code generation · rag · code diff pr · knowledge base · failure mode described · metric backed · production runtime claimed · tools described · vendor confirmed · workflow described · software · accuracy improvement · employee productivity · technical build writeup · back office ops · quality assurance · agentic task execution · autonomous resolution