quality_assurance · saas · workflow

Code Droid: Factory.ai autonomous agent achieves 19.27% on SWE-bench Full

Software engineering teams face productivity bottlenecks from rote, repetitive programming tasks that consume engineering capacity and slow velocity at scale.

How it works

Common implementation structure

How this type of workflow is typically built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.

Stage 1 · Ticket assigned to Code Droid

Code Droid is set up to autonomously complete tickets assigned to it.

Tools used

Code Droid · HyperCode · ByteRank · Crucible · Anthropic · OpenAI
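Stage 1 is essentially a routing step: tickets whose assignee is the agent are handed to an autonomous loop, and everything else stays with human engineers. The sketch below is a hypothetical illustration of that routing, not Factory's implementation; the `code-droid` assignee handle, the `Ticket` and `AgentResult` types, and the `stub_agent` placeholder are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical handle the agent uses in the ticket tracker (assumption).
AGENT_ASSIGNEE = "code-droid"


@dataclass
class Ticket:
    id: str
    title: str
    assignee: str


@dataclass
class AgentResult:
    ticket_id: str
    status: str  # e.g. "pr_opened" or "escalated"


def dispatch(tickets: list[Ticket],
             run_agent: Callable[[Ticket], AgentResult]) -> list[AgentResult]:
    """Run the agent on tickets assigned to it; tickets for humans are skipped."""
    return [run_agent(t) for t in tickets if t.assignee == AGENT_ASSIGNEE]


def stub_agent(ticket: Ticket) -> AgentResult:
    # Hypothetical stand-in for the real autonomous loop (plan, edit, test,
    # open a PR). Here it simply reports that a PR was opened.
    return AgentResult(ticket.id, "pr_opened")


if __name__ == "__main__":
    backlog = [
        Ticket("T-1", "Fix flaky date parser test", AGENT_ASSIGNEE),
        Ticket("T-2", "Design new billing page", "alice"),
    ]
    for result in dispatch(backlog, stub_agent):
        print(result)
```

Keeping the agent behind a plain callable makes the routing rule testable on its own and lets a real agent replace the stub without touching the dispatch logic.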

Outcome

Code Droid achieved 19.27% on SWE-bench Full and 31.67% on SWE-bench Lite (pass@1), improving to 42.67% on Lite at pass@6, and outperformed Devin and Aider (GPT-4o) on a comparative subset.
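The Outcome figures are pass@k rates: under the simplest definition, a task counts as solved at pass@k if at least one of k attempts passes its tests. The minimal sketch below implements that "any of k" definition and adds a helper that backs out the implied number of solved tasks from a reported percentage. Assumptions: the source does not say exactly how pass@6 was computed (some evaluations use an unbiased estimator instead), the split sizes 300 (Lite) and 2,294 (Full) are the standard SWE-bench sizes and are assumed rather than confirmed to be the sets Factory evaluated, and `pass_at_k` and `implied_solved` are hypothetical helper names.

```python
# Hypothetical helpers for illustration; not taken from Factory's report.

# Standard SWE-bench split sizes (assumed to be the evaluated sets).
LITE_SIZE = 300
FULL_SIZE = 2294


def pass_at_k(attempts_per_task: list[list[bool]], k: int) -> float:
    """Fraction of tasks where at least one of the first k attempts passed.

    This is the simple "any of k" definition, not the unbiased estimator.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts_per_task:
        raise ValueError("need at least one task")
    solved = sum(any(attempts[:k]) for attempts in attempts_per_task)
    return solved / len(attempts_per_task)


def implied_solved(rate_percent: float, total: int) -> int:
    """Nearest whole number of solved tasks for a reported percentage."""
    return round(rate_percent / 100 * total)


if __name__ == "__main__":
    # Toy data: each inner list holds one task's attempt outcomes, in order.
    toy = [
        [False, True, False],
        [False, False, False],
        [True, True, True],
        [False, False, True],
    ]
    print(pass_at_k(toy, 1))  # 0.25
    print(pass_at_k(toy, 3))  # 0.75

    print(implied_solved(31.67, LITE_SIZE))  # 95
    print(implied_solved(42.67, LITE_SIZE))  # 128
    print(implied_solved(19.27, FULL_SIZE))  # 442
```

Under those assumptions, 31.67% and 42.67% on Lite correspond to 95 and 128 of 300 tasks, and 19.27% on Full corresponds to 442 of 2,294 tasks.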

Results

Time saved: 5 to 20 minutes

Volume: 19.27% (SWE-bench Full, pass@1)

Source

https://www.factory.ai/news/code-droid-technical-report


Grounding & classification

Source type: technical build writeup

40 fields verified against source quotes, 1 dropped as unverifiable.

agentic workflow · ai agent · code generation · rag · code diff pr · knowledge base · failure mode described · metric backed · production runtime claimed · tools described · vendor confirmed · workflow described · software · accuracy improvement · employee productivity · technical build writeup · back office ops · quality assurance · agentic task execution · autonomous resolution