quality_assurance · saas · workflow

GitHub Copilot agent-driven development enables the Copilot Applied Science team to ship 11 agents in under three days

Analyzing coding agent trajectories across standardized benchmark runs required reading hundreds of thousands of lines of JSON per day. That was impossible to do manually, so the researcher fell into a repetitive loop of using Copilot to surface patterns and then investigating them.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.

Stage 1 · Benchmark trajectory analysis trigger

Benchmark runs against standardized evaluation datasets produce trajectory files containing agent thought processes and actions that need analysis.
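As a rough illustration of how this stage can cut down the volume a person has to read, the sketch below condenses a trajectory into a few summary fields. The source does not describe the trajectory file format, so the schema here (`task_id`, a `steps` list, and per-step `thought`, `action`, and `error` keys) and the function names are hypothetical assumptions, not the team's actual format or tooling.

```python
import json
from collections import Counter


def summarize_trajectory(trajectory: dict) -> dict:
    """Condense one trajectory into a short summary.

    Assumes a hypothetical schema: {"task_id": ..., "steps": [{"thought": ...,
    "action": ..., "error": ...}, ...]}. Real trajectory files may differ.
    """
    steps = trajectory.get("steps", [])
    # Count how often each action type appears across the run.
    action_counts = Counter(step.get("action", "unknown") for step in steps)
    # Record the indices of steps that carry a non-empty error field.
    failed = [i for i, step in enumerate(steps) if step.get("error")]
    return {
        "task_id": trajectory.get("task_id"),
        "num_steps": len(steps),
        "action_counts": dict(action_counts),
        "failed_step_indices": failed,
    }


def summarize_many(raw_json_docs):
    """Parse each JSON document string and summarize it."""
    return [summarize_trajectory(json.loads(doc)) for doc in raw_json_docs]


if __name__ == "__main__":
    sample = json.dumps({
        "task_id": "example-task",
        "steps": [
            {"thought": "Inspect the failing test", "action": "read_file"},
            {"thought": "Run the suite", "action": "run_tests", "error": "timeout"},
        ],
    })
    print(summarize_many([sample]))
```

A summary like this only points a reviewer at the steps worth opening. The pattern-finding described in the source was done with Copilot, not with a fixed script.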

Tools used

GitHub Copilot · Copilot CLI · Claude Opus 4.6 · VSCode · Copilot SDK · MCP servers · Copilot Code Review

Outcome

The eval-agents tool and agent-driven development methodology enabled five contributors to ship 11 new agents, four new skills, and a new concept in under three days, a change of +28,858/-2,884 lines of code across 345 files. It also reduced the trajectory content the researcher had to read from hundreds of thousands of lines to a few hundred.

Results

Time saved: less than three days

Volume: 11

Source

https://github.blog/ai-and-ml/github-copilot/agent-driven-development-in-copilot-applied-science/


Grounding & classification

Source type: technical build writeup

33 fields verified against source quotes.

agentic workflow · ai agent · code generation · code diff pr · builder submitted · human review described · metric backed · named customer · production runtime claimed · tools described · workflow described · software · employee productivity · throughput increase · time saved · technical build writeup · back office ops · quality assurance · agentic task execution · ai draft human approval