
Effective harnesses for long-running agents: Anthropic's two-agent solution for multi-context-window software development

AI coding agents running across multiple context windows have no memory of prior sessions, causing them to either attempt to build everything at once and run out of context mid-implementation, or prematurely declare a project complete after seeing partial progress.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · High-level prompt triggers session

A user provides a high-level prompt to kick off the long-running agent task.

Tools used

Claude Agent SDK, Opus 4.5, Puppeteer MCP, git

Outcome

The two-agent harness dramatically improved performance by eliminating wasted re-orientation time, enabling incremental feature-by-feature progress with proper end-to-end verification and clean session handoffs.
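The handoff idea can be illustrated with a minimal sketch. It assumes the harness keeps a JSON feature list and a plain-text progress log on disk, so each new session can re-orient from files instead of from memory. The file layout, the `id` and `passes` fields, and all function names here are hypothetical and chosen for illustration; they are not taken from the Claude Agent SDK or any other library.

```python
import json
from pathlib import Path
from typing import Optional


def load_features(path: Path) -> list:
    """Read the feature list (hypothetical format: [{"id": ..., "passes": bool}, ...])."""
    return json.loads(path.read_text(encoding="utf-8"))


def next_feature(features: list) -> Optional[dict]:
    """Return the first feature not yet verified, so a session works on one feature at a time."""
    for feature in features:
        if not feature.get("passes", False):
            return feature
    return None


def record_handoff(progress_path: Path, features_path: Path, features: list,
                   feature_id: str, passed: bool, note: str) -> None:
    """Persist the verification result and append a note for the next session to read."""
    for feature in features:
        if feature["id"] == feature_id:
            feature["passes"] = passed
    features_path.write_text(json.dumps(features, indent=2), encoding="utf-8")
    status = "verified" if passed else "incomplete"
    with progress_path.open("a", encoding="utf-8") as log:
        log.write(f"{feature_id}: {status} - {note}\n")


def is_complete(features: list) -> bool:
    """Treat the project as done only when the list is non-empty and every feature is verified."""
    return bool(features) and all(f.get("passes", False) for f in features)
```

In this sketch a session would call `next_feature` at startup, work on that single feature, and call `record_handoff` before exiting. Gating completion on `is_complete` rather than on the agent's own judgment is one way to guard against declaring the project finished too early.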

What failed first

Without a structured harness, Claude agents running in a loop either over-reached by one-shotting the full app (leaving code half-implemented with no documentation) or stopped too early by falsely declaring completion. Context compaction alone was insufficient.

Results

Cost replaced: saves Claude some tokens in every session

Source

https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents


Grounding & classification

Source type: technical build writeup

23 fields verified against source quotes.

agentic workflow, code generation, multi agent workflow, code diff pr, knowledge base, failure mode described, metric backed, source backed, tools described, workflow described, software, employee productivity, error reduction, time saved, technical build writeup, quality assurance, agentic task execution