quality_assurance · saas · workflow
Effective harnesses for long-running agents: Anthropic's two-agent solution for multi-context-window software development
AI coding agents running across multiple context windows have no memory of prior sessions, causing them to either attempt to build everything at once and run out of context mid-implementation, or prematurely declare a project complete after seeing partial progress.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · High-level prompt triggers session
A user provides a high-level prompt to kick off the long-running agent task.
Tools used
Claude Agent SDK, Opus 4.5, Puppeteer MCP, git
Outcome
The two-agent harness dramatically improved performance by eliminating wasted re-orientation time, enabling incremental feature-by-feature progress with proper end-to-end verification and clean session handoffs.
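The feature-by-feature progress and session handoffs described above can be pictured with a minimal sketch. It assumes the harness keeps a feature list and a running notes file on disk so that a fresh session can re-orient without any memory of earlier ones. The file layout, the `passes` flag, and the function names are hypothetical illustrations, not the format used in the source writeup, and no agent or SDK calls are shown.

```python
import json
from pathlib import Path
from typing import List, Optional


def init_feature_list(path: Path, features: List[str]) -> None:
    """First-session step (hypothetical): record every feature as not yet passing."""
    records = [{"name": name, "passes": False} for name in features]
    path.write_text(json.dumps(records, indent=2))


def next_feature(path: Path) -> Optional[str]:
    """Later sessions: return the first feature that is still marked as failing."""
    for record in json.loads(path.read_text()):
        if not record["passes"]:
            return record["name"]
    # Every feature is marked passing, so completion can be declared.
    return None


def mark_passing(path: Path, name: str, notes_path: Path, note: str) -> None:
    """After end-to-end verification: flip the flag and append a handoff note."""
    records = json.loads(path.read_text())
    for record in records:
        if record["name"] == name:
            record["passes"] = True
    path.write_text(json.dumps(records, indent=2))
    # The notes file is what the next session reads instead of relying on memory.
    with notes_path.open("a") as handle:
        handle.write(f"{name}: {note}\n")
```

In this sketch, each session would call `next_feature` to choose exactly one item, verify it end to end, and then call `mark_passing` before exiting. Tying completion to the recorded flags rather than to the agent's impression of the codebase is one way to guard against both over-reaching and stopping early.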
What failed first
Without a structured harness, Claude agents running in a loop either over-reached by one-shotting the full app (leaving code half-implemented with no documentation) or stopped too early by falsely declaring completion. Context compaction alone was insufficient.
Results
Cost replaced: saves Claude some tokens in every session
Grounding & classification
Source type: technical build writeup
23 fields verified against source quotes.
Tags: agentic workflow, code generation, multi agent workflow, code diff pr, knowledge base, failure mode described, metric backed, source backed, tools described, workflow described, software, employee productivity, error reduction, time saved, technical build writeup, quality assurance, agentic task execution