Factory's anchored iterative summarization outperforms OpenAI and Anthropic context compression strategies for long-running AI agent sessions
Long-running AI agent sessions generate millions of tokens, far more than any model's context window can hold. Naive, aggressive compression makes agents forget critical details (file paths, error messages, past decisions), so they waste tokens re-reading files and re-exploring dead ends.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Context limit reached
When a long-running agent session generates millions of tokens exceeding the model's context window, compression is triggered.
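As a rough illustration of this trigger, the sketch below compresses once the session crosses a fraction of the context window, then merges only the dropped messages into a summary that persists across compressions. That merge is one plausible reading of "anchored" and "iterative"; the text here does not spell out Factory's actual mechanism. Every name (`Session`, `trigger_ratio`, `keep_recent`, `toy_summarize`), the 4-characters-per-token estimate, and the 80% threshold are assumptions made for the example, and `toy_summarize` stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token, rounded up.
    return (len(text) + 3) // 4


@dataclass
class Session:
    context_window: int          # model limit in tokens (hypothetical value)
    trigger_ratio: float = 0.8   # compress once 80% of the window is in use
    keep_recent: int = 2         # newest messages are always kept verbatim
    summary: str = ""            # persistent summary carried across compressions
    messages: List[str] = field(default_factory=list)

    def used_tokens(self) -> int:
        return estimate_tokens(self.summary) + sum(
            estimate_tokens(m) for m in self.messages
        )

    def needs_compression(self) -> bool:
        return self.used_tokens() >= self.trigger_ratio * self.context_window

    def add(self, message: str, summarize: Callable[[str, List[str]], str]) -> None:
        self.messages.append(message)
        if self.needs_compression() and len(self.messages) > self.keep_recent:
            cut = len(self.messages) - self.keep_recent
            dropped, self.messages = self.messages[:cut], self.messages[cut:]
            # Merge only the newly dropped span into the existing summary
            # instead of re-summarizing the whole history from scratch.
            self.summary = summarize(self.summary, dropped)


def toy_summarize(previous: str, dropped: List[str]) -> str:
    # Placeholder for an LLM call: keep the first 40 characters of each message.
    notes = [m[:40] for m in dropped]
    return "\n".join(([previous] if previous else []) + notes)


session = Session(context_window=100)
for letter in "abc":
    session.add(letter * 120, toy_summarize)  # each message is about 30 tokens
```

In this run the third message pushes usage to 90 of 100 tokens, so the oldest message is folded into the summary and the two newest stay verbatim. A real system would use the model's own tokenizer rather than a character count.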
Tools used
GPT-5.2, Claude SDK, /responses/compact
Outcome
Factory's structured summarization scores 0.35 points higher than OpenAI and 0.26 higher than Anthropic overall, with accuracy showing the largest gap (Factory 4.04), while maintaining comparable compression efficiency (98.6% vs OpenAI's 99.3%).
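To make the compression figures concrete, the short calculation below converts each percentage into the number of tokens that survive. It assumes that "compression" means the share of tokens removed and uses a hypothetical 1,000,000-token session; neither assumption is stated in the results above.

```python
def retained_tokens(total_tokens: int, compression_pct: float) -> int:
    # Assumption: compression_pct is the percentage of tokens removed.
    return round(total_tokens * (100 - compression_pct) / 100)


SESSION_TOKENS = 1_000_000  # hypothetical session size

factory_kept = retained_tokens(SESSION_TOKENS, 98.6)
openai_kept = retained_tokens(SESSION_TOKENS, 99.3)

print(f"Factory keeps {factory_kept} tokens, OpenAI keeps {openai_kept}")
```

Under these assumptions Factory keeps 14,000 tokens and OpenAI keeps 7,000. A 0.7-point difference in compression therefore corresponds to a summary twice as large.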
What failed first
Generic summarization treats all content as equally compressible, silently dropping file paths and decisions; traditional metrics like ROUGE or embedding similarity failed to capture whether an agent can actually continue working after compression.
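One narrow way to catch the silent loss of file paths is to check which paths from the original context still appear in the summary. The sketch below is a hypothetical check, not the evaluation used in the comparison above; the regular expression, function names, and sample strings are all invented for illustration. It covers only file paths, not decisions or error messages.

```python
import re
from typing import Set

# Hypothetical pattern: slash-separated segments ending in a file extension.
PATH_RE = re.compile(r"(?:[\w.-]+/)+[\w.-]+\.\w+")


def extract_paths(text: str) -> Set[str]:
    # Collect every substring that looks like a relative file path.
    return set(PATH_RE.findall(text))


def dropped_paths(original: str, summary: str) -> Set[str]:
    # Paths mentioned in the original context but missing from the summary.
    return extract_paths(original) - extract_paths(summary)


original = "Edited src/auth/session.py and tests/test_session.py after a 401 error."
summary = "Fixed the 401 error in src/auth/session.py."

missing = dropped_paths(original, summary)
print(missing)
```

Here the summary reads as a fair paraphrase, yet it no longer mentions `tests/test_session.py`, so an agent resuming from it would have to rediscover that file.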