Thomson Reuters CoCounsel: Evaluating Long Context LLM Performance for Legal AI

Legal documents routinely run to hundreds of pages, and simply fitting that text into a large LLM context window does not guarantee effective performance. The more text included, the higher the risk of missing critical details, and effective context windows are often much smaller than advertised limits.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · User uploads legal document
Users upload documents and CoCounsel automatically performs various tasks on them.
Tools used
CoCounsel, GPT-4, GPT-4.1, o1-mini, Westlaw, Reuters News, NovelQA, RAG
Outcome

CoCounsel 2.0 uses long-context LLMs to the greatest extent possible. This is backed by a multi-stage evaluation pipeline that uses over 20,000 benchmark test samples and ends with manual review by attorney subject-matter experts (SMEs), saving lawyers valuable time on document-centric legal tasks.

What failed first

Early GPT-4 had only an 8K-token context window, which required splitting documents into chunks. RAG also underperforms on complex legal queries that require comparing information across an entire document, because semantic retrieval returns only the passages that explicitly match the query.
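The chunking step described above can be illustrated with a minimal sketch. This is not CoCounsel's implementation: the function name `chunk_by_token_budget`, the `overlap` parameter, and the use of whitespace-separated words as a stand-in for model tokens are all assumptions made for illustration. A real system would count tokens with the model's own tokenizer.

```python
def chunk_by_token_budget(text: str, max_tokens: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens "tokens".

    Assumption: tokens are approximated by whitespace-separated words.
    The overlap repeats the tail of each chunk at the head of the next,
    so a sentence that straddles a boundary appears whole in at least one chunk
    (provided it is shorter than the overlap).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    words = text.split()
    step = max_tokens - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this window already reached the end of the document
    return chunks


if __name__ == "__main__":
    # Hypothetical 20-word "document" with a tiny budget, to make the windows visible.
    doc = " ".join(f"w{i}" for i in range(20))
    for i, chunk in enumerate(chunk_by_token_budget(doc, max_tokens=8, overlap=2)):
        print(i, chunk)
```

Each chunk is then processed in isolation, which is where the limitation comes from: a question that depends on comparing a clause in one chunk with a clause in another cannot be answered from either chunk alone.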

Results
Time saved: saving lawyers valuable time
Volume: over 20,000 benchmark test samples
Source

https://www.thomsonreuters.com/en-us/posts/innovation/legal-ai-benchmarking-evaluating-long-context-performance-for-llms/

Grounding & classification
Source type: technical build writeup
31 fields verified against source quotes, 1 dropped as unverifiable.
agentic workflow, document ai, rag, summarization, contract, failure mode described, human review described, named customer, production runtime claimed, source backed, tools described, workflow described, legal, accuracy improvement, time saved, technical build writeup, legal document review, legal ops, quality assurance, document to record, human review queue