Thomson Reuters CoCounsel: Evaluating Long Context LLM Performance for Legal AI

Legal documents routinely run to hundreds of pages, and simply fitting the text into a large LLM context window does not guarantee effective performance. The more text included, the higher the risk of missing critical details, and effective context windows are often much smaller than advertised limits.
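One common way to probe the gap between an advertised and an effective context window is a "needle in a haystack" test: plant a known fact at several depths in filler text of increasing length and check whether the model still recovers it. The sketch below is a minimal illustration of that idea, not the evaluation method described in the source. The `model` callable, the planted clause, the filler text, and the use of whitespace-separated words as a stand-in for tokens are all assumptions made for the example.

```python
from typing import Callable, Sequence

# Hypothetical planted fact and probe question (not taken from the source).
NEEDLE = "The indemnification cap is 4200 dollars."
QUESTION = "What is the indemnification cap?"
EXPECTED = "4200"


def build_haystack(filler: str, needle: str, n_words: int, depth: float) -> str:
    """Return n_words of repeated filler with `needle` inserted at relative `depth`."""
    base = filler.split()
    words = [base[i % len(base)] for i in range(n_words)]
    pos = int(depth * n_words)
    return " ".join(words[:pos] + [needle] + words[pos:])


def effective_window(
    model: Callable[[str], str],
    lengths: Sequence[int],
    depths: Sequence[float] = (0.0, 0.5, 1.0),
    threshold: float = 1.0,
    filler: str = "lorem ipsum dolor sit amet",
) -> int:
    """Largest tested length (in words) at which recall across depths meets `threshold`.

    Stops at the first length that falls below the threshold.
    """
    best = 0
    for n in sorted(lengths):
        hits = 0
        for depth in depths:
            prompt = build_haystack(filler, NEEDLE, n, depth) + "\n\nQuestion: " + QUESTION
            if EXPECTED in model(prompt):
                hits += 1
        if hits / len(depths) >= threshold:
            best = n
        else:
            break
    return best
```

In practice `model` would wrap a real LLM call and the lengths would be measured in tokens with the model's own tokenizer. A single planted fact is also a much easier task than the cross-document legal questions discussed here, so a result from this kind of probe is best read as an upper bound.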

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases, not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · User uploads legal document

Users upload documents and CoCounsel automatically performs various tasks on them.

Tools used

CoCounsel, GPT-4, GPT-4.1, o1-mini, Westlaw, Reuters News, NovelQA, RAG

Outcome

CoCounsel 2.0 leverages long context LLMs to the greatest extent possible, backed by a multi-stage evaluation pipeline using over 20,000 benchmark test samples and final manual review by attorney SMEs, saving lawyers valuable time on document-centric legal tasks.
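The source describes the pipeline only at a high level: automated benchmark samples first, attorney review last. A minimal sketch of that shape might look like the following. The `Sample` fields, the containment-based `auto_pass` check, the 0.9 pass-rate gate, and the spot-check interval are all hypothetical choices for illustration, not details from the source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    sample_id: str
    expected: str  # reference fact the answer must contain (assumed grading rule)
    answer: str    # model output for this benchmark item


def auto_pass(sample: Sample) -> bool:
    """Automated stage: crude containment check, a stand-in for a real grader."""
    return sample.expected.lower() in sample.answer.lower()


def evaluate(
    samples: List[Sample],
    min_pass_rate: float = 0.9,   # hypothetical gate for advancing a candidate
    spot_check_every: int = 10,   # hypothetical sampling rate for passing items
) -> Dict[str, object]:
    """Score all samples, then build a queue for manual review.

    Every automated failure goes to reviewers, plus every Nth passing sample
    so that reviewers can also catch false passes.
    """
    passed = [s for s in samples if auto_pass(s)]
    failed = [s for s in samples if not auto_pass(s)]
    pass_rate = len(passed) / len(samples) if samples else 0.0
    review_queue = failed + passed[::spot_check_every]
    return {
        "pass_rate": pass_rate,
        "advance": pass_rate >= min_pass_rate,
        "review_queue": [s.sample_id for s in review_queue],
    }
```

Routing a slice of passing samples to reviewers, not only the failures, is a design choice in this sketch: an automated grader can be wrong in both directions, and the final human stage is where such errors would surface.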

What failed first

Early GPT-4 had only an 8K-token context window, which required document chunking. RAG also underperforms on complex legal queries that require comparing information across an entire document, because semantic retrieval returns only the passages that explicitly match the query.
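Both limitations can be seen in a small sketch. The code below splits a document into overlapping chunks that fit a fixed budget, then retrieves the best-matching chunk for a query. It is an illustration under stated assumptions rather than the pipeline from the source: whitespace-separated words stand in for tokens, and a simple word-overlap score stands in for embedding-based semantic retrieval. The function names and default sizes are hypothetical.

```python
from typing import List


def chunk_words(text: str, max_words: int = 50, overlap: int = 10) -> List[str]:
    """Split text into windows of at most max_words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the document
    return chunks


def top_k(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank chunks by the number of distinct words shared with the query."""
    q = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:k]


clauses = [
    "termination requires thirty days notice",
    "either party may end this agreement upon breach",
]
# Only the first clause shares words with the query, so the second clause,
# which also governs how the agreement ends, is never returned.
print(top_k("termination notice", clauses, k=1))
```

A question such as "are the termination provisions consistent across the agreement?" needs every relevant clause side by side, which is exactly what a top-k retriever of this kind does not provide.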

Results

Time saved: saving lawyers valuable time

Volume: over 20,000

Source

https://www.thomsonreuters.com/en-us/posts/innovation/legal-ai-benchmarking-evaluating-long-context-performance-for-llms/

Grounding & classification

Source type: technical build writeup

31 fields verified against source quotes, 1 dropped as unverifiable.

agentic workflow, document ai, rag, summarization, contract, failure mode described, human review described, named customer, production runtime claimed, source backed, tools described, workflow described, legal, accuracy improvement, time saved, technical build writeup, legal document review, legal ops, quality assurance, document to record, human review queue