Thomson Reuters CoCounsel: Evaluating Long-Context LLM Performance for Legal AI
Legal documents routinely run to hundreds of pages. Simply fitting that text into a large LLM context window does not guarantee good performance: the more text included, the higher the risk of missing critical details, and the effective context window is often much smaller than the advertised limit.
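To make the gap between advertised and effective limits concrete, here is a minimal Python sketch of a pre-flight check that compares a document's estimated size against a discounted token budget. It assumes the common rough heuristic of about four characters per English token. `effective_fraction` is a hypothetical, empirically tuned discount, not a figure published for CoCounsel or any particular model, and all function names are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb for English."""
    return (len(text) + 3) // 4  # ceiling division


def effective_budget(advertised_limit: int, effective_fraction: float,
                     reserved_for_output: int = 0) -> int:
    """Input tokens we are willing to rely on.

    effective_fraction is a hypothetical discount in (0, 1], tuned from one's
    own long-context tests, reflecting that quality can degrade well before
    the advertised limit.
    """
    if not 0.0 < effective_fraction <= 1.0:
        raise ValueError("effective_fraction must be in (0, 1]")
    return int(advertised_limit * effective_fraction) - reserved_for_output


def fits_effective_window(text: str, advertised_limit: int,
                          effective_fraction: float,
                          reserved_for_output: int = 0) -> bool:
    """True if the estimated document size fits the discounted budget."""
    budget = effective_budget(advertised_limit, effective_fraction, reserved_for_output)
    return estimate_tokens(text) <= budget


# A 400,000-character document is estimated at 100,000 tokens.
document = "x" * 400_000
print(fits_effective_window(document, advertised_limit=128_000, effective_fraction=1.0))  # True
print(fits_effective_window(document, advertised_limit=128_000, effective_fraction=0.5))  # False
```

A production check would count tokens with the model's own tokenizer and derive the discount from its own long-context benchmarks rather than from a fixed constant.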
How it works
Common implementation structure
This is how this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User uploads legal document
Users upload documents, and CoCounsel automatically performs various tasks on them.
CoCounsel 2.0 uses long-context LLMs to the greatest extent possible. This approach is backed by a multi-stage evaluation pipeline that draws on more than 20,000 benchmark test samples and ends with manual review by attorney subject-matter experts (SMEs). The result saves lawyers valuable time on document-centric legal tasks.
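One way to picture a multi-stage evaluation of this kind is as a sequence of automated gates followed by human review. The Python sketch below is a hypothetical illustration, not CoCounsel's actual pipeline: the stage names, graders, and pass-rate thresholds are all assumptions. Each stage grades every benchmark sample, the run stops at the first stage whose pass rate falls below its threshold, and only a model that clears every gate is flagged as ready for attorney review.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Sample:
    """One benchmark test case (fields are illustrative)."""
    sample_id: str
    prompt: str
    expected: str


@dataclass
class Stage:
    """An automated check with a hypothetical minimum pass rate."""
    name: str
    grader: Callable[[Sample, str], bool]  # (sample, model output) -> acceptable?
    min_pass_rate: float


@dataclass
class StageResult:
    name: str
    pass_rate: float
    passed_gate: bool


def run_pipeline(
    samples: Sequence[Sample],
    model: Callable[[Sample], str],
    stages: Sequence[Stage],
) -> Tuple[List[StageResult], bool]:
    """Run stages in order and stop at the first failed gate.

    Returns the per-stage results and whether the model is ready for
    final manual review by attorney SMEs.
    """
    if not samples:
        raise ValueError("need at least one benchmark sample")
    # Generate each output once and reuse it across all stages.
    outputs = {s.sample_id: model(s) for s in samples}
    results: List[StageResult] = []
    for stage in stages:
        passed = sum(stage.grader(s, outputs[s.sample_id]) for s in samples)
        rate = passed / len(samples)
        ok = rate >= stage.min_pass_rate
        results.append(StageResult(stage.name, rate, ok))
        if not ok:
            return results, False
    return results, True


# Usage with two toy samples and a model that always answers "yes".
samples = [Sample("a", "Is there a cap?", "yes"), Sample("b", "Is there a waiver?", "no")]
stages = [
    Stage("non_empty", lambda s, out: bool(out.strip()), min_pass_rate=1.0),
    Stage("exact_match", lambda s, out: out == s.expected, min_pass_rate=0.9),
]
results, ready = run_pipeline(samples, lambda s: "yes", stages)
print([(r.name, r.pass_rate, r.passed_gate) for r in results], ready)
# [('non_empty', 1.0, True), ('exact_match', 0.5, False)] False
```

Stopping at the first failed gate reserves the later, more expensive stages, and above all scarce attorney review time, for candidates that have already cleared the cheaper automated checks.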
What failed first
Early GPT-4 had only an 8K-token context window, so documents had to be split into chunks. Retrieval-augmented generation (RAG) also underperforms on complex legal queries that require comparing information across an entire document, because semantic retrieval returns only passages that explicitly match the query.
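Both failure modes can be reproduced in a few lines of Python. The sketch below assumes whitespace-split words as a stand-in for real model tokens and uses plain word-overlap scoring as a crude substitute for embedding-based semantic similarity; the contract clauses are hypothetical. `chunk_tokens` splits a token sequence into windows that fit a fixed budget (such as 8K tokens), and `retrieve` shows how a query-matching retriever can miss a relevant passage that is worded differently.

```python
import re


def chunk_tokens(tokens, max_tokens, overlap=0):
    """Split a token list into consecutive windows of at most max_tokens.

    Consecutive windows share `overlap` tokens to soften hard cuts at
    chunk boundaries.
    """
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def tokenize(text):
    """Lowercase alphanumeric words; a toy stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, passages, top_k):
    """Return up to top_k passages sharing at least one word with the query.

    Word overlap is a crude substitute for embedding similarity, but it
    shows the same bias: passages are found only if they resemble the query.
    """
    query_words = set(tokenize(query))
    scored = []
    for index, passage in enumerate(passages):
        score = len(query_words & set(tokenize(passage)))
        if score > 0:
            scored.append((score, index))
    # Highest overlap first; ties keep document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [passages[index] for _, index in scored[:top_k]]


# Hypothetical contract clauses.
clauses = [
    "Section 4. Termination: either party may terminate with 30 days notice.",
    "Section 9. Notices must be delivered in writing to the registered address.",
    "Section 12. This agreement ends automatically upon insolvency of the supplier.",
]

# Toy budget of 8 words per chunk with 2 words of overlap.
words = " ".join(clauses).split()
print(len(chunk_tokens(words, max_tokens=8, overlap=2)))  # 6
print(retrieve("termination rights", clauses, top_k=3))  # only the Section 4 clause
```

The Section 12 clause is also a termination provision, but because it never uses the query's words it is not retrieved even with `top_k=3`. Embedding-based retrieval narrows this vocabulary gap but still ranks passages by resemblance to the query, so comparisons that span a whole document remain hard to serve from retrieved fragments alone.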