Credal.ai: RAG on enterprise documents — metadata tagging, data restructuring, and LLM-based routing
LLMs have limited attention and struggle to give high-quality answers on complex corporate documents. Naive RAG fails to retrieve the most relevant sections; poorly structured data (footnotes detached from their citations, opaquely formatted tables) makes questions unanswerable; and LLMs cannot reason consistently about dates.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases rather than tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · LLM metadata tag generation
An LLM generates metadata tags for each document or section at ingestion time, so that retrieval can pre-filter on those tags and search a narrower, more relevant set of sections.
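As a rough illustration of this stage, the sketch below tags sections once at ingestion and then pre-filters on those tags before any similarity search runs. Everything named here is hypothetical: `Section`, `tag_sections`, `prefilter`, and the tag vocabulary are illustrative, and `keyword_tagger` is only a stand-in for an LLM call, since the source does not describe Credal's actual schema or prompts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Section:
    """One retrievable chunk of a document (hypothetical structure)."""
    doc_id: str
    text: str
    tags: set[str] = field(default_factory=set)


def tag_sections(
    sections: list[Section],
    tagger: Callable[[str], Iterable[str]],
) -> list[Section]:
    """Attach normalized tags to every section once, at ingestion time."""
    for section in sections:
        section.tags = {t.strip().lower() for t in tagger(section.text) if t.strip()}
    return sections


def prefilter(sections: list[Section], query_tags: Iterable[str]) -> list[Section]:
    """Keep only sections sharing at least one tag with the query."""
    wanted = {t.lower() for t in query_tags}
    return [s for s in sections if s.tags & wanted]


def keyword_tagger(text: str) -> list[str]:
    """Stand-in for an LLM tagger; a real one would prompt a model instead."""
    vocab = {"revenue": "finance", "invoice": "finance", "vacation": "hr", "leave": "hr"}
    lowered = text.lower()
    return sorted({tag for word, tag in vocab.items() if word in lowered})
```

In use, `tag_sections(sections, keyword_tagger)` would run once per ingested document, and `prefilter(sections, ["hr"])` would narrow the candidate set before embedding search or reranking.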
By restructuring data before it reaches the LLM, Credal turned previously unanswerable questions into straightforward ones and raised routing accuracy above the 50% baseline. The restructuring had four parts: LLM-generated metadata tags for pre-filtering, footnotes inserted inline, tables reformatted as CSV, and focused few-shot prompting for routing.
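Two of these restructuring steps, inlining footnotes and reformatting tables as CSV, can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: footnote markers are assumed to look like `[1]`, the footnote bodies are assumed to be already extracted into a dict, and the helper names `inline_footnotes` and `table_to_csv` are hypothetical rather than taken from Credal's pipeline.

```python
from __future__ import annotations

import csv
import io
import re

# Assumption: citations in the body appear as bracketed numbers, e.g. "[1]".
FOOTNOTE_MARKER = re.compile(r"\[(\d+)\]")


def inline_footnotes(body: str, footnotes: dict[str, str]) -> str:
    """Replace each known marker with the footnote text, right where it is cited."""
    def replace(match: re.Match) -> str:
        note = footnotes.get(match.group(1))
        if note is None:
            return match.group(0)  # unknown marker: leave it untouched
        return f" (footnote {match.group(1)}: {note})"

    return FOOTNOTE_MARKER.sub(replace, body)


def table_to_csv(rows: list[list[str]]) -> str:
    """Serialize a parsed table as compact CSV text for the prompt."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

For example, `inline_footnotes("Revenue rose 4%[1] in Q3.", {"1": "Excludes one-off items."})` yields text in which the footnote sits next to the figure it qualifies, so a retrieved chunk carries its own citation context.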
What failed first
LangChain's naive RAG failed to surface the most relevant document sections; its document parser placed footnotes at the end rather than inline, making citation lookups impossible; its table formatting was token-inefficient; and its structured output formatter distracted GPT-3.5 from the routing task, leaving it accurate only 50% of the time.
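The routing failure above suggests keeping the routing prompt focused on a single decision and asking for a bare label instead of a structured object. The sketch below shows one way that could look. It is not Credal's actual prompt: the function names, the route labels, and the `complete` callable (a placeholder for whatever LLM client is in use) are all assumptions.

```python
from __future__ import annotations

from typing import Callable, Sequence


def build_routing_prompt(
    question: str,
    routes: Sequence[str],
    examples: Sequence[tuple[str, str]],
) -> str:
    """Build a few-shot prompt whose only task is choosing one route label."""
    lines = [
        "Pick the single best data source for the question.",
        "Answer with one label from: " + ", ".join(routes) + ".",
        "",
    ]
    for example_question, label in examples:
        lines += [f"Question: {example_question}", f"Source: {label}", ""]
    lines += [f"Question: {question}", "Source:"]
    return "\n".join(lines)


def parse_route(reply: str, routes: Sequence[str], default: str) -> str:
    """Map the model's free-text reply to a known route, else fall back."""
    cleaned = reply.strip().strip(".\"'").lower()
    for route_label in routes:
        if cleaned == route_label.lower():
            return route_label
    return default


def route(
    question: str,
    routes: Sequence[str],
    examples: Sequence[tuple[str, str]],
    complete: Callable[[str], str],  # hypothetical: prompt in, completion out
    default: str,
) -> str:
    """Ask the model for a label and validate it in code, not in the prompt."""
    prompt = build_routing_prompt(question, routes, examples)
    return parse_route(complete(prompt), routes, default)
```

Validation happens in `parse_route` after the call, so the prompt itself carries no formatting instructions beyond "answer with one label".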