GEICO experimental RAG implementation with RagRails hallucination guardrails for conversational quoting
LLMs used in GEICO's hackathon-winning conversational chat application produced unreliable and inconsistent outputs. These included hallucinations and a specific pattern called 'overpromising', in which the model incorrectly assumed capabilities it did not have. The result was an experience unsuitable for public-facing customer interactions.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Offline document vectorization
Business data is split into documents, converted to embeddings via an API, and metadata is extracted using LLMs before being indexed in the vector database.
Tools used
GPT model, HNSW, vector database
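The Stage 1 steps above (split, embed, extract metadata, index) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GEICO's implementation: `embed` is a toy hashed bag-of-words stand-in for a real embedding API, `extract_metadata` is a trivial keyword rule standing in for LLM-based extraction, and `VectorIndex` does a brute-force cosine search where a production vector database would typically use an approximate index such as HNSW. All names and the sample text are hypothetical.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for an embedding API: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def extract_metadata(text: str) -> dict:
    """Placeholder for LLM-based metadata extraction (assumption: a keyword rule)."""
    topic = "coverage" if "coverage" in text.lower() else "general"
    return {"topic": topic, "n_words": len(text.split())}


def split_into_documents(business_data: str) -> list[str]:
    """Split raw business data into documents on blank lines."""
    return [p.strip() for p in business_data.split("\n\n") if p.strip()]


@dataclass
class VectorIndex:
    """Brute-force cosine index; a real store would use an ANN index such as HNSW."""
    rows: list[tuple[list[float], str, dict]] = field(default_factory=list)

    def add(self, text: str) -> None:
        # Store the embedding alongside the text and its extracted metadata.
        self.rows.append((embed(text), text, extract_metadata(text)))

    def search(self, query: str, k: int = 1) -> list[tuple[float, str, dict]]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), t, m) for v, t, m in self.rows]
        return sorted(scored, key=lambda r: r[0], reverse=True)[:k]


business_data = """Collision coverage pays for damage to your car after a crash.

A quote can be started by providing a ZIP code and vehicle details."""

index = VectorIndex()
for doc in split_into_documents(business_data):
    index.add(doc)

score, text, meta = index.search("what does collision coverage pay for", k=1)[0]
print(meta["topic"], "|", text)
```

Because indexing happens offline, the query path only has to embed the user message and look up its nearest neighbours, which keeps retrieval separate from how the records were prepared.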
Outcome
The RagRails strategy reduced overpromising errors from 12 of 20 test responses to 6, and eventually to none after further adjustments, providing a repeatable mechanism for hallucination control in the RAG pipeline.
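Figures such as "12 of 20" imply a repeatable check run over a fixed batch of test responses. The sketch below shows one way such a count could be automated. It is an assumption for illustration only: the source does not say how overpromising was detected, and the regex patterns, function names, and sample responses here are all hypothetical.

```python
import re

# Hypothetical phrases signalling that the assistant claims an action it cannot perform.
OVERPROMISE_PATTERNS = [
    r"\bI(?: have|'ve) (?:saved|emailed|sent|booked)\b",
    r"\bI(?: will|'ll) (?:email|send|call|save)\b",
]


def is_overpromise(response: str) -> bool:
    """Return True if any overpromise pattern appears in the response."""
    return any(re.search(p, response, flags=re.IGNORECASE) for p in OVERPROMISE_PATTERNS)


def overpromise_count(responses: list[str]) -> tuple[int, int]:
    """Return (number of flagged responses, total responses)."""
    flagged = sum(is_overpromise(r) for r in responses)
    return flagged, len(responses)


sample = [
    "I'll email you a copy of this quote shortly.",
    "I can't send emails, but here is your quote summary.",
    "I have saved your quote for later.",
    "Could you tell me your ZIP code?",
]
flagged, total = overpromise_count(sample)
print(f"{flagged} of {total} responses flagged")
```

Re-running the same batch after each guardrail adjustment gives a before-and-after comparison of the kind reported above. A keyword check like this is coarse, so human review or an LLM-based grader may be needed for borderline responses.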
What failed first
The first RAG implementation placed entire records in the system prompt, producing ineffective and unreliable results; subsequent attempts to permanently fix overpromising by adding instructions directly to the system prompt also failed and disrupted other pipeline goals.