customer_support · finance · workflow

GEICO experimental RAG implementation with RagRails hallucination guardrails for conversational quoting

LLMs used in GEICO's hackathon-winning conversational chat application produced unreliable and inconsistent outputs, including hallucinations and a specific pattern called 'overpromising', in which the model incorrectly assumed capabilities it did not have. This made the experience unsuitable for public-facing customer interactions.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Offline document vectorization

Business data is split into documents, converted to embeddings via an API, and metadata is extracted using LLMs before being indexed in the vector database.

Tools used

GPT model · HNSW · vector database
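The offline stage described above can be sketched roughly as follows. This is a minimal illustration, not GEICO's implementation: every name in it (`split_into_documents`, `embed`, `extract_metadata`, `ToyVectorIndex`, `build_index`) is hypothetical. The embedding API is replaced by a hashed bag-of-words function, the LLM metadata step by a trivial placeholder, and the HNSW-backed vector database by a brute-force cosine search, so that the sketch runs with the standard library alone.

```python
import hashlib
import math
from dataclasses import dataclass, field

DIM = 64  # toy embedding size; a real embedding API defines its own dimension


def split_into_documents(text: str, max_words: int = 40) -> list[str]:
    """Split raw business text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> list[float]:
    """Placeholder for an embedding API call: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def extract_metadata(text: str) -> dict:
    """Placeholder for LLM-based metadata extraction (assumption, not the real step)."""
    return {"word_count": len(text.split()), "preview": text[:30]}


@dataclass
class Record:
    text: str
    vector: list[float]
    metadata: dict


@dataclass
class ToyVectorIndex:
    """Brute-force stand-in for an HNSW-backed vector database."""
    records: list[Record] = field(default_factory=list)

    def add(self, text: str) -> None:
        # Embed the chunk, attach its metadata, and store the record.
        self.records.append(Record(text, embed(text), extract_metadata(text)))

    def search(self, query: str, k: int = 3) -> list[Record]:
        # Vectors are unit length, so the dot product equals cosine similarity.
        q = embed(query)
        ranked = sorted(
            self.records,
            key=lambda r: sum(a * b for a, b in zip(q, r.vector)),
            reverse=True,
        )
        return ranked[:k]


def build_index(raw_text: str, max_words: int = 40) -> ToyVectorIndex:
    """Run the offline pipeline: split, embed, extract metadata, index."""
    index = ToyVectorIndex()
    for doc in split_into_documents(raw_text, max_words):
        index.add(doc)
    return index
```

In a production setup the exhaustive scan in `search` would be replaced by an approximate nearest-neighbor structure such as HNSW, which trades a small amount of recall for much faster lookups on large collections.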

Outcome

The RagRails strategy reduced overpromising errors from 12 out of 20 test responses to 6 and eventually to none after further adjustments, providing a repeatable mechanism for hallucination control in the RAG pipeline.
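As a quick arithmetic check, the reported counts can be expressed as error rates. The sketch below assumes the same 20-response test set was used in every round, which the summary implies but does not state outright; the round labels and the `overpromise_rate` helper are hypothetical.

```python
def overpromise_rate(flagged: int, total: int) -> float:
    """Return the fraction of test responses flagged as overpromising."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= flagged <= total:
        raise ValueError("flagged must be between 0 and total")
    return flagged / total


TOTAL_RESPONSES = 20  # assumed constant across rounds

# (label, flagged count) pairs; the labels are illustrative only.
rounds = [
    ("before RagRails", 12),
    ("with RagRails", 6),
    ("after further adjustments", 0),
]

for label, flagged in rounds:
    rate = overpromise_rate(flagged, TOTAL_RESPONSES)
    print(f"{label}: {flagged}/{TOTAL_RESPONSES} = {rate:.0%}")
```

Under that assumption the rate falls from 60% to 30% and then to 0%.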

What failed first

The first RAG implementation placed entire records in the system prompt, producing ineffective and unreliable results; subsequent attempts to permanently fix overpromising by adding instructions directly to the system prompt also failed and disrupted other pipeline goals.

Results

Volume: 12 out of 20 responses were incorrect

Running since: 2023

Source

https://www.geico.com/techblog/application-of-retrieval-augmented-generation/


Grounding & classification

Source type: technical build writeup

21 fields verified against source quotes.

chatbot · conversational ai · rag · knowledge base · failure mode described · metric backed · named customer · tools described · workflow described · insurance · error reduction · technical build writeup · customer support · sales ops · rag answering