customer_support · healthcare · workflow

Doctolib improves customer care case deflection by 20% with a RAG system built on GPT-4o and OpenSearch

Doctolib's customer care relied on rules-based bots that could not adapt to users or their context. Standard LLMs lacked access to private and recent data, limiting accuracy. A RAG system was pursued to address this gap, but the initial version did not produce good enough results and its latency could reach one minute.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. The specific products that implement these stages are listed under “Tools used” below.
Stage 1 · User query submitted
The RAG pipeline begins when a user query is embedded.
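The query-embedding step can be illustrated with a minimal sketch. It is not Doctolib's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector index such as OpenSearch, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): term counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Embed the query, then return the k documents most similar to it."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble the retrieved passages and the question into one LLM prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical help-center snippets, for illustration only.
DOCS = [
    "To cancel an appointment, open your booking and choose cancel.",
    "Password reset links are sent by email.",
    "Invoices can be downloaded from the billing page.",
]

top = retrieve("How do I cancel my appointment?", DOCS, k=1)
prompt = build_prompt("How do I cancel my appointment?", top)
```

In a production pipeline the resulting prompt would be sent to the LLM (GPT-4o in this case study), and retrieval would run against an index rather than a Python list.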
Tools used
GPT-4o, Azure OpenAI Service, OpenSearch, Ragas
Outcome

The improved RAG system achieved a 20% reduction in customer care cases reaching agents and reduced response latency from one minute to less than five seconds, enabling customer care agents to focus on more complex cases.

What failed first

The initial vanilla RAG implementation did not produce good enough results on its own, and response latency reached up to one minute, making the system impractical for real users.

Results
Time saved: latency reduced from 1 min to less than 5 s
Volume: 20% fewer customer care cases reaching agents
Source

https://medium.com/doctolib/part-1-from-retrieval-augmented-generation-rag-to-agents-doctolibs-journey-to-revolutionize-d34610eeb550


Grounding & classification
Source type: technical build writeup
22 fields verified against source quotes.
Tags: predictive analytics, rag, knowledge base, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, healthcare, cycle time reduction, deflection rate, employee productivity, technical build writeup, customer support, autonomous resolution, rag answering