
QueryGPT: Uber builds a natural language to SQL system using LLMs and multi-agent architecture

Authoring SQL queries at Uber required deep knowledge of SQL syntax and internal data models and took around 10 minutes per query, creating a productivity bottleneck across approximately 1.2 million interactive queries per month.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User submits natural language prompt
A user submits a natural language question to QueryGPT to generate a SQL query.
Tools used
large language models, vector databases, RAG, OpenAI GPT-4 Turbo
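To make the stage concrete, here is a minimal sketch of how a question and retrieved context might be combined into one prompt for a language model. The source does not publish QueryGPT's prompt format, so the `RetrievedContext` type, the `build_prompt` function, the prompt wording, and the `trips` table are all hypothetical illustrations. No model is called.

```python
from dataclasses import dataclass


@dataclass
class RetrievedContext:
    """One retrieved item (hypothetical shape): a table schema or an example query."""
    kind: str  # "schema" or "example"
    text: str


def build_prompt(question: str, context: list[RetrievedContext]) -> str:
    """Join the retrieved schemas, example queries, and the question into one prompt."""
    schemas = [c.text for c in context if c.kind == "schema"]
    examples = [c.text for c in context if c.kind == "example"]
    parts = [
        "You write SQL for the tables described below.",
        "Table schemas:",
        *schemas,
        "Example queries:",
        *examples,
        f"Question: {question}",
        "SQL:",
    ]
    return "\n".join(parts)


# Hypothetical table and example query, used only for illustration.
context = [
    RetrievedContext("schema", "CREATE TABLE trips (trip_id BIGINT, city VARCHAR, completed_at TIMESTAMP)"),
    RetrievedContext("example", "SELECT COUNT(*) FROM trips WHERE city = 'Seattle'"),
]
prompt = build_prompt("How many trips were completed in Seattle yesterday?", context)
print(prompt)
```

The resulting string would be sent to the model, and the text it returns after `SQL:` would be the candidate query shown to the user.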
Outcome

QueryGPT reduced SQL query authoring time from around 10 minutes to about 3 minutes, reached about 300 daily active users in limited release, and 78% of users reported the generated queries reduced the time they would have spent writing from scratch.
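The per-query saving implied by these figures can be worked out directly. This is only arithmetic on the rounded numbers quoted above ("around 10 minutes" and "about 3 minutes"), so the result is approximate.

```python
# Approximate authoring times quoted in the writeup, in minutes.
before_min = 10
after_min = 3

saved_min = before_min - after_min   # minutes saved per query
reduction = saved_min / before_min   # fraction of the original time saved

print(f"{saved_min} minutes saved per query ({reduction:.0%} reduction)")
# prints: 7 minutes saved per query (70% reduction)
```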

What failed first

The initial version of QueryGPT used simple RAG over a small sample set and suffered declining accuracy as more tables were onboarded; simple similarity search between natural language prompts and SQL schemas returned irrelevant results, and large schemas exceeded the available LLM token limit.
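Both failure modes can be illustrated with a small sketch. It assumes word-overlap cosine similarity as a stand-in for embedding similarity (the source does not say how similarity was computed) and a crude word count as a stand-in for a real tokenizer. The table names, the question, and the 500-token budget are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_schemas(question: str, schemas: dict[str, str]) -> list[tuple[str, float]]:
    """Score every schema against the question and sort by descending similarity."""
    q = Counter(tokenize(question))
    scored = [(name, cosine(q, Counter(tokenize(ddl)))) for name, ddl in schemas.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def fits_budget(texts: list[str], max_tokens: int) -> bool:
    """Check whether the combined texts stay within an approximate token budget."""
    return sum(len(tokenize(t)) for t in texts) <= max_tokens


# Hypothetical schemas for illustration.
schemas = {
    "trips": "CREATE TABLE trips (trip_id BIGINT, city_id INT, completed_at TIMESTAMP)",
    "drivers": "CREATE TABLE drivers (driver_id BIGINT, signup_city VARCHAR, rating DOUBLE)",
}

# Failure 1: the question's wording shares no terms with the DDL, so every
# schema scores 0.0 and the ranking carries no signal.
ranked = rank_schemas("How many rides finished yesterday?", schemas)
print(ranked)

# Failure 2: a wide table alone overflows a (hypothetical) 500-token budget.
wide_ddl = "CREATE TABLE wide (" + ", ".join(f"col_{i} VARCHAR" for i in range(200)) + ")"
print(fits_budget([wide_ddl], max_tokens=500))
```

A user asking about "rides" that "finished" has no way to match a table named `trips` with a `completed_at` column under this scheme, which is one way a similarity search between questions and schemas can return irrelevant results.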

Results
Time saved: approximately 1.2 million
Volume: 36%
Running since: May 2023
Source

https://www.uber.com/en-JP/blog/query-gpt/?uclick_id=eaf82e80-940f-4baf-87d6-76c4fbd37f1a


Grounding & classification
Source type: technical build writeup
33 fields verified against source quotes.
agentic workflow, code generation, multi agent workflow, rag, knowledge base, failure mode described, human review described, metric backed, named customer, production runtime claimed, source backed, tools described, workflow described, logistics, software, cycle time reduction, employee productivity, time saved, technical build writeup, back office ops, agentic task execution, human review queue, rag answering