Evolution and Scale of Uber Eats' Multilingual Semantic Search Platform
Uber Eats' lexical search stack could not handle real-world query complexity—synonyms, typos, shorthand, multilingual terms, and context-dependent words—causing missed intent and poor results for a large portion of user searches.
How it works
Common implementation structure
This section describes how this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · User types search query
A large share of orders start with people typing into the search bar to find stores, dishes, and grocery items.
Tools used
Qwen, PyTorch, Hugging Face Transformers, Ray, DeepSpeed, HNSW, feature store
Outcome
Uber Eats built a production semantic search system that powers multilingual discovery across restaurants, grocery, and retail. Three optimizations account for the reported gains: k-tuning delivered a 34% latency reduction and 17% CPU savings; scalar quantization more than halved latency while keeping recall above 0.95; and Matryoshka Representation Learning (MRL) embeddings reduced storage costs by nearly 50%.
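To make two of these optimizations concrete, here is a minimal sketch of MRL-style truncation followed by symmetric int8 scalar quantization. It is an illustration under stated assumptions, not Uber Eats' implementation. The 256 and 128 dimensions, the random stand-in vector, and all function names are hypothetical. Real MRL embeddings come from a model trained so that the leading dimensions remain useful on their own, and a random vector has no such property.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def truncate_mrl(vec, dims):
    """Keep the first `dims` components and re-normalize (MRL-style truncation)."""
    return l2_normalize(vec[:dims])


def quantize_int8(vec):
    """Symmetric scalar quantization: map each float to an integer in [-127, 127]."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


def cosine(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Hypothetical stand-in for a model-produced embedding.
rng = random.Random(0)
full = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(256)])

half = truncate_mrl(full, 128)        # half the dimensions to store
codes, scale = quantize_int8(half)    # one small integer per dimension
restored = dequantize_int8(codes, scale)
similarity = cosine(half, restored)   # how much direction survives quantization

print(len(full), len(half), round(similarity, 4))
```

Halving the stored dimensions is one way a storage saving of roughly half could arise, and replacing floats with 8-bit codes shrinks each vector further and makes distance computation cheaper. Whether recall holds up depends on the model and the index, which is why the source reports recall alongside the latency gain.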
What failed first
Traditional lexical matching was effective only when queries exactly matched document text, and it produced poor results for the broad range of real-world queries Uber Eats receives.
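The failure mode can be shown with a toy exact-token matcher. This is a deliberately simplified sketch, not Uber Eats' lexical stack. The function name, the menu item, and the queries are hypothetical examples.

```python
def lexical_match(query, document):
    """Return True only if every query token appears verbatim in the document."""
    doc_tokens = set(document.lower().split())
    return all(tok in doc_tokens for tok in query.lower().split())


menu_item = "Classic Cheeseburger with Fries"

print(lexical_match("cheeseburger", menu_item))  # exact token: True
print(lexical_match("cheesburger", menu_item))   # typo: False
print(lexical_match("hamburguesa", menu_item))   # Spanish term: False
```

All three queries express the same intent, but only the one that shares an exact token with the document matches. Embedding-based retrieval targets this gap by comparing meaning rather than surface text.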