Vinted migrates item search from Elasticsearch to Vespa, cutting servers in half and improving search latency 2.5x
As Vinted's catalogue grew to around 1 billion active items, Elasticsearch hit its limits: shard and replica management became time-consuming and error-prone, hot nodes created load imbalances, and a 300-second refresh interval made updated listings slow to surface in search.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Real-time item indexing
Apache Flink feeds item data into Vespa in real time, so new items become searchable within seconds.
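As a rough illustration of what a single item write to Vespa can look like, the sketch below builds a request for Vespa's Document v1 HTTP API (`/document/v1/<namespace>/<document-type>/docid/<id>`). It is not Vinted's Flink job. The endpoint URL, the `marketplace` namespace, the `item` document type, and the field names are all hypothetical, and the request is only constructed, not sent.

```python
import json
from urllib.parse import quote


def build_feed_request(endpoint, namespace, doc_type, item_id, fields):
    """Return (method, url, body) for one Vespa Document v1 write.

    Nothing is sent over the network; a streaming job would hand this
    request to whatever HTTP or feed client it uses.
    """
    url = "{}/document/v1/{}/{}/docid/{}".format(
        endpoint.rstrip("/"),
        quote(namespace, safe=""),
        quote(doc_type, safe=""),
        quote(str(item_id), safe=""),
    )
    # Document v1 expects the document's fields under a top-level "fields" key.
    body = json.dumps({"fields": fields}, sort_keys=True)
    return "POST", url, body


# Hypothetical item and endpoint, for illustration only.
method, url, body = build_feed_request(
    "http://localhost:8080",
    "marketplace",
    "item",
    12345,
    {"title": "Blue denim jacket", "price": 25.0},
)
print(method, url)
print(body)
```

In a streaming setup, each item change event would map to one such write, which is what allows updates to become visible in seconds rather than waiting for a periodic refresh.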
Vinted halved its server count to 60 nodes, improved search latency by 2.5x and indexing latency by 3x, cut change-visibility time from 300 seconds to 5 seconds, and more than tripled ranking depth to 200,000 candidate items. All of this runs on a single unified Vespa deployment, with a significant business impact attributed to improved search relevance.
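The figures above imply a few ratios that are not stated directly. The short calculation below derives them from the quoted numbers only. The "before" node count and ranking depth are inferences from "halved" and "more than 3x", not reported values.

```python
# Figures quoted in the text.
visibility_before_s = 300
visibility_after_s = 5
nodes_after = 60
ranking_depth_after = 200_000

# Change visibility: 300 s down to 5 s.
visibility_speedup = visibility_before_s / visibility_after_s

# "Halved to 60 nodes" implies roughly twice as many before (inferred).
nodes_before_estimate = nodes_after * 2

# "More than 3x" implies the earlier depth was below a third of 200,000 (inferred).
ranking_depth_before_upper_bound = ranking_depth_after / 3

print(f"Change visibility is {visibility_speedup:.0f}x faster")
print(f"Estimated previous node count: about {nodes_before_estimate}")
print(f"Previous ranking depth: below about {ranking_depth_before_upper_bound:,.0f} items")
```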
What failed first
The previous Elasticsearch setup required constant shard and replica tuning, generated persistent hot node load imbalances, imposed a 300-second refresh interval, and became operationally unwieldy as data and traffic scaled.
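For context on the refresh interval mentioned above: in Elasticsearch, `index.refresh_interval` is a per-index setting that controls how often recent writes become visible to search. The sketch below only builds the settings-update request that would set it to 300 seconds. The index name `items` is hypothetical, and how Vinted actually configured its cluster is not documented here.

```python
import json


def refresh_interval_request(index_name, seconds):
    """Return (method, path, body) for an Elasticsearch index settings update.

    Only the request is built; sending it is left to the caller's HTTP client.
    """
    path = f"/{index_name}/_settings"
    body = {"index": {"refresh_interval": f"{seconds}s"}}
    return "PUT", path, json.dumps(body)


# Hypothetical index name, for illustration only.
method, path, body = refresh_interval_request("items", 300)
print(method, path)
print(body)
```

A long interval like this reduces indexing overhead but delays when updated listings appear in results, which is the trade-off described above.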