DoorDash builds a gigascale ML feature store with Redis hashes, xxHash, and Snappy compression to triple cluster capacity
DoorDash's existing Redis-based feature store was inefficient and approaching its capacity limits, yet it had to serve billions of feature records at millions of lookups per second for low-latency ML model inference.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.
Stage 1 · Benchmark 5 key-value stores
DoorDash ran a full benchmark evaluation on five different key-value stores using YCSB to compare cost and performance metrics.
Tools used
Redis, AWS ElastiCache, YCSB, xxHash, Snappy, Docker
Outcome
After implementing Redis hashes, xxHash string hashing, and Snappy compression, DoorDash reduced production cluster memory from 298 GB to 112 GB per billion features, cut CPU from 208 to 72 vCPUs per 10 million reads per second, and improved Redis read latency by 40% and overall feature store latency by 15%.
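The reduction factors behind the "triple cluster capacity" headline can be checked from the figures quoted above. The short calculation below uses only those numbers; the variable names are illustrative.

```python
# Figures quoted in the outcome above.
mem_before_gb, mem_after_gb = 298, 112   # GB per billion features
cpu_before, cpu_after = 208, 72          # vCPUs per 10 million reads per second

# Reduction factor: how many times smaller the footprint became.
mem_ratio = mem_before_gb / mem_after_gb
cpu_ratio = cpu_before / cpu_after

# Fraction of the original footprint that was saved.
mem_saving = 1 - mem_after_gb / mem_before_gb
cpu_saving = 1 - cpu_after / cpu_before

print(f"memory: {mem_ratio:.2f}x smaller ({mem_saving:.0%} saved)")
print(f"cpu:    {cpu_ratio:.2f}x smaller ({cpu_saving:.0%} saved)")
```

Both factors land a little under 3x (about 2.66x for memory and 2.89x for CPU), so "triple" is a rounded figure.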
What failed first
The existing Redis feature store stored features as a flat list of key-value pairs, which was memory-inefficient and compute-intensive, and the production cluster was running close to its capacity limits.
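The difference between the flat layout and a hash-based layout can be sketched with an in-memory stand-in. This is a minimal illustration, not DoorDash's code. It assumes one hash per entity with hashed feature names as fields. Plain dicts model Redis (a dict of dicts mirrors `HSET`/`HGET`), `zlib.crc32` stands in for xxHash, and `zlib.compress` stands in for Snappy, so the example runs with only the standard library. The entity ID, feature names, and helper functions are hypothetical.

```python
import json
import zlib

# Hypothetical feature values for one entity; names are illustrative only.
ENTITY_ID = "store:1234"
FEATURES = {
    "avg_prep_time_last_30_days": 11.5,
    "order_count_last_7_days": 420,
    "embedding_v2": [0.0] * 64,
}

def field_id(feature_name: str) -> int:
    """Map a feature name to a 32-bit integer (crc32 stands in for xxHash)."""
    return zlib.crc32(feature_name.encode("utf-8"))

def encode(value) -> bytes:
    """Serialize and compress a value (zlib stands in for Snappy)."""
    return zlib.compress(json.dumps(value).encode("utf-8"))

def decode(blob: bytes):
    """Reverse encode(): decompress, then deserialize."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Flat layout: one top-level key per (entity, feature) pair, like SET/GET.
flat_store = {
    f"{ENTITY_ID}:{name}": json.dumps(value).encode("utf-8")
    for name, value in FEATURES.items()
}

# Hash layout: one top-level key per entity, like HSET/HGET, with
# hashed field names and compressed values.
hash_store = {
    ENTITY_ID: {field_id(name): encode(value) for name, value in FEATURES.items()}
}

def get_feature(entity_id: str, feature_name: str):
    """Look up one feature in the hash layout (analogous to HGET)."""
    return decode(hash_store[entity_id][field_id(feature_name)])

print(len(flat_store), "top-level keys in the flat layout")
print(len(hash_store), "top-level key in the hash layout")
print(get_feature(ENTITY_ID, "order_count_last_7_days"))
```

In the hash layout the entity ID is stored once rather than repeated in every key, the long feature names shrink to fixed-size integers, and repetitive values such as the zero-filled list compress well. These are plausible sources of the memory savings reported above, though the sketch does not measure real Redis memory use.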