
DoorDash builds a gigascale ML feature store with Redis hashes, xxHash, and Snappy compression to triple cluster capacity

DoorDash's existing Redis-based feature store was memory- and compute-inefficient and was approaching its capacity limits, yet it had to serve billions of feature records and millions of lookups per second for ML model inference under low-latency constraints.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.
Stage 1 · Benchmark 5 key-value stores
DoorDash ran a full benchmark evaluation on five different key-value stores using YCSB to compare cost and performance metrics.
Tools used
Redis, AWS ElastiCache, YCSB, xxHash, Snappy, Docker
Outcome

After implementing Redis hashes, xxHash string hashing, and Snappy compression, DoorDash reduced production cluster memory from 298 GB to 112 GB per billion features and cut CPU from 208 to 72 vCPUs per 10 million reads per second. Redis read latency improved by 40% and overall feature store latency by 15%.
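To put the outcome figures in proportion, the short calculation below converts the before and after numbers quoted above into a percentage saved and a reduction ratio. The helper name `reduction` is hypothetical and only illustrates the arithmetic; the inputs are the memory and vCPU figures from the outcome statement.

```python
def reduction(before: float, after: float) -> dict:
    """Return the percentage saved and the before/after ratio for one metric."""
    return {
        "percent_saved": round(100 * (1 - after / before), 1),
        "ratio": round(before / after, 2),
    }

# Memory: GB per billion features (figures quoted in the outcome above).
memory = reduction(298, 112)

# CPU: vCPUs per 10 million reads per second (figures quoted in the outcome above).
cpu = reduction(208, 72)

print("memory:", memory)
print("cpu:", cpu)
```

On these figures, both memory and CPU drop by a bit under a factor of three.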

What failed first

The existing Redis feature store stored features as a flat list of key-value pairs, which was memory-inefficient and compute-intensive, and the production cluster was running close to its capacity limits.
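The difference between the flat layout and a hash-based layout can be sketched with plain Python dictionaries. This is a minimal illustration under stated assumptions, not DoorDash's implementation: a dict stands in for Redis, `hashlib.blake2b` with a 4-byte digest stands in for a 32-bit xxHash of the feature name, and `zlib` stands in for Snappy, so that the sketch runs with the standard library alone. The key format, the function names, and the sample features are all hypothetical.

```python
import hashlib
import zlib

def field_id(feature_name: str) -> bytes:
    """Map a feature name to a fixed 4-byte field (stand-in for a 32-bit xxHash)."""
    return hashlib.blake2b(feature_name.encode("utf-8"), digest_size=4).digest()

def flat_layout(entity_id: str, features: dict) -> dict:
    """Flat list of key-value pairs: one top-level key per (entity, feature)."""
    return {f"{entity_id}:{name}".encode("utf-8"): value
            for name, value in features.items()}

def hash_layout(entity_id: str, features: dict) -> dict:
    """One top-level key per entity; fields are hashed names, values are compressed."""
    fields = {field_id(name): zlib.compress(value)
              for name, value in features.items()}
    return {entity_id.encode("utf-8"): fields}

def read_feature(store: dict, entity_id: str, feature_name: str) -> bytes:
    """Look up one field of one entity (like a Redis HGET), then decompress it."""
    return zlib.decompress(store[entity_id.encode("utf-8")][field_id(feature_name)])

# Hypothetical sample data: two features for one entity.
features = {"avg_prep_time_list": b"0" * 200, "rating": b"4.7"}
flat = flat_layout("store:123", features)
nested = hash_layout("store:123", features)

print(len(flat), "top-level keys in the flat layout")
print(len(nested), "top-level key in the hash layout")
print(read_feature(nested, "store:123", "rating"))
```

In this sketch the entity identifier is stored once per entity instead of once per feature, and every field name shrinks to four bytes regardless of how long the feature name is. Compression tends to help only on larger values; very short values such as `b"4.7"` can grow when compressed, so a real system would likely apply it selectively.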

Results
Volume: 38%
Cost replaced: tripling our cost reduction
Source

https://careersatdoordash.com/blog/building-a-gigascale-ml-feature-store-with-redis/


Grounding & classification
Source type: technical build writeup
29 fields verified against source quotes, 2 dropped as unverifiable.
predictive analytics, recommendation system, product catalog, metric backed, named customer, production runtime claimed, tools described, workflow described, ecommerce, cost reduction, cycle time reduction, throughput increase, technical build writeup, back office ops, data sync enrichment