DoorDash reduces ML model serving response time by 50% through gRPC optimization

In DoorDash's gRPC-based ML model serving setup for search ranking, network overhead consumed up to 50% of service response time, leaving large unexplained latency gaps at a scale of approximately one million predictions per second.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools used” below.

Stage 1 · ML use cases trigger prediction requests

Machine learning use cases such as search and recommendation, fraud detection, Dasher dispatch optimization, and delivery time prediction trigger prediction requests.

Tools used

gRPC, Sibyl, Kotlin, Kubernetes, AWS, PyTorch, Wavefront, Zstandard, NettyServerBuilder, Linkerd, Istio, lzbench

Outcome

By enabling client-side load balancing, switching to zstd payload compression, and removing a slow logging call identified via transport-level tracing, DoorDash reduced overall response time by 50% and network overheads by 33%.
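To illustrate why client-side load balancing can matter for a gRPC service, the sketch below simulates request distribution in plain Python. It assumes, as is typical for gRPC over HTTP/2, that a client holds one long-lived connection, so a connection-level balancer chooses a backend once and every later request follows it. The pod names and function names are hypothetical, and the simulation neither uses the gRPC library nor reproduces DoorDash's actual configuration.

```python
from collections import Counter
from itertools import cycle

# Hypothetical backend pods; the names are illustrative only.
BACKENDS = ["pod-a", "pod-b", "pod-c"]


def connection_level_balancing(num_requests, backends):
    """Backend is chosen once, when the connection opens.

    Every request multiplexed over that connection lands on the same pod.
    """
    pinned = backends[0]  # assumption: the balancer happened to pick the first pod
    return Counter(pinned for _ in range(num_requests))


def client_side_round_robin(num_requests, backends):
    """The client knows all backends and rotates through them per request."""
    picker = cycle(backends)
    return Counter(next(picker) for _ in range(num_requests))


if __name__ == "__main__":
    print("connection-level:", dict(connection_level_balancing(9, BACKENDS)))
    print("client-side:     ", dict(client_side_round_robin(9, BACKENDS)))
```

In this toy model the connection-level variant sends all nine requests to one pod, while the round-robin variant spreads them evenly. In a real deployment an uneven spread of this kind would tend to show up as queueing on the overloaded pod.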

What failed first

A logging call placed outside the predictionsMade scope was creating approximately 50ms of overhead, identified only through transport-level gRPC tracing after the service was in production.
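The failure mode above can be sketched as two nested timers: one scoped around the prediction work, and one around the whole request, roughly what a transport-level trace would observe. This is a minimal illustration in plain Python, not DoorDash's Kotlin code; `timed`, `slow_log`, `handle_request`, and the 10 ms inference cost are all hypothetical, and the 50 ms sleep simply mirrors the overhead reported above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name, timings):
    """Record the wall-clock duration (seconds) of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def slow_log(message):
    # Hypothetical stand-in for a slow logging call (about 50 ms).
    time.sleep(0.05)


def handle_request(timings):
    # Outer span: approximates what a transport-level trace sees for the call.
    with timed("transport", timings):
        # Inner span: application metric scoped around the prediction work only.
        with timed("predictions_made", timings):
            time.sleep(0.01)  # hypothetical model inference cost
        # Outside the inner span, so the application metric never records it.
        slow_log("prediction complete")


timings = {}
handle_request(timings)
unexplained = timings["transport"] - timings["predictions_made"]

if __name__ == "__main__":
    print(f"application span: {timings['predictions_made'] * 1000:.1f} ms")
    print(f"transport span:   {timings['transport'] * 1000:.1f} ms")
    print(f"unexplained gap:  {unexplained * 1000:.1f} ms")
```

The application-level metric reports only the inference time, so the logging cost appears solely as a gap between the two spans. That is why a trace taken at the transport layer can expose overhead that application metrics miss.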

Results

Time saved: 50%

Volume: 33%

Source

https://careersatdoordash.com/blog/enabling-efficient-machine-learning-model-serving/

Grounding & classification

Source type: technical build writeup

35 fields verified against source quotes.

fraud detection, predictive analytics, recommendation system, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, ecommerce, logistics, cycle time reduction, throughput increase, technical build writeup, back office ops, ecommerce ops