DoorDash deploys MLP-gated MoE deep learning model for 20% relative improvement in ETA prediction accuracy
DoorDash's tree-based ETA models had limited expressiveness — predictions showed less variance than ground truth — and struggled to capture intricate temporal and spatial patterns across a large, varied delivery network as operations scaled.
How it works
Common implementation structure
How this type of workflow is typically built, generalized across documented cases rather than tied to any one vendor's stack.
Stage 1 · Order triggers ETA computation
An order creation event initiates delivery duration estimation.
Tools used
MLP-gated MoE, DeepNet, CrossNet, transformer
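The MLP-gated mixture of experts (MoE) listed above can be illustrated with a minimal sketch. This is not DoorDash's implementation. It assumes that each expert (toy stand-ins for the DeepNet, CrossNet, and transformer experts) maps a feature vector to a scalar ETA, and that a one-hidden-layer MLP gate turns the same features into softmax weights used to average the expert predictions. The production system may combine expert representations differently. The class name `MLPGatedMoE`, the toy experts, the features, and all weights are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]
Expert = Callable[[Sequence[float]], float]


def dense(w: Matrix, x: Sequence[float], b: Sequence[float]) -> Vector:
    """Fully connected layer: returns w @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, t) for t in v]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MLPGatedMoE:
    """Toy MoE (hypothetical): an MLP gate assigns softmax weights to scalar expert outputs."""

    def __init__(self, experts: List[Expert], w1: Matrix, b1: Vector,
                 w2: Matrix, b2: Vector) -> None:
        if len(w2) != len(experts) or len(b2) != len(experts):
            raise ValueError("the gate must emit one logit per expert")
        self.experts = experts
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def gate(self, x: Sequence[float]) -> Vector:
        """One hidden ReLU layer, then a softmax over the experts."""
        hidden = relu(dense(self.w1, x, self.b1))
        return softmax(dense(self.w2, hidden, self.b2))

    def predict(self, x: Sequence[float]) -> float:
        """Gate-weighted average of the expert ETA predictions."""
        weights = self.gate(x)
        outputs = [expert(x) for expert in self.experts]
        return sum(w * o for w, o in zip(weights, outputs))


# Usage with three hypothetical experts (toy stand-ins, not the real networks).
experts: List[Expert] = [
    lambda x: 20.0 + 2.0 * x[0],  # stand-in for a "DeepNet" expert
    lambda x: 25.0 + 1.0 * x[1],  # stand-in for a "CrossNet" expert
    lambda x: 30.0,               # stand-in for a "transformer" expert
]
moe = MLPGatedMoE(
    experts,
    w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
    w2=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], b2=[0.0, 0.0, 0.0],
)
features = [3.0, 1.0]  # hypothetical features, e.g. distance and store load
print("gate weights:", moe.gate(features))
print("toy ETA:", moe.predict(features))
```

Because the gate weights depend on the input features, different experts can dominate for different orders (for example, long versus short distances). That input-dependent weighting is the usual motivation for gating instead of a fixed ensemble average.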
Outcome
The new MLP-gated MoE architecture delivered a 20% relative improvement in ETA accuracy across large and small orders, long and short distances, and peak and off-peak hours, improving customer trust and operational efficiency.
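The write-up does not name the metric behind the 20% figure. Assume a lower-is-better error E, such as mean absolute error, computed on the same evaluation set for both the tree-based baseline and the MoE model. Under that assumption, a relative improvement is conventionally defined as:

```latex
\text{relative improvement} = \frac{E_{\text{baseline}} - E_{\text{MoE}}}{E_{\text{baseline}}}
```

With purely illustrative numbers, a baseline error of 10 minutes and an MoE error of 8 minutes give (10 - 8) / 10 = 0.20, which is a 20% relative improvement.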
What failed first
Initial co-training of multitask models caused significant accuracy degradation due to task interference. Enforcing cross-stage consistency via an adjustment to later-stage predictions lowered accuracy. Training Weibull distribution parameters with a log-likelihood loss function produced unreasonable outputs, including negative location parameter values.
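To make the Weibull failure mode concrete, here is a minimal sketch of a three-parameter Weibull negative log-likelihood with shape k, scale λ, and location θ. It is not DoorDash's code. It assumes a hypothetical model head whose raw outputs are mapped to parameters: shape and scale are forced positive with softplus (a common choice, assumed here), while the location is passed through unconstrained. The likelihood only requires each observed duration t to exceed θ. A negative θ is therefore perfectly acceptable to the loss, even though it assigns nonzero probability to negative delivery times. The names `softplus`, `weibull_nll`, and `params_from_raw` are illustrative.

```python
import math


def softplus(x: float) -> float:
    """Smooth map from any real number to a positive value: log(1 + e^x)."""
    # Numerically stable rewrite of log(1 + exp(x)) that avoids overflow for large x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weibull_nll(t: float, shape: float, scale: float, loc: float) -> float:
    """Negative log-likelihood of duration t under a 3-parameter Weibull.

    log f(t) = log k - log lam + (k - 1) * log(z) - z**k, with z = (t - theta) / lam.
    """
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    if t <= loc:
        return math.inf  # boundary treated as outside the support (theta, inf) for simplicity
    z = (t - loc) / scale
    log_pdf = math.log(shape) - math.log(scale) + (shape - 1) * math.log(z) - z ** shape
    return -log_pdf


def params_from_raw(raw_shape: float, raw_scale: float, raw_loc: float):
    """Hypothetical output head: map unconstrained network outputs to Weibull parameters."""
    # Shape and scale are forced positive; the location is NOT constrained,
    # so nothing in the loss stops it from drifting below zero.
    return softplus(raw_shape), softplus(raw_scale), raw_loc


# Usage: raw outputs that yield a negative location still give a finite loss.
shape, scale, loc = params_from_raw(0.5, 2.0, -3.0)
print(f"shape={shape:.3f} scale={scale:.3f} loc={loc}")
print("NLL for a 20-minute delivery:", weibull_nll(20.0, shape, scale, loc))
```

Passing the raw location through the same softplus map would rule out negative values, although that alone does not guarantee reasonable fits.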