DoorDash builds out-of-the-box ML model observability platform to detect and prevent model drift
After deployment, DoorDash's ML models degraded over time, a process called model drift, which reduced the accuracy of time estimates and other model outputs. The team had no systematic monitoring, so when a model made incorrect predictions, diagnosing the cause was slow and pulled engineers into significant reactive investigation.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Sibyl logs predictions
Sibyl, DoorDash's prediction service, logs every prediction to the data warehouse, including the prediction result, the feature values, and the prediction ID.
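To make the shape of such a log concrete, here is a minimal Python sketch of a prediction record carrying the three fields named above. It is not Sibyl's actual schema or API: the `PredictionLog` class, the `log_prediction` helper, the `model_name` and `logged_at` fields, and the in-memory list standing in for the data warehouse are all assumptions made for illustration.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class PredictionLog:
    """One logged prediction (hypothetical schema, not Sibyl's)."""

    # The three fields the text names.
    prediction_id: str
    prediction_result: float
    feature_values: Dict[str, Any]
    # Assumed extras, added only so the record can be grouped and ordered later.
    model_name: str = "unknown"
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def log_prediction(
    sink: List[str],
    model_name: str,
    features: Dict[str, Any],
    result: float,
) -> PredictionLog:
    """Build a record and append it to `sink` as one JSON line.

    `sink` is a plain list standing in for the warehouse write path.
    """
    record = PredictionLog(
        prediction_id=str(uuid.uuid4()),
        prediction_result=result,
        feature_values=dict(features),  # copy so later mutation cannot alter the log
        model_name=model_name,
    )
    sink.append(json.dumps(asdict(record), sort_keys=True))
    return record


if __name__ == "__main__":
    warehouse: List[str] = []
    log_prediction(warehouse, "eta_model", {"distance_km": 3.2}, 17.5)
    print(warehouse[0])
```

Keeping the feature values next to the prediction ID is what would let later jobs join a prediction to its inputs and aggregate both over time.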
Tools used
Sibyl, Prometheus, Grafana, PromQL, Apache Spark, YAML
Outcome
DoorDash shipped a scalable, out-of-the-box ML monitoring platform that onboarded multiple teams including Logistics, Fraud, Supply and Demand, and ETA, enabling self-serve alerting and freeing data scientists to focus on model development rather than systems design.
What failed first
Storing prediction logs in a data warehouse supported ad-hoc deep dives but provided no big-picture visibility into why models were drifting, leaving the team without a proactive way to detect or diagnose drift.