Lyft's ML models make millions of high-stakes decisions per day, but model performance degrades gradually, which makes the degradation hard to detect. Before a centralized solution existed, ML engineers built one-off monitoring for each model, which duplicated work and left no unified visibility across hundreds of models.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Model scores emitted to metrics
For every model scoring request made to LyftLearn Serving, the system emits the model output to the metrics system.
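The per-request emission described in this stage can be sketched as a thin wrapper around a model's predict function. This is a minimal illustration under stated assumptions, not LyftLearn Serving's actual code: `InMemoryMetrics`, `with_score_emission`, the `model.<name>.score` metric naming, and the toy model are all hypothetical names introduced here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


class InMemoryMetrics:
    """Hypothetical stand-in for a metrics client (for example a StatsD-style sink)."""

    def __init__(self) -> None:
        self.samples: Dict[str, List[float]] = defaultdict(list)

    def histogram(self, name: str, value: float) -> None:
        # Record one observation under the given metric name.
        self.samples[name].append(value)


def with_score_emission(
    model_name: str,
    predict: Callable[[Sequence[float]], float],
    metrics: InMemoryMetrics,
) -> Callable[[Sequence[float]], float]:
    """Wrap a predict function so every scoring request also emits its output."""

    def scored(features: Sequence[float]) -> float:
        score = predict(features)
        # Emit the model output; the metric name format is an assumption.
        metrics.histogram(f"model.{model_name}.score", score)
        return score

    return scored


# Usage with a toy model (hypothetical): the score is the mean of the features.
metrics = InMemoryMetrics()
toy_predict = with_score_emission(
    "toy_eta", lambda features: sum(features) / len(features), metrics
)
toy_predict([1.0, 3.0])
toy_predict([2.0, 6.0])
```

Placing the emission in a wrapper, rather than inside each model, means every model served through the same path gets score monitoring without per-model code, which matches the motivation for centralizing monitoring.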
Over 90% of Lyft's production models have Feature Validation and Model Score Monitoring, and 75% have Performance Drift Detection or Anomaly Detection. The system fired hundreds of alarms and caught over 15 high-impact issues in the nine months following general availability.
What failed first
When ETA models were retrained due to COVID-related demand drops, downstream pricing models that used ETAs as inputs dramatically under-predicted, revealing an unanticipated cascading dependency between models.