
Full-Spectrum ML Model Monitoring at Lyft

Lyft's ML models make millions of high-stakes decisions per day, but model performance can degrade gradually in ways that are hard to detect. Before a centralized solution existed, ML engineers built one-off monitoring for each model, which duplicated work and left no centralized visibility across hundreds of models.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · Model scores emitted to metrics
For every model scoring request made to LyftLearn Serving, the system emits the model output to the metrics system.
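A minimal sketch of this stage, assuming a generic metrics client: the wrapper below calls a model, records each output score under a per-model metric name, and returns the score unchanged. `InMemoryMetrics`, `with_score_emission`, the `model_score.<id>` naming scheme, and the toy ETA model are hypothetical stand-ins, not LyftLearn Serving's actual API.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryMetrics:
    """Hypothetical stand-in for a metrics backend (e.g. a StatsD-style client)."""

    def __init__(self) -> None:
        self.series: Dict[str, List[float]] = defaultdict(list)

    def emit(self, name: str, value: float) -> None:
        # Record one observation for the named metric series.
        self.series[name].append(value)


def with_score_emission(
    model_id: str,
    predict: Callable[[dict], float],
    metrics: InMemoryMetrics,
) -> Callable[[dict], float]:
    """Wrap a predict function so every scoring request emits its output score."""

    def scored(features: dict) -> float:
        score = predict(features)
        # Hypothetical naming scheme: one metric series per model.
        metrics.emit(f"model_score.{model_id}", score)
        return score

    return scored


if __name__ == "__main__":
    metrics = InMemoryMetrics()
    # Toy "ETA in minutes" model, purely illustrative.
    eta_model = with_score_emission(
        "eta_v1", lambda f: 60.0 * f["distance_km"] / f["speed_kmh"], metrics
    )
    eta_model({"distance_km": 10.0, "speed_kmh": 30.0})
    eta_model({"distance_km": 5.0, "speed_kmh": 50.0})
    print(metrics.series["model_score.eta_v1"])  # [20.0, 6.0]
```

Wrapping the predict call, rather than changing each model, is one way to make emission uniform across every model a serving layer hosts; the emitted series then feed the downstream score monitoring.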
Tools used
LyftLearn Serving, Great Expectations, Spark, Fugue, Kubernetes, Verity
Outcome

Over 90% of Lyft's production models have Feature Validation and Model Score Monitoring, and 75% have Performance Drift Detection or Anomaly Detection. The system fired hundreds of alarms and caught over 15 high-impact issues in the nine months following general availability.

What failed first

When ETA models were retrained due to COVID-related demand drops, downstream pricing models that used ETAs as inputs dramatically under-predicted, revealing an unanticipated cascading dependency between models.
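A cascading shift like this shows up in the downstream model's own score stream even though that model was never changed, which is why monitoring scores (not just features) matters. A minimal sketch, assuming a simple mean-shift z-test over emitted scores: the function name, the default threshold of 3.0, and the sample values are hypothetical, and the source does not say Lyft uses this particular test.

```python
import statistics
from typing import Sequence


def mean_shift_alert(
    baseline: Sequence[float],
    recent: Sequence[float],
    threshold: float = 3.0,
) -> bool:
    """Return True when the recent mean score sits more than `threshold`
    standard errors away from the baseline mean.

    A deliberately simple stand-in for drift/anomaly detection on model scores.
    `baseline` needs at least two values so its standard deviation is defined.
    """
    base_mean = statistics.fmean(baseline)
    base_std = statistics.stdev(baseline)
    # Standard error of the recent window's mean, assuming baseline variance.
    std_err = base_std / len(recent) ** 0.5
    if std_err == 0:
        # Constant baseline: any change in the mean counts as a shift.
        return statistics.fmean(recent) != base_mean
    z = (statistics.fmean(recent) - base_mean) / std_err
    return abs(z) > threshold


if __name__ == "__main__":
    # Hypothetical downstream price scores before an upstream model retrain.
    baseline = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 10.0]
    after_upstream_retrain = [7.0, 7.2, 6.9, 7.1]
    normal_window = [10.1, 9.9, 10.0, 10.2]
    print(mean_shift_alert(baseline, after_upstream_retrain))  # True
    print(mean_shift_alert(baseline, normal_window))  # False
```

Production systems typically compare full distributions (and account for seasonality) rather than a single mean, but even this crude check would flag a sudden, sustained drop in a downstream model's predictions.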

Results
High-impact issues caught: over 15 (in the nine months after general availability)
Coverage: over 90% of production models
Running since: early 2020
Source

https://eng.lyft.com/full-spectrum-ml-model-monitoring-at-lyft-a4cdaf828e8f


Grounding & classification
Source type: technical build writeup
24 fields verified against source quotes.
anomaly detection, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, logistics, automation rate, error reduction, technical build writeup, quality assurance, monitor detect alert