
LyftLearn: ML model training and batch prediction infrastructure built on Kubernetes at Lyft

Lyft needed a unified platform to simplify ML model development, parallelize training, track runs, retrain on schedule, and deploy models across many teams using diverse modeling libraries and techniques.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · Select hardware and base image
The user opens the LyftLearn homepage and selects a hardware configuration and a base image before starting development.
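To make the stage concrete, here is a minimal sketch of how a hardware and base-image selection could be turned into a Kubernetes Pod manifest. It is not LyftLearn's actual implementation: the `EnvironmentRequest` class, the `to_pod_manifest` function, the pod naming scheme, and the example image name are all hypothetical. Only the manifest field names (`apiVersion`, `kind`, `resources.requests`, `resources.limits`, and the `nvidia.com/gpu` extended resource) follow the standard Kubernetes Pod schema.

```python
from dataclasses import dataclass
import json


@dataclass(frozen=True)
class EnvironmentRequest:
    """Hypothetical record of what a user picks before starting development."""
    user: str
    base_image: str   # image reference, e.g. in a container registry
    cpu_cores: int
    memory_gib: int
    gpus: int = 0


def to_pod_manifest(req: EnvironmentRequest) -> dict:
    """Translate the selection into a Kubernetes Pod manifest (plain dict)."""
    requests = {"cpu": str(req.cpu_cores), "memory": f"{req.memory_gib}Gi"}
    limits = dict(requests)
    if req.gpus > 0:
        # GPUs are an extended resource and are specified under limits.
        limits["nvidia.com/gpu"] = str(req.gpus)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"dev-{req.user}",      # assumed naming scheme
            "labels": {"owner": req.user},
        },
        "spec": {
            "containers": [
                {
                    "name": "notebook",
                    "image": req.base_image,
                    "resources": {"requests": requests, "limits": limits},
                }
            ]
        },
    }


if __name__ == "__main__":
    request = EnvironmentRequest(
        user="alice",
        base_image="example.registry/ml-base:latest",  # placeholder image
        cpu_cores=4,
        memory_gib=16,
        gpus=1,
    )
    print(json.dumps(to_pod_manifest(request), indent=2))
```

Keeping the selection as a small immutable record separate from the manifest builder means the same request could be validated, logged, or reused to recreate an environment later.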
Tools used
Kubernetes, Jupyter, R-studio, Flyte, Spark, Fugue, sklearn, LightGBM, XGBoost, PyTorch, TensorFlow, AWS Elastic File System, Hive, Presto, AWS RDS Aurora, AWS Elastic Container Registry, Docker
Outcome

LyftLearn achieved wide adoption across dozens of teams building hundreds of models every week, with Kubernetes-based environment spin-up in seconds enabling the fast iteration critical to ML development.

Results
Time saved: hundreds of models every week
Source

https://eng.lyft.com/lyftlearn-ml-model-training-infrastructure-built-on-kubernetes-aef8218842bb


Grounding & classification
Source type: technical build writeup
33 fields verified against source quotes.
Tags: fraud detection, predictive analytics, named customer, production runtime claimed, source backed, tools described, workflow described, logistics, travel, employee productivity, throughput increase, technical build writeup, back office ops, data sync enrichment