
LyftLearn: ML model training and batch prediction infrastructure built on Kubernetes at Lyft

Lyft needed a unified platform to simplify ML model development, parallelize training, track runs, retrain on schedule, and deploy models across many teams using diverse modeling libraries and techniques.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools used” below.

Stage 1 · Select hardware and base image

The user goes to the LyftLearn homepage to select hardware configuration and a base image before starting development.
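As an illustration of what this selection might translate to on a Kubernetes-backed platform, here is a minimal sketch that turns a hardware choice and a base image into a Pod manifest. It is not LyftLearn's actual code: `HardwareConfig`, `build_pod_manifest`, the container name, and the image name are all hypothetical, and the source does not describe how the platform builds its manifests. Only the manifest fields themselves (`apiVersion`, `kind`, `resources.requests`, `resources.limits`, and the `nvidia.com/gpu` resource name) follow the standard Kubernetes Pod schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareConfig:
    """Hypothetical hardware selection; field names are illustrative only."""
    cpu: str        # Kubernetes CPU quantity, e.g. "4"
    memory: str     # Kubernetes memory quantity, e.g. "16Gi"
    gpus: int = 0   # number of GPUs, 0 for CPU-only sessions


def build_pod_manifest(name: str, image: str, hw: HardwareConfig) -> dict:
    """Translate a hardware + base-image selection into a Pod manifest dict."""
    resources = {"cpu": hw.cpu, "memory": hw.memory}
    if hw.gpus > 0:
        # Extended resource name exposed by the NVIDIA device plugin.
        resources["nvidia.com/gpu"] = str(hw.gpus)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "notebook",  # assumed container name
                    "image": image,
                    # Requests equal limits so the session gets exactly
                    # the hardware the user picked.
                    "resources": {
                        "requests": dict(resources),
                        "limits": dict(resources),
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    manifest = build_pod_manifest(
        name="dev-session",
        image="example.registry/ml-base:latest",  # hypothetical base image
        hw=HardwareConfig(cpu="4", memory="16Gi", gpus=1),
    )
    print(json.dumps(manifest, indent=2))
```

Setting requests equal to limits is one possible design choice here; a real platform might instead allow bursting above the requested CPU.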

Tools used

Kubernetes, Jupyter, R-studio, Flyte, Spark, Fugue, sklearn, LightGBM, XGBoost, PyTorch, TensorFlow, AWS Elastic File System, Hive, Presto, AWS RDS Aurora, AWS Elastic Container Registry, Docker

Outcome

LyftLearn achieved wide adoption across dozens of teams building hundreds of models every week, with Kubernetes-based environment spin-up in seconds enabling the fast iteration critical to ML development.

Results

Time saved: hundreds of models every week

Source

https://eng.lyft.com/lyftlearn-ml-model-training-infrastructure-built-on-kubernetes-aef8218842bb


Grounding & classification

Source type: technical build writeup

33 fields verified against source quotes.

fraud detection, predictive analytics, named customer, production runtime claimed, source backed, tools described, workflow described, logistics, travel, employee productivity, throughput increase, technical build writeup, back office ops, data sync enrichment