
Building real-time machine learning foundations at Lyft with RealtimeMLPipeline

Streaming data was not a first-class citizen in Lyft's LyftLearn ML platform, forcing teams to spend weeks or months of engineering effort to integrate it into ML workflows despite strong developer appetite for real-time ML systems.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.

Stage 1 · Developer defines pipeline

A developer provides metadata such as a feature name, version, and a query, then instantiates a RealtimeMLPipeline Python object.
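As a rough illustration of this stage, the sketch below shows what declaring a pipeline from a feature name, a version, and a query could look like. The class here is a hypothetical local stand-in that only borrows the RealtimeMLPipeline name: the constructor arguments, the validation rules, the `feature_id` helper, and the table and column names in the query are all assumptions made for illustration, not Lyft's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RealtimeMLPipeline:
    """Hypothetical stand-in for the pipeline object; not Lyft's real class."""

    feature_name: str
    version: int
    query: str

    def __post_init__(self) -> None:
        # Reject obviously incomplete metadata before anything is deployed.
        if not self.feature_name.strip():
            raise ValueError("feature_name must be non-empty")
        if self.version < 1:
            raise ValueError("version must be >= 1")
        if not self.query.strip():
            raise ValueError("query must be non-empty")

    @property
    def feature_id(self) -> str:
        # Assumed naming scheme: feature name plus a version suffix.
        return f"{self.feature_name}_v{self.version}"


# The developer supplies only metadata; the query text is illustrative
# and the table and column names are made up.
pipeline = RealtimeMLPipeline(
    feature_name="ride_request_count",
    version=1,
    query=(
        "SELECT region_id, COUNT(*) AS ride_requests "
        "FROM ride_request_events GROUP BY region_id"
    ),
)
print(pipeline.feature_id)
```

Keeping the definition to a small declarative object is what would let a platform own the streaming runtime details while the developer owns only the feature logic.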

Tools used

Flink, PyFlink, Kafka, Kinesis, S3, Kubernetes, GitHub, Jupyter, LyftLearn, Hive

Outcome

Lyft reduced the time to launch a new real-time ML application from multiple weeks to a few days, achieved self-service adoption across nearly all engineering pillars (Rider, Driver, Marketplace, Mapping, Safety), and enabled teams to build higher-order abstractions including a Real-time Anomaly Detection product.

Results

Time saved: from many weeks to days

Volume: weeks or months of engineering effort

Source

https://eng.lyft.com/building-real-time-machine-learning-foundations-at-lyft-6dd99b385a4e


Grounding & classification

Source type: technical build writeup

28 fields verified against source quotes.

anomaly detection, forecasting, predictive analytics, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, travel, cycle time reduction, employee productivity, technical build writeup, back office ops, data sync enrichment, monitor detect alert