Building real-time machine learning foundations at Lyft with RealtimeMLPipeline
Streaming data was not a first-class citizen in Lyft's LyftLearn ML platform, forcing teams to spend weeks or months of engineering effort to integrate it into ML workflows despite strong developer appetite for real-time ML systems.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.
Stage 1 · Developer defines pipeline
A developer provides metadata such as a feature name, version, and a query, then instantiates a RealtimeMLPipeline Python object.
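As a rough illustration of this stage, the sketch below shows what defining a pipeline from a feature name, a version, and a query might look like. The class here is a hypothetical stand-in written for this example, not Lyft's actual RealtimeMLPipeline API: the constructor arguments, the validation rules, the `job_name` helper, and the table and column names in the query are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RealtimeMLPipeline:
    """Hypothetical stand-in for the pipeline object described above.

    This is NOT Lyft's real class; it only models the three pieces of
    metadata the text mentions: a feature name, a version, and a query.
    """

    feature_name: str
    version: int
    query: str

    def __post_init__(self) -> None:
        # Assumed validation rules: reject obviously incomplete metadata
        # before anything would be submitted to a streaming engine.
        if not self.feature_name:
            raise ValueError("feature_name must be non-empty")
        if self.version < 1:
            raise ValueError("version must be >= 1")
        if not self.query.strip():
            raise ValueError("query must be non-empty")

    def job_name(self) -> str:
        # Assumed naming convention: feature name plus version suffix.
        return f"{self.feature_name}_v{self.version}"


# The developer supplies only metadata; the table and column names in
# this query are made up for illustration.
pipeline = RealtimeMLPipeline(
    feature_name="ride_request_count",
    version=1,
    query="SELECT region_id, COUNT(*) AS requests FROM ride_requests GROUP BY region_id",
)

print(pipeline.job_name())
```

Keeping the definition to plain metadata is what makes a self-service workflow plausible: the platform, rather than the developer, would own how the query is turned into a running streaming job.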
Tools used
Flink, PyFlink, Kafka, Kinesis, S3, Kubernetes, GitHub, Jupyter, LyftLearn, Hive
Outcome
Lyft reduced the time to launch a new real-time ML application from multiple weeks to a few days, achieved self-service adoption across nearly all engineering pillars (Rider, Driver, Marketplace, Mapping, Safety), and enabled teams to build higher-order abstractions including a Real-time Anomaly Detection product.
Results
Time saved: from many weeks to days
Volume: weeks or months of engineering effort
Grounding & classification
Source type: technical build writeup
28 fields verified against source quotes.
Tags: anomaly detection, forecasting, predictive analytics, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, travel, cycle time reduction, employee productivity, technical build writeup, back office ops, data sync enrichment, monitor detect alert