Flyte: Lyft's ML orchestration platform powers 1M+ pipelines and joins LF AI & Data
Lyft needed to orchestrate complex ML workflows for its ETA product. These workflows involved managing large historical training datasets and complex output artifacts, backtesting, frequent retraining, and deploying multiple models simultaneously. The largest bottleneck was procuring and managing infrastructure for models that might not work out.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under "Tools used" below.
Stage 1 · Historical data training trigger
Large amounts of historical data must be used to train a set of ensemble models.
Tools used
Flyte, AWS Step Functions, flytekit
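The stage above can be sketched in plain Python to show its shape: load a historical dataset, train several ensemble members, combine their predictions, and backtest on a held-out slice. This is a minimal sketch under stated assumptions, not Lyft's implementation. The data is synthetic (trip distance in km versus ETA in minutes), each member is a one-feature least-squares line trained on a bootstrap resample (bagging, chosen here only for illustration), and every function name is hypothetical. It deliberately avoids flytekit so it runs anywhere. In a Flyte project, each step would typically be a flytekit `@task` and `training_pipeline` a `@workflow`, so the platform can run each step as its own containerized unit.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple

# Hypothetical record shape: (trip_distance_km, observed_eta_minutes).
Record = Tuple[float, float]


@dataclass(frozen=True)
class LinearModel:
    """One ensemble member: eta = slope * distance + intercept."""
    slope: float
    intercept: float

    def predict(self, distance: float) -> float:
        return self.slope * distance + self.intercept


def load_history(n: int, seed: int = 0) -> List[Record]:
    """Stand-in for reading a large historical dataset (synthetic, assumed)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        d = rng.uniform(1.0, 20.0)
        # Assumed ground truth: 2 min/km plus 3 min overhead, Gaussian noise.
        rows.append((d, 2.0 * d + 3.0 + rng.gauss(0.0, 1.0)))
    return rows


def fit_linear(rows: List[Record]) -> LinearModel:
    """Ordinary least squares on a single feature."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(slope, my - slope * mx)


def train_ensemble(rows: List[Record], n_members: int,
                   seed: int = 0) -> List[LinearModel]:
    """Bagging: fit each member on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        sample = [rng.choice(rows) for _ in range(len(rows))]
        members.append(fit_linear(sample))
    return members


def ensemble_predict(members: List[LinearModel], distance: float) -> float:
    """Average the members' predictions."""
    return mean(m.predict(distance) for m in members)


def backtest(members: List[LinearModel], holdout: List[Record]) -> float:
    """Mean absolute error of the ensemble on held-out rows."""
    return mean(abs(ensemble_predict(members, d) - y) for d, y in holdout)


def training_pipeline(n_rows: int = 2000, n_members: int = 5,
                      holdout_frac: float = 0.2) -> Tuple[List[LinearModel], float]:
    """Load history, train the ensemble on the head, backtest on the tail."""
    history = load_history(n_rows)
    cut = int(len(history) * (1 - holdout_frac))
    train, holdout = history[:cut], history[cut:]
    members = train_ensemble(train, n_members)
    return members, backtest(members, holdout)


members, mae = training_pipeline()
print(f"{len(members)} members, backtest MAE = {mae:.3f} min")
```

`training_pipeline()` returns the fitted members and the ensemble's mean absolute error on the last 20% of rows. Treating that tail as "later" history is itself an assumption, since the synthetic rows carry no timestamps.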
Outcome
By mid-2020, Flyte was powering more than 1 million pipelines at Lyft across the ETA, Pricing, Mapping, Driver Engagement, Growth, and Map Generation teams. It was also contributed to the Linux Foundation AI & Data as its 25th hosted project.
What failed first
Flyte v1 used AWS Step Functions as its scheduler. Step Functions proved too rigid to extend natively with new features, so the team built a container-native scheduling engine instead.
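One way to picture a scheduling engine of this kind is as a loop over a task graph that starts each task once all of its upstream dependencies have finished. The sketch below shows only that core idea, using Kahn's topological-sort algorithm. It is a hypothetical illustration, not Flyte's actual engine: `run_dag` and the example task names are invented here, and each task is a local Python callable standing in for a container launch.

```python
from collections import deque
from typing import Callable, Dict, List

Task = Callable[[Dict[str, object]], object]


def run_dag(tasks: Dict[str, Task], deps: Dict[str, List[str]]) -> Dict[str, object]:
    """Run tasks in dependency order (Kahn's algorithm).

    tasks: name -> callable that receives the outputs of already-finished tasks.
    deps:  name -> names of tasks that must finish before it can start.
    A container-native engine would launch a container per task; here each
    task is just a local function call (hypothetical stand-in).
    """
    # Reject edges that mention tasks we do not know about.
    for name, parents in deps.items():
        for p in [name, *parents]:
            if p not in tasks:
                raise ValueError(f"unknown task: {p}")

    # Count unfinished parents per task and record each task's children.
    indegree = {name: len(deps.get(name, [])) for name in tasks}
    children: Dict[str, List[str]] = {name: [] for name in tasks}
    for name, parents in deps.items():
        for p in parents:
            children[p].append(name)

    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    outputs: Dict[str, object] = {}
    while ready:
        name = ready.popleft()
        outputs[name] = tasks[name](outputs)
        # A child becomes runnable once its last parent has finished.
        for child in children[name]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any task never reached must sit on a cycle.
    if len(outputs) != len(tasks):
        raise ValueError("cycle detected in workflow graph")
    return outputs


# Hypothetical workflow: load data, train two members in parallel branches,
# then combine them.
pipeline: Dict[str, Task] = {
    "load": lambda out: list(range(10)),
    "train_a": lambda out: sum(out["load"]),
    "train_b": lambda out: max(out["load"]),
    "ensemble": lambda out: (out["train_a"] + out["train_b"]) / 2,
}
deps = {"train_a": ["load"], "train_b": ["load"], "ensemble": ["train_a", "train_b"]}

results = run_dag(pipeline, deps)
print(results["ensemble"])  # 27.0
```

Keeping the graph and the execution loop in code the team owns is what makes features such as retries or caching straightforward to add. Contrasting that with a managed state-machine service is offered here as a plausible reading, not as Lyft's documented rationale.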
Results
Volume: more than 1 million pipelines
Cost: the total cost of running Flyte at Lyft was low