Lyft's ML Feature Serving Infrastructure: Unified Batch and Streaming Feature Access for Training and Online Inference

Lyft's ML models rely on features computed both by batch jobs on the data warehouse and from real-time event streams. Those features must be accessible in two modes: batch queries for model training and low-latency point lookups for online inference. This dual-access requirement called for a unified infrastructure.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in "Tools used" below.

Stage 1 · SQL Feature Definition

ML practitioners define features using SQL, with one column designated as an entity ID and the remaining columns as features.
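
The shape of such a definition can be illustrated with a minimal sketch. It is not Lyft's actual implementation: the `rides` table, the `rider_id` entity column, the feature names, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the data warehouse so the example runs on its own.

```python
import sqlite3

# Hypothetical feature definition: rider_id is the designated entity ID,
# and every other selected column is treated as a feature.
FEATURE_SQL = """
SELECT rider_id,
       COUNT(*)  AS ride_count,
       AVG(fare) AS avg_fare
FROM rides
GROUP BY rider_id
ORDER BY rider_id
"""


def extract_features(conn, sql, entity_column):
    """Run the SQL and return {entity_id: {feature_name: value}}."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    if entity_column not in columns:
        raise ValueError(f"entity column {entity_column!r} not in query output")
    entity_idx = columns.index(entity_column)

    features = {}
    for row in cursor:
        # All non-entity columns become named feature values for this entity.
        features[row[entity_idx]] = {
            name: value
            for idx, (name, value) in enumerate(zip(columns, row))
            if idx != entity_idx
        }
    return features


def build_demo_features():
    """Load a few made-up rides (SQLite stands in for the warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rides (rider_id TEXT, fare REAL)")
    conn.executemany(
        "INSERT INTO rides VALUES (?, ?)",
        [("r1", 10.0), ("r1", 20.0), ("r2", 7.5)],
    )
    result = extract_features(conn, FEATURE_SQL, "rider_id")
    conn.close()
    return result
```

Calling `build_demo_features()` yields one feature dictionary per entity ID. This is the keyed layout that a point lookup for online inference needs, while the same SQL output can be read in bulk for training.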

Tools used

Flyte, Flink, DynamoDB, Hive, Redis, Elasticsearch, Kafka

Outcome

The Feature Service has been widely adopted across Lyft teams since Q4 2017, hosting thousands of features across many ML models, serving millions of requests per minute with single-digit millisecond latency and 99.99%+ availability.

Results

Throughput: millions of requests per minute

Availability: 99.99%+

Running since: Q4 2017

Source

https://eng.lyft.com/ml-feature-serving-infrastructure-at-lyft-d30bf2d3c32a

Grounding & classification

Source type: technical build writeup

23 fields verified against source quotes.

fraud detection, predictive analytics, metric backed, named customer, production runtime claimed, tools described, workflow described, logistics, software, throughput increase, technical build writeup, back office ops, data sync enrichment