
Netflix's Axion ML Fact Store eliminates training-serving skew and reduces offline feature regeneration from weeks to hours

Netflix's ML models train on weeks of historical data, so testing an updated feature encoder meant waiting weeks for feature logging to accumulate enough data. That slowed experimentation and raised the risk of training-serving skew.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear under “Tools used” below.

Stage 1 · Production inference runs

Compute applications fetch member and video facts from gRPC services, run shared feature encoders, and score ML models to generate personalized recommendations.
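The stage above can be sketched as follows. This is a minimal illustration, not Netflix's code: it assumes facts are logged at inference time and later re-encoded offline with the same encoder, which is one common way to avoid training-serving skew. The `Facts` fields, the feature names, the `score` weights, and the in-memory `fact_log` are all hypothetical stand-ins for the gRPC services, real encoders, model, and fact store.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Facts:
    """Hypothetical snapshot of the raw facts fetched for one request."""
    member_id: int
    video_id: int
    member_watch_minutes: float   # assumed member fact
    video_runtime_minutes: float  # assumed video fact


def encode_features(facts: Facts) -> Dict[str, float]:
    """Shared feature encoder: the single place features are derived from facts."""
    runtime = max(facts.video_runtime_minutes, 1.0)  # guard against zero runtime
    return {
        "watch_ratio": facts.member_watch_minutes / runtime,
        "is_long_video": 1.0 if facts.video_runtime_minutes >= 90 else 0.0,
    }


def score(features: Dict[str, float]) -> float:
    """Toy linear model standing in for the real ML model."""
    return 0.7 * features["watch_ratio"] + 0.3 * features["is_long_video"]


def serve(facts: Facts, fact_log: List[Facts]) -> float:
    """Online path: log the raw facts, then encode and score."""
    fact_log.append(facts)
    return score(encode_features(facts))


def regenerate_features(fact_log: List[Facts]) -> List[Dict[str, float]]:
    """Offline path: rerun the same encoder over the logged facts."""
    return [encode_features(facts) for facts in fact_log]
```

Because both paths call `encode_features`, changing the encoder only requires rerunning `regenerate_features` over facts that are already stored, with no need to wait for new feature logs to accumulate.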

Tools used

Axion, Keystone, Iceberg, EVCache, Spark, Parquet, protobuf, gRPC

Outcome

Axion reduces offline feature regeneration from weeks to hours, EVCache queries run 3x–50x faster than Iceberg, and data quality monitoring detects more than 95% of data issues early — making Axion the de facto fact store for Netflix's Personalization ML models.

What failed first

Feature logging required weeks of waiting for data. ETL with normalized multi-table storage caused Spark shuffle issues at scale. Even a single denormalized Iceberg table was too slow for queries filtering hundreds of millions of rows to under a million, and bloom filters plus predicate pushdown were insufficient.
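The access-pattern problem described above can be illustrated with a small sketch. It is not the Iceberg or EVCache API: a plain list stands in for a large table that must be scanned and filtered, and a plain dict stands in for a key-value store queried by key. The row shape and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[int, str]  # (member_id, fact payload); hypothetical row shape


def scan_filter(rows: Iterable[Row], wanted: Set[int]) -> List[Row]:
    """Scan-style query: inspects every row, so cost grows with table size."""
    return [row for row in rows if row[0] in wanted]


def build_index(rows: Iterable[Row]) -> Dict[int, List[str]]:
    """One-time load of the rows into a key-value layout keyed by member_id."""
    index: Dict[int, List[str]] = {}
    for member_id, payload in rows:
        index.setdefault(member_id, []).append(payload)
    return index


def keyed_lookup(index: Dict[int, List[str]], wanted: Set[int]) -> List[Row]:
    """Key-value query: cost grows with the number of requested keys."""
    return [(m, p) for m in sorted(wanted) for p in index.get(m, [])]
```

Both functions return the same rows for the same keys. The difference is that `scan_filter` does work proportional to the whole table, while `keyed_lookup` does work proportional to the keys requested, which is the property that matters when a query narrows hundreds of millions of rows to under a million.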

Results

Time saved: hours compared to weeks

Volume: 3x–50x faster

Source

https://netflixtechblog.com/evolution-of-ml-fact-store-5941d3231762


Grounding & classification

Source type: technical build writeup

26 fields verified against source quotes.

personalization, recommendation system, builder submitted, failure mode described, metric backed, named customer, production runtime claimed, tools described, workflow described, media, accuracy improvement, cycle time reduction, employee productivity, technical build writeup, back office ops, data sync enrichment