From Facts & Metrics to Media Machine Learning: Evolving the Data Engineering Function at Netflix

Netflix's traditional data engineering focused on structured tables for metrics, dashboards, and statistical modeling. As studio and content production scaled, media data (multi-modal, unstructured, and massive) required a fundamentally different approach that existing pipelines could not provide.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Media asset ingestion from AMP

All data for the initial phase comes from AMP, Netflix's internal asset management system and annotation store.

Tools used

LanceDB, AMP

Outcome

Netflix formalized a new Media Data Engineering specialization and built the Media Data Lake to provide centralized, standardized, scalable access to media assets and ML-derived features, enabling richer ML models, faster experimentation, and new AI-powered features.

Source

https://netflixtechblog.com/from-facts-metrics-to-media-machine-learning-evolving-the-data-engineering-function-at-netflix-6dcc91058d8d

Grounding & classification

Source type: technical build writeup

19 fields verified against source quotes, 1 dropped as unverifiable.

Tags: computer vision, data extraction, enterprise search, speech to text, knowledge base, builder submitted, named customer, tools described, workflow described, media, accuracy improvement, employee productivity, technical build writeup, back office ops, data sync enrichment, document to record