From Facts & Metrics to Media Machine Learning: Evolving the Data Engineering Function at Netflix
Netflix's traditional data engineering focused on structured tables for metrics, dashboards, and statistical modeling. As studio and content production scaled, media data (multi-modal, unstructured, and massive) required a fundamentally different approach that existing pipelines could not provide.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Media asset ingestion from AMP
All data for the initial phase comes from AMP, Netflix's internal asset management system and annotation store.
Tools used
LanceDB, AMP
Outcome
Netflix formalized a new Media Data Engineering specialization and built the Media Data Lake to provide centralized, standardized, scalable access to media assets and ML-derived features, enabling richer ML models, faster experimentation, and new AI-powered features.
Grounding & classification
Source type: technical build writeup
19 fields verified against source quotes, 1 dropped as unverifiable.
computer vision, data extraction, enterprise search, speech to text, knowledge base, builder submitted, named customer, tools described, workflow described, media, accuracy improvement, employee productivity, technical build writeup, back office ops, data sync enrichment, document to record