
Netflix builds a human-augmenting agentic workflow for observational causal inference

Observational causal inference (OCI) requires substantial judgment and domain expertise, but its repetitive parts, such as rechecking covariate balance, running sensitivity analyses, and tracking many iterations of an analysis, are error-prone. LLMs handed an analysis plan without scaffolding also produce biased estimates; in the Netflix case study, early adopter bias had inflated the baseline estimate.
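Rechecking covariate balance is one of the repetitive steps described above. The sketch below shows one common way to do it, assuming balance is measured with the absolute standardized mean difference (SMD) against the widely used 0.1 rule-of-thumb threshold. The function names and the example covariates are hypothetical and are not taken from the Netflix workflow.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """Absolute SMD: |mean_t - mean_c| / sqrt((var_t + var_c) / 2)."""
    diff = abs(mean(treated) - mean(control))
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # Both arms are constant: balanced only if the constants match.
        return 0.0 if diff == 0 else math.inf
    return diff / pooled_sd

def check_balance(covariates, threshold=0.1):
    """Compute the SMD per covariate and flag those above the threshold.

    `covariates` maps a covariate name to (treated_values, control_values).
    """
    report = {name: standardized_mean_difference(t, c)
              for name, (t, c) in covariates.items()}
    imbalanced = [name for name, smd in report.items() if smd > threshold]
    return report, imbalanced

# Hypothetical covariates, for illustration only.
report, imbalanced = check_balance({
    "tenure_months": ([24, 30, 36, 40], [10, 12, 14, 16]),
    "weekly_hours": ([5, 6, 7, 8], [5, 6, 7, 8]),
})
print(imbalanced)  # ['tenure_months']
```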

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack.
Stage 1 · Principal submits analysis plan
A human principal provides an initial analysis plan specifying context, goals, confounders, permitted tools, and the dataset.
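The plan can be pictured as a small structured record. A minimal sketch, assuming a Python dataclass whose fields mirror the five items listed above; the class name, the `validate` helper, and the example values are all hypothetical and do not describe oci-agent's actual plan format.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class AnalysisPlan:
    """Hypothetical record of what the human principal hands to the agent."""
    context: str                # business question and background
    goals: list[str]            # estimands the principal cares about
    confounders: list[str]      # variables the principal believes must be adjusted for
    permitted_tools: list[str]  # libraries or tools the agent may call
    dataset: str                # identifier of the dataset to analyze

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the plan is complete."""
        problems = []
        if not self.context.strip():
            problems.append("context is empty")
        if not self.goals:
            problems.append("no goals listed")
        if not self.confounders:
            problems.append("no confounders listed")
        if not self.permitted_tools:
            problems.append("no permitted tools listed")
        if not self.dataset.strip():
            problems.append("dataset not specified")
        return problems

# Illustrative values only.
plan = AnalysisPlan(
    context="Effect of a product change on member engagement",
    goals=["average treatment effect"],
    confounders=["tenure_months", "device_type"],
    permitted_tools=["EconML"],
    dataset="example_engagement_table",
)
print(plan.validate())  # []
```

Checking completeness before the agent starts means a missing confounder list is caught by a cheap rule rather than surfacing later as a biased estimate.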
Tools used
Claude Sonnet 4.6 · oci-agent · EconML
Outcome

The scaffolded agentic workflow recovered the ground truth on nine of ten ACIC benchmark datasets. Its critic agent separated reliable estimates (192 rated satisfactory, with lower RMSE and better-calibrated confidence intervals) from unreliable ones (39 rated unsatisfactory). At Netflix, the workflow reduced human toil on iterative causal analyses.
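To make the critic's split concrete, one could compare the two groups on the two criteria the outcome mentions: RMSE against the ground truth and confidence-interval coverage. This sketch assumes each estimate already carries a critic verdict and a known true effect (available on synthetic benchmarks such as ACIC); how the critic reaches its verdict is not shown, and the toy numbers are invented for illustration, not the reported results.

```python
import math
from dataclasses import dataclass

@dataclass
class Estimate:
    point: float    # estimated treatment effect
    ci_low: float   # lower confidence-interval bound
    ci_high: float  # upper confidence-interval bound
    truth: float    # known true effect (only available on synthetic benchmarks)
    verdict: str    # critic's label, e.g. "satisfactory" or "unsatisfactory"

def rmse(estimates):
    """Root-mean-squared error of point estimates against the true effect."""
    return math.sqrt(sum((e.point - e.truth) ** 2 for e in estimates) / len(estimates))

def coverage(estimates):
    """Fraction of confidence intervals that contain the true effect."""
    return sum(e.ci_low <= e.truth <= e.ci_high for e in estimates) / len(estimates)

def summarize_by_verdict(estimates):
    """Group estimates by critic verdict and report count, RMSE, and coverage."""
    groups = {}
    for e in estimates:
        groups.setdefault(e.verdict, []).append(e)
    return {
        verdict: {"n": len(g), "rmse": rmse(g), "coverage": coverage(g)}
        for verdict, g in groups.items()
    }

# Toy estimates, invented for illustration.
toy = [
    Estimate(1.1, 0.8, 1.4, truth=1.0, verdict="satisfactory"),
    Estimate(0.9, 0.6, 1.2, truth=1.0, verdict="satisfactory"),
    Estimate(2.0, 1.8, 2.2, truth=1.0, verdict="unsatisfactory"),
    Estimate(0.0, -0.1, 0.1, truth=1.0, verdict="unsatisfactory"),
]
summary = summarize_by_verdict(toy)
print(summary)
```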

What failed first

One-shot LLM prompting without scaffolding produced consistently wrong answers on the benchmark datasets. In the Netflix case study, the paved-path agentic workflow produced an updated estimate just 25% of the baseline, showing that the unscaffolded estimate had been heavily distorted by early adopter bias and poor overlap.
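How a confounder like early adoption inflates a naive comparison can be shown with a deliberately simple technique: stratifying on the confounder, swapped in here for illustration and not the estimator the workflow actually uses. In this sketch with made-up toy data (not Netflix data), early adopters are both more likely to be treated and have higher outcomes anyway, so the naive difference in means overstates the effect. Stratifying on a hypothetical `early_adopter` flag, and skipping strata that lack either arm (a crude stand-in for an overlap check), removes the inflation. The numbers are not meant to reproduce the 25% figure.

```python
from collections import defaultdict
from statistics import mean

def naive_difference(units):
    """Difference in mean outcome between treated and control, ignoring confounders."""
    treated = [u["outcome"] for u in units if u["treated"]]
    control = [u["outcome"] for u in units if not u["treated"]]
    return mean(treated) - mean(control)

def stratified_effect(units, stratum_key):
    """Size-weighted average of within-stratum differences in means.

    Strata lacking treated or control units have no overlap, so no
    within-stratum comparison is possible; they are skipped and reported.
    """
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for u in units:
        arm = "treated" if u["treated"] else "control"
        strata[u[stratum_key]][arm].append(u["outcome"])
    weighted_sum, total, skipped = 0.0, 0, []
    for level, arms in strata.items():
        if not arms["treated"] or not arms["control"]:
            skipped.append(level)
            continue
        n = len(arms["treated"]) + len(arms["control"])
        weighted_sum += n * (mean(arms["treated"]) - mean(arms["control"]))
        total += n
    if total == 0:
        raise ValueError("no stratum has both treated and control units")
    return weighted_sum / total, skipped

# Toy data: early adopters are mostly treated AND have higher outcomes anyway.
# (List repetition reuses the same dict objects; fine because they are read-only.)
units = (
    [{"early_adopter": True, "treated": True, "outcome": 10.0}] * 4
    + [{"early_adopter": True, "treated": False, "outcome": 9.0}]
    + [{"early_adopter": False, "treated": True, "outcome": 3.0}]
    + [{"early_adopter": False, "treated": False, "outcome": 2.0}] * 4
)
naive = naive_difference(units)
adjusted, skipped = stratified_effect(units, "early_adopter")
print(f"naive={naive:.2f} adjusted={adjusted:.2f} ratio={adjusted / naive:.2f}")
# naive=5.20 adjusted=1.00 ratio=0.19
```

Within each stratum the effect is 1.0, but because treatment and high outcomes both cluster among early adopters, the pooled comparison attributes that group's higher baseline to the treatment.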

Results
Volume: 25% of the baseline
Source

https://netflixtechblog.com/a-human-augmenting-agentic-workflow-for-causal-inference-4623f0a9c5af


Grounding & classification
Source type: technical build writeup
24 fields verified against source quotes.
agentic workflow · ai agent · multi agent workflow · human review described · metric backed · named customer · production runtime claimed · tools described · workflow described · media · accuracy improvement · employee productivity · technical build writeup · back office ops · ai draft human approval · human review queue