Netflix builds a human-augmenting agentic workflow for observational causal inference
Observational causal inference (OCI) requires substantial judgment and domain expertise, but its repetitive steps, such as rechecking covariate balance, conducting sensitivity analyses, and tracking multiple iterations, are error-prone. LLMs given analysis plans without scaffolding also produce biased estimates: in the Netflix case study, early adopter bias inflated the baseline estimate.
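To make the covariate balance recheck concrete, here is a minimal sketch of one common diagnostic, the standardized mean difference (SMD) between treated and control groups. It is not taken from the Netflix post: the function names, the dict-per-unit data layout, and the 0.1 threshold (a widely used rule of thumb) are all assumptions made for illustration.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2).

    Each group needs at least two values, because sample variance is used.
    """
    diff = mean(treated) - mean(control)
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # Both groups are constant: balanced if equal, maximally imbalanced if not.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / pooled_sd


def imbalanced_covariates(treated_rows, control_rows, threshold=0.1):
    """Return {covariate: smd} for covariates whose |SMD| exceeds threshold.

    Rows are dicts mapping covariate name to a numeric value (hypothetical
    layout). The 0.1 default is a conventional cutoff, not one from the source.
    """
    flagged = {}
    for name in treated_rows[0]:
        smd = standardized_mean_difference(
            [row[name] for row in treated_rows],
            [row[name] for row in control_rows],
        )
        if abs(smd) > threshold:
            flagged[name] = smd
    return flagged
```

Rerunning a check like this after every matching or reweighting step is the kind of repetitive bookkeeping that is easy to automate; deciding what to do about a flagged covariate remains a judgment call.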
The scaffolded agentic workflow recovered ground truth in nine out of ten ACIC benchmark datasets. The critic agent separated reliable estimates (192 rated satisfactory, with lower RMSE and better-calibrated confidence intervals) from unreliable ones (39 rated unsatisfactory). At Netflix, the workflow reduced human toil on iterative causal analyses.
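The comparison between satisfactory and unsatisfactory estimates rests on two metrics that are only computable where ground truth is known, as on a benchmark: RMSE and confidence-interval coverage. The sketch below shows how they could be computed per critic verdict. The record fields (`estimate`, `truth`, `ci_low`, `ci_high`, `verdict`) and the function name are hypothetical, and this is not the evaluation code from the post.

```python
from math import sqrt


def summarize_by_verdict(records):
    """Group benchmark results by critic verdict and compute RMSE and coverage.

    Each record is a dict with keys: estimate, truth, ci_low, ci_high, verdict
    (an assumed layout for illustration).
    """
    groups = {}
    for record in records:
        groups.setdefault(record["verdict"], []).append(record)

    summary = {}
    for verdict, rows in groups.items():
        squared_errors = [(r["estimate"] - r["truth"]) ** 2 for r in rows]
        covered = [r["ci_low"] <= r["truth"] <= r["ci_high"] for r in rows]
        summary[verdict] = {
            "n": len(rows),
            # Root mean squared error of the point estimates.
            "rmse": sqrt(sum(squared_errors) / len(rows)),
            # Share of intervals that contain the true effect.
            "coverage": sum(covered) / len(rows),
        }
    return summary
```

For nominal 95% intervals, coverage near 0.95 in a group suggests well-calibrated uncertainty, while much lower coverage suggests overconfident intervals.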
One-shot LLM prompting without scaffolding produced consistently wrong answers on the benchmark datasets. In the Netflix case study, the paved-path agentic workflow produced an updated estimate that was just 25% of the baseline, revealing that the unscaffolded approach was heavily distorted by early adopter bias and poor overlap.
https://netflixtechblog.com/a-human-augmenting-agentic-workflow-for-causal-inference-4623f0a9c5af