Netflix researches Vera and VOID: controllable AI video editing for promotional assets
Creating polished promotional video assets from raw footage requires complex edits, such as adding visual elements, replacing backgrounds, and removing objects, that demand hours of specialized manual work. Existing AI video editing tools regenerate the entire clip for each edit, which inadvertently alters elements that should remain unchanged, and they fail to account for physical scene continuity when objects are removed.
How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.
Stage 1 · Artist issues text edit instruction
An artist provides a source video and a text editing instruction that specifies what needs to change in the promotional video asset.
Tools used
Vera, VOID, VLM, CogVideoX-Fun-V1.5-5b-InP, Kubric
Outcome
Vera significantly outperforms existing baselines on content preservation, both in automated metrics and in human evaluations by 19 creative reviewers. VOID was selected 64.8% of the time as best reflecting realistic scene evolution. Both remain in early research stages and are not yet production-ready.
What failed first
Existing video editing models exhibit two documented failure modes: regenerating the entire video when only a specific element should change (causing unintended alterations), and removing objects without correcting the resulting physically implausible scene interactions.
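The first failure mode (unintended alterations outside the element being edited) can be made concrete with a simple measurement. The following is a minimal sketch, not the metric used in the research: the function name `outside_mask_change`, the grayscale nested-list frame format, and the boolean edit mask are all assumptions introduced for illustration. It averages the absolute pixel difference between source and edited clips over pixels the edit was not supposed to touch.

```python
from typing import List

# Assumed toy representation: a frame is rows of grayscale intensities in [0, 1].
Frame = List[List[float]]
# Assumed edit mask: True where the edit is allowed to change pixels.
Mask = List[List[bool]]


def outside_mask_change(source: List[Frame], edited: List[Frame], mask: Mask) -> float:
    """Mean absolute pixel difference over pixels outside the edit mask.

    0.0 means everything outside the edited region was preserved exactly;
    larger values mean the edit leaked into content that should be unchanged.
    """
    if len(source) != len(edited):
        raise ValueError("source and edited clips must have the same number of frames")
    total = 0.0
    count = 0
    for src_frame, out_frame in zip(source, edited):
        for y, mask_row in enumerate(mask):
            for x, editable in enumerate(mask_row):
                if not editable:
                    total += abs(src_frame[y][x] - out_frame[y][x])
                    count += 1
    return total / count if count else 0.0


# One 2x2 frame; only the top-left pixel is meant to change.
mask = [[True, False], [False, False]]
source = [[[0.25, 0.25], [0.25, 0.25]]]

# A localized edit changes only the masked pixel.
localized = [[[1.0, 0.25], [0.25, 0.25]]]
# A full regeneration also shifts every pixel outside the mask.
regenerated = [[[1.0, 0.5], [0.5, 0.5]]]

print(outside_mask_change(source, localized, mask))    # 0.0
print(outside_mask_change(source, regenerated, mask))  # 0.25
```

In this toy setup, the localized edit scores 0.0 while the fully regenerated clip scores 0.25, which is the kind of gap a content-preservation metric is meant to expose. Real evaluations would work on full-resolution color video and typically combine several automated metrics with human review.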