VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control

Authors: Yuxuan Bian, Zhaoyang Zhang, Xuan Ju, Mingdeng Cao, Liangbin Xie, Ying Shan, Qiang Xu

Abstract: Video inpainting, which aims to restore corrupted video content, has
experienced substantial progress. Despite these advances, existing methods,
whether propagating unmasked region pixels through optical flow and receptive
field priors, or extending image-inpainting models temporally, face challenges
in generating fully masked objects or balancing the competing objectives of
background context preservation and foreground generation in one model,
respectively. To address these limitations, we propose a novel dual-stream
paradigm VideoPainter that incorporates an efficient context encoder
(comprising only 6% of the backbone parameters) to process masked videos and
inject backbone-aware background contextual cues to any pre-trained video DiT,
producing semantically consistent content in a plug-and-play manner. This
architectural separation significantly reduces the model’s learning complexity
while enabling nuanced integration of crucial background context. We also
introduce a novel target region ID resampling technique that enables any-length
video inpainting, greatly enhancing practical applicability. Additionally,
we establish a scalable dataset pipeline leveraging current vision
understanding models, contributing VPData and VPBench, the largest video
inpainting dataset and benchmark to date with over 390K diverse clips, to
facilitate segmentation-based inpainting training and assessment. Using
inpainting as a pipeline basis, we also explore downstream applications
including video editing and video editing pair data generation, demonstrating
competitive performance and significant practical potential. Extensive
experiments demonstrate VideoPainter’s superior performance in both any-length
video inpainting and editing, across eight key metrics, including video
quality, mask region preservation, and textual coherence.
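The dual-stream idea described in the abstract can be pictured with a short sketch: a lightweight context encoder processes the masked video and its features are injected into a frozen, pre-trained video DiT backbone at a few points. This is a minimal illustration only, not the authors' implementation; the module names, the zero-initialized projections, and the simple residual injection rule are assumptions made for clarity.

```python
# Minimal sketch (not the authors' code) of a dual-stream context encoder that
# injects background context into a frozen video DiT backbone. All names, shapes,
# and the injection rule are illustrative assumptions.
import torch
import torch.nn as nn


class ContextEncoder(nn.Module):
    """Lightweight encoder for masked-video context (the paper reports ~6% of backbone size)."""

    def __init__(self, dim: int, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
             for _ in range(num_layers)]
        )
        # Zero-initialized projections so injection starts as a no-op; a common
        # plug-and-play trick, assumed here rather than taken from the paper.
        self.proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
        for p in self.proj:
            nn.init.zeros_(p.weight)
            nn.init.zeros_(p.bias)

    def forward(self, masked_tokens: torch.Tensor) -> list[torch.Tensor]:
        feats = []
        h = masked_tokens
        for layer, proj in zip(self.layers, self.proj):
            h = layer(h)
            feats.append(proj(h))  # one context feature per injection point
        return feats


def dit_forward_with_context(backbone_blocks, tokens, context_feats):
    """Run a frozen DiT backbone, adding context features at the first few blocks.

    `backbone_blocks` is assumed to be a list of modules mapping tokens to tokens;
    real DiT blocks also take timestep/text conditioning, omitted for brevity.
    """
    for i, block in enumerate(backbone_blocks):
        if i < len(context_feats):
            tokens = tokens + context_feats[i]  # residual injection of background context
        tokens = block(tokens)
    return tokens
```

Because the backbone stays frozen and only the small encoder (plus its projections) is trained, the same context stream can, in principle, be plugged into different pre-trained video DiTs, which is the plug-and-play property the abstract emphasizes.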

Source: http://arxiv.org/abs/2503.05639v1
