StableAnimator: High-Quality Identity-Preserving Human Image Animation

Authors: Shuyuan Tu, Zhen Xing, Xintong Han, Zhi-Qi Cheng, Qi Dai, Chong Luo, Zuxuan Wu

Abstract: Current diffusion models for human image animation struggle to ensure
identity (ID) consistency. This paper presents StableAnimator, the first
end-to-end ID-preserving video diffusion framework, which synthesizes
high-quality videos without any post-processing, conditioned on a reference
image and a sequence of poses. Building upon a video diffusion model,
StableAnimator contains carefully designed modules for both training and
inference, striving for identity consistency. In particular, StableAnimator
begins by computing image and face embeddings with off-the-shelf extractors;
the face embeddings are then refined through interaction with the image
embeddings via a global content-aware Face Encoder. Next, StableAnimator
introduces a novel distribution-aware ID Adapter that prevents interference
caused by temporal layers while preserving ID via alignment. During inference,
we propose a novel Hamilton-Jacobi-Bellman (HJB) equation-based optimization to
further enhance the face quality. We demonstrate that solving the HJB equation
can be integrated into the diffusion denoising process, and the resulting
solution constrains the denoising path and thus benefits ID preservation.
Experiments on multiple benchmarks show the effectiveness of StableAnimator
both qualitatively and quantitatively.
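The "distribution-aware" alignment behind the ID Adapter can be illustrated with a minimal sketch: shift and rescale the face features so their statistics match those of the image features before fusing them, which is one simple way to keep an auxiliary signal from drifting away from the backbone's feature distribution. The function name and the global mean/std matching below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def align_distribution(face_feats, img_feats, eps=1e-6):
    """Illustrative sketch: renormalize face features so their global
    mean/std match the image features' statistics before fusion.
    (Hypothetical helper, not the paper's actual ID Adapter.)"""
    f_mean, f_std = face_feats.mean(), face_feats.std()
    i_mean, i_std = img_feats.mean(), img_feats.std()
    # Standardize the face features, then map them onto the image
    # features' distribution.
    return (face_feats - f_mean) / (f_std + eps) * i_std + i_mean

rng = np.random.default_rng(0)
face = rng.normal(2.0, 3.0, size=(16, 64))   # mismatched statistics
img = rng.normal(0.0, 1.0, size=(16, 64))
aligned = align_distribution(face, img)
```

After alignment, `aligned` carries the same content as `face` up to an affine transform, but its mean and standard deviation now match `img`, so adding it into the backbone's features is less likely to perturb their distribution.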

Source: http://arxiv.org/abs/2411.17697v1
