3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation

Authors: Hansheng Chen, Bokui Shen, Yulin Liu, Ruoxi Shi, Linqi Zhou, Connor Z. Lin, Jiayuan Gu, Hao Su, Gordon Wetzstein, Leonidas Guibas

Abstract: Multi-view image diffusion models have significantly advanced open-domain 3D
object generation. However, most existing models rely on 2D network
architectures that lack inherent 3D biases, resulting in compromised geometric
consistency. To address this challenge, we introduce 3D-Adapter, a plug-in
module designed to infuse 3D geometry awareness into pretrained image diffusion
models. Central to our approach is the idea of 3D feedback augmentation: for
each denoising step in the sampling loop, 3D-Adapter decodes intermediate
multi-view features into a coherent 3D representation, then re-encodes the
rendered RGBD views to augment the pretrained base model through feature
addition. We study two variants of 3D-Adapter: a fast feed-forward version
based on Gaussian splatting and a versatile training-free version utilizing
neural fields and meshes. Our extensive experiments demonstrate that 3D-Adapter
not only greatly enhances the geometry quality of text-to-multi-view models
such as Instant3D and Zero123++, but also enables high-quality 3D generation
using the plain text-to-image Stable Diffusion. Furthermore, we showcase the
broad application potential of 3D-Adapter by presenting high-quality results in
text-to-3D, image-to-3D, text-to-texture, and text-to-avatar tasks.

Source: http://arxiv.org/abs/2410.18974v1
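To make the abstract's 3D feedback augmentation loop concrete, the sketch below restates it in minimal, self-contained Python. Everything here is illustrative: the classes, method names, and the toy scheduler update (StubBaseModel, StubAdapter, extract_features, decode_to_3d, render_rgbd, encode, predict_noise) are hypothetical placeholders, not the authors' actual implementation or API.

```python
"""Illustrative sketch of per-step 3D feedback augmentation.
All classes and method names are hypothetical stand-ins."""

import torch


class StubBaseModel:
    """Placeholder for a pretrained multi-view diffusion model (hypothetical)."""

    def extract_features(self, x, t):
        # Intermediate multi-view features from the denoising network
        # (stubbed as a copy of the latents here).
        return x.clone()

    def predict_noise(self, x, t, feature_residuals=None):
        # Noise prediction, optionally augmented by additive feature
        # residuals from the adapter branch (the "feature addition" step).
        feats = x if feature_residuals is None else x + feature_residuals
        return 0.1 * feats  # stubbed prediction


class StubAdapter:
    """Placeholder for the 3D-Adapter module (hypothetical)."""

    def decode_to_3d(self, feats, cameras):
        # Decode multi-view features into one shared 3D representation
        # (the paper uses Gaussian splatting, or neural fields and meshes).
        return feats.mean(dim=0, keepdim=True)

    def render_rgbd(self, scene, cameras):
        # Render geometry-consistent RGBD views of the shared scene
        # from the same camera poses.
        return scene.expand(len(cameras), -1, -1, -1)

    def encode(self, rgbd):
        # Re-encode the rendered RGBD views into additive feature residuals.
        return 0.5 * rgbd


def sample_with_3d_feedback(base_model, adapter, cameras, num_steps=50):
    # Start each of the V views from Gaussian noise in latent space.
    x = torch.randn(len(cameras), 4, 64, 64)
    for t in reversed(range(num_steps)):
        feats = base_model.extract_features(x, t)      # multi-view features
        scene = adapter.decode_to_3d(feats, cameras)   # coherent 3D state
        rgbd = adapter.render_rgbd(scene, cameras)     # consistent renders
        residuals = adapter.encode(rgbd)               # feedback features
        eps = base_model.predict_noise(x, t, feature_residuals=residuals)
        x = x - eps / num_steps                        # toy scheduler update
    return x


if __name__ == "__main__":
    views = sample_with_3d_feedback(StubBaseModel(), StubAdapter(),
                                    cameras=range(4))
    print(views.shape)  # torch.Size([4, 4, 64, 64])
```

Per the abstract, the paper's fast feed-forward variant realizes the decode step with Gaussian splatting, while the training-free variant uses neural fields and meshes; the stubs above abstract both behind decode_to_3d, and in each case the rendered views are fed back to the frozen base model through feature addition.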
