Authors: Xin Fei, Wenzhao Zheng, Yueqi Duan, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Jiwen Lu
Abstract: We propose PixelGaussian, an efficient feed-forward framework for learning
generalizable 3D Gaussian reconstruction from arbitrary views. Most existing
methods rely on uniform pixel-wise Gaussian representations, which learn a
fixed number of 3D Gaussians for each view and cannot generalize well to more
input views. In contrast, PixelGaussian dynamically adapts both the
Gaussian distribution and quantity based on geometric complexity, leading to
more efficient representations and significant improvements in reconstruction
quality. Specifically, we introduce a Cascade Gaussian Adapter (CGA) that adjusts the
Gaussian distribution according to local geometric complexity identified by a
keypoint scorer. CGA leverages deformable attention in context-aware
hypernetworks to guide Gaussian pruning and splitting, ensuring accurate
representation in complex regions while reducing redundancy. Furthermore, we
design a transformer-based Iterative Gaussian Refiner module that refines
Gaussian representations through direct image-Gaussian interactions.
PixelGaussian effectively reduces Gaussian redundancy as the number of input
views increases. We conduct extensive experiments on the large-scale ACID and
RealEstate10K datasets, where our method achieves state-of-the-art performance
with good generalization to various numbers of views. Code:
https://github.com/Barrybarry-Smith/PixelGaussian.
Source: http://arxiv.org/abs/2410.18979v1
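
Illustrative sketch: the abstract describes score-guided pruning and splitting of Gaussians, and the toy Python example below shows only that general idea. It is not the paper's Cascade Gaussian Adapter, which uses deformable attention in context-aware hypernetworks. The function name `adapt_gaussians`, the two thresholds, and the fixed split offset are all assumptions introduced here for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def adapt_gaussians(
    centers: List[Point],
    scores: List[float],
    prune_below: float = 0.2,   # hypothetical threshold, not from the paper
    split_above: float = 0.8,   # hypothetical threshold, not from the paper
    offset: float = 0.05,       # hypothetical displacement for split children
) -> List[Point]:
    """Toy score-driven adaptation of Gaussian centers.

    Low-score Gaussians are dropped (treated as redundant), high-score ones
    are replaced by two children displaced along x, and the rest are kept.
    """
    if len(centers) != len(scores):
        raise ValueError("centers and scores must have the same length")

    adapted: List[Point] = []
    for (x, y, z), score in zip(centers, scores):
        if score < prune_below:
            continue  # prune: simple region, Gaussian assumed redundant
        if score > split_above:
            # split: complex region, allocate two Gaussians instead of one
            adapted.append((x - offset, y, z))
            adapted.append((x + offset, y, z))
        else:
            adapted.append((x, y, z))  # keep unchanged
    return adapted


centers = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
scores = [0.1, 0.5, 0.9]  # stand-in for per-Gaussian complexity scores
result = adapt_gaussians(centers, scores)
```

With these inputs, the first Gaussian is pruned, the second is kept, and the third is split into two, so the count stays at three while capacity shifts toward the high-score region. In a learned system the scores and displacements would be predicted by the network rather than fixed by hand.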