Authors: Sharath Girish, Tianye Li, Amrita Mazumdar, Abhinav Shrivastava, David Luebke, Shalini De Mello
Abstract: Online free-viewpoint video (FVV) streaming is a challenging and
relatively under-explored problem. It requires incremental, on-the-fly updates
to a volumetric representation, fast training and rendering to satisfy
real-time constraints, and a small memory footprint for efficient
transmission. If achieved, it can enhance the user experience by enabling
novel applications such as 3D video conferencing and live volumetric video
broadcast.
this work, we propose a novel framework for QUantized and Efficient ENcoding
(QUEEN) for streaming FVV using 3D Gaussian Splatting (3D-GS). QUEEN directly
learns Gaussian attribute residuals between consecutive frames at each
time-step without imposing any structural constraints on them, allowing for
high quality reconstruction and generalizability. To efficiently store the
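As a rough illustration of this per-frame residual-learning pattern (not the paper's actual implementation), the PyTorch-style sketch below optimizes additive residuals on top of the previous frame's attributes; `render`, `gt_views`, and the attribute dictionary are hypothetical stand-ins.

```python
import torch

# Hypothetical per-frame update: learn additive residuals on top of the
# previous frame's Gaussian attributes, with no structural constraints.
# `prev_attrs` maps attribute names (positions, rotations, scales,
# opacities, colors) to tensors; `render` and `gt_views` are assumed.

def update_frame(prev_attrs, render, gt_views, steps=100, lr=1e-3):
    # One learnable residual tensor per attribute, initialized to zero
    # so the first iterate reproduces the previous frame exactly.
    residuals = {
        name: torch.zeros_like(attr, requires_grad=True)
        for name, attr in prev_attrs.items()
    }
    opt = torch.optim.Adam(residuals.values(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Current-frame attributes = previous attributes + residuals.
        attrs = {n: prev_attrs[n] + residuals[n] for n in prev_attrs}
        loss = sum(
            torch.nn.functional.l1_loss(render(attrs, cam), img)
            for cam, img in gt_views
        )
        loss.backward()
        opt.step()
    # Only the (compressible) residuals need to be stored or transmitted.
    return {n: r.detach() for n, r in residuals.items()}
```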
To efficiently store the residuals, we further propose a quantization-sparsity
framework, which combines a learned latent decoder for effectively quantizing
all attribute residuals other than Gaussian positions with a learned gating
module for sparsifying the position residuals.
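A minimal sketch of what such a quantization-sparsity pair could look like is given below; the latent dimensions, MLP decoder, and straight-through rounding and thresholding tricks are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LatentResidualDecoder(nn.Module):
    """Sketch of quantized residual coding: a learnable latent per
    Gaussian is rounded to integers (straight-through gradient) and
    decoded by a small MLP into an attribute residual. Dimensions are
    illustrative."""
    def __init__(self, num_gaussians, latent_dim=8, attr_dim=4):
        super().__init__()
        self.latents = nn.Parameter(torch.zeros(num_gaussians, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, attr_dim))

    def forward(self):
        z = self.latents
        # Round for compact storage; keep gradients via straight-through.
        z_q = z + (torch.round(z) - z).detach()
        return self.decoder(z_q)

class PositionGate(nn.Module):
    """Sketch of learned gating: a per-Gaussian logit is hard-thresholded
    into a binary mask (straight-through) that zeroes out position
    residuals for static Gaussians, so only sparse updates are stored."""
    def __init__(self, num_gaussians):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_gaussians))

    def forward(self, pos_residual):
        soft = torch.sigmoid(self.logits)
        hard = (soft > 0.5).float()
        mask = soft + (hard - soft).detach()  # straight-through estimator
        return pos_residual * mask.unsqueeze(-1)
```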
We propose to use the Gaussian viewspace gradient difference vector as a
signal to separate the static and dynamic content of the scene; it acts as a
guide for effective sparsity learning and speeds up training.
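One plausible way to turn such a gradient-difference signal into a static/dynamic mask is sketched below; the (N, 2) gradient shapes and the quantile threshold are assumptions for illustration.

```python
import torch

def dynamic_mask(viewspace_grad_prev, viewspace_grad_curr, quantile=0.9):
    """Sketch: flag Gaussians whose accumulated view-space positional
    gradients change strongly between consecutive frames as 'dynamic'.
    Inputs are assumed (N, 2) tensors of per-Gaussian screen-space
    gradients accumulated over training views; threshold is illustrative."""
    diff = (viewspace_grad_curr - viewspace_grad_prev).norm(dim=-1)
    thresh = torch.quantile(diff, quantile)
    # True = dynamic; guides where position residuals should be learned.
    return diff > thresh
```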
On diverse FVV benchmarks, QUEEN outperforms the state-of-the-art online FVV
methods on all metrics. Notably, for several highly dynamic scenes, it reduces
the model size to just 0.7 MB per frame while training in under 5 seconds and
rendering at 350 FPS. The project website is at
https://research.nvidia.com/labs/amri/projects/queen
Source: http://arxiv.org/abs/2412.04469v1