Authors: Wenzhao Zheng, Junjie Wu, Yao Zheng, Sicheng Zuo, Zixun Xie, Longchao Yang, Yong Pan, Zhihui Hao, Peng Jia, Xianpeng Lang, Shanghang Zhang
Abstract: Vision-based autonomous driving shows great potential due to its satisfactory
performance and low costs. Most existing methods adopt dense representations
(e.g., bird’s-eye view) or sparse representations (e.g., instance boxes) for
decision-making, which suffer from the trade-off between comprehensiveness and
efficiency. This paper explores a Gaussian-centric end-to-end autonomous
driving (GaussianAD) framework and exploits 3D semantic Gaussians to
extensively yet sparsely describe the scene. We initialize the scene with
uniform 3D Gaussians and use surrounding-view images to progressively refine
them to obtain the 3D Gaussian scene representation. We then use sparse
convolutions to efficiently perform 3D perception (e.g., 3D detection, semantic
map construction). We predict 3D flows for the Gaussians with dynamic semantics
and plan the ego trajectory accordingly with an objective of future scene
forecasting. Our GaussianAD can be trained in an end-to-end manner with
optional perception labels when available. Extensive experiments on the widely
used nuScenes dataset verify the effectiveness of our end-to-end GaussianAD on
various tasks including motion planning, 3D occupancy prediction, and 4D
occupancy forecasting. Code: https://github.com/wzzheng/GaussianAD.
Source: http://arxiv.org/abs/2412.10371v1
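
Illustrative sketch (not the authors' implementation): the abstract describes initializing the scene with uniform 3D Gaussians and predicting 3D flows for Gaussians with dynamic semantics. The NumPy snippet below shows one plausible reading of those two steps. The function names, the dictionary layout, the quaternion convention, and the choice of which class ids count as dynamic are all assumptions made for illustration; the image-based refinement and the sparse-convolution perception heads are omitted entirely.

```python
import numpy as np


def init_uniform_gaussians(bounds_min, bounds_max, resolution, num_classes):
    """Place one axis-aligned Gaussian at the centre of each cell of a regular grid.

    Hypothetical helper: the grid layout and attribute names are assumptions.
    """
    bounds_min = np.asarray(bounds_min, dtype=np.float64)
    bounds_max = np.asarray(bounds_max, dtype=np.float64)
    resolution = np.asarray(resolution, dtype=np.int64)

    cell = (bounds_max - bounds_min) / resolution  # cell size per axis
    axes = [
        bounds_min[i] + (np.arange(resolution[i]) + 0.5) * cell[i]
        for i in range(3)
    ]
    means = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    n = means.shape[0]

    return {
        "mean": means,                                  # (N, 3) centres
        "scale": np.tile(cell / 2.0, (n, 1)),           # (N, 3) half cell size
        "rotation": np.tile([1.0, 0.0, 0.0, 0.0], (n, 1)),  # identity quaternion (w, x, y, z)
        "logits": np.zeros((n, num_classes)),           # (N, C) semantic logits
    }


def apply_flow(gaussians, flow, dynamic_class_ids):
    """Move only the Gaussians whose predicted class is dynamic by their 3D flow.

    `flow` is an (N, 3) array of per-Gaussian displacements, assumed to come
    from some learned flow head that is not modelled here.
    """
    labels = gaussians["logits"].argmax(axis=1)
    is_dynamic = np.isin(labels, list(dynamic_class_ids))

    future = dict(gaussians)  # shallow copy; "mean" is replaced by a new array
    future["mean"] = gaussians["mean"] + flow * is_dynamic[:, None]
    return future


if __name__ == "__main__":
    scene = init_uniform_gaussians((-4, -4, -1), (4, 4, 1), (4, 4, 2), num_classes=3)
    scene["logits"][0, 1] = 1.0  # pretend Gaussian 0 was refined to a dynamic class
    flow = np.ones_like(scene["mean"])
    future = apply_flow(scene, flow, dynamic_class_ids={1})
    print(scene["mean"].shape, future["mean"][0] - scene["mean"][0])
```

In this sketch static Gaussians keep their positions while dynamic ones are displaced, so the returned set can be read as a forecast of the future scene under the stated assumptions.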