Sequential Multi-View Fusion Network for Fast LiDAR Point Motion Estimation

Gang Zhang, Xiaoyan Li, Zhenhua Wang ;


"The LiDAR point motion estimation, including motion state prediction and velocity estimation, is crucial for understanding a dynamic scene in autonomous driving. Recent 2D projection-based methods run in real-time by applying the well-optimized 2D convolution networks on either the bird’s-eye view (BEV) or the range view (RV) but suffer from lower accuracy due to information loss during the 2D projection. Thus, we propose a novel sequential multi-view fusion network (SMVF), composed of a BEV branch and an RV branch, in charge of encoding the motion information and spatial information, respectively. By looking from distinct views and integrating with the original LiDAR point features, the SMVF produces a comprehensive motion prediction, while keeping its efficiency. Moreover, to generalize the motion estimation well to the objects with fewer training samples, we propose a sequential instance copy-paste (SICP) for generating realistic LiDAR sequences for these objects. The experiments on the SemanticKITTI moving object segmentation (MOS) and Waymo scene flow benchmarks demonstrate that our SMVF outperforms all existing methods by a large margin. \keywords{Motion State Prediction, Velocity Estimation, Multi-View Fusion, Generalization of Motion Estimation}"

Related Material

[pdf] [supplementary material] [DOI]