ECVA | European Computer Vision Association

GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision

Lei Ke, Shichao Li, Yanan Sun, Yu-Wing Tai, Chi-Keung Tang ;

Abstract

We present a novel end-to-end framework named as GSNet ( extbf{\underline{G}}eometric and extbf{\underline{S}}cene-aware \underline{ extbf{Net}}work), which jointly estimates 6DoF poses and reconstructs detailed 3D car shapes from single urban street view. GSNet utilizes a unique four-way feature extraction and fusion scheme and directly regresses 6DoF poses and shapes in a single forward pass. Extensive experiments show that our diverse feature extraction and fusion scheme can greatly improve model performance. Based on a divide-and-conquer 3D shape representation strategy, GSNet reconstructs 3D vehicle shape with great detail (1352 vertices and 2700 faces). This dense mesh representation further leads us to consider geometrical consistency and scene context, and inspires a new multi-objective loss function to regularize network training, which in turn improves the accuracy of 6D pose estimation and validates the merit of jointly performing both tasks. We evaluate GSNet on the largest multi-task ApolloCar3D benchmark and achieve state-of-the-art performance both quantitatively and qualitatively."

Related Material

[pdf]