ECVA | European Computer Vision Association

Towards Comprehensive Representation Enhancement in Semantics-Guided Self-Supervised Monocular Depth Estimation

Jingyuan Ma, Xiangyu Lei, Nan Liu, Xian Zhao, Shiliang Pu

Abstract

Semantics-guided self-supervised monocular depth estimation has been widely studied, owing to the strong cross-task correlation between depth and semantics. However, depth estimation and semantic segmentation are fundamentally different types of tasks: one is regression while the other is classification, so the distributions of depth features and semantic features naturally differ. Previous works that leverage semantic information in depth estimation mostly neglect this representational difference, which leads to insufficient enhancement of the depth features. In this work, we propose an attention-based module that enhances task-specific features by addressing their feature uniqueness within instances. Additionally, we propose a metric-learning-based approach that accomplishes comprehensive enhancement of depth features by creating a separation between instances in feature space. Extensive experiments and analysis demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on the KITTI dataset.
