ECVA | European Computer Vision Association

HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation

Can Wang, Jiefeng Li, Wentao Liu, Chen Qian, Cewu Lu

Abstract

Remarkable progress has been made in 3D human pose estimation from a monocular RGB camera. However, only a few studies have explored the multi-person 3D case. In this paper, we attempt to address the lack of a global perspective in top-down approaches by introducing a novel form of supervision: Hierarchical Multi-person Ordinal Relations (HMOR). HMOR hierarchically encodes interaction information as ordinal relations of depths and angles, which captures body-part and joint-level semantics while maintaining global consistency. In our approach, an integrated top-down model is designed to leverage these ordinal relations in the learning process. The integrated model estimates human bounding boxes, human depths, and root-relative 3D poses simultaneously, with a coarse-to-fine architecture to improve the accuracy of depth estimation. The proposed method significantly outperforms state-of-the-art methods on publicly available multi-person 3D pose datasets (9.2 mm improvement on the 3DPW dataset, 12.3 PCK improvement on the MuPoTS-3D dataset, and 20.5 mm improvement on the CMU Panoptic dataset). In addition to superior performance, our method has lower computational complexity and fewer model parameters. Our code will be made publicly available.
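
To make the notion of an ordinal depth relation concrete, the following is a minimal sketch and not the HMOR formulation itself: the abstract does not specify the encoding, the hierarchy, the angle relations, or the loss. The function names `ordinal_depth_relations` and `ordinal_agreement`, the `tol` parameter, and the sign convention are assumptions made for illustration only. The sketch labels every pair of depth values (for example, person roots or joints) as farther, closer, or roughly equal, and measures how often predicted depths reproduce the ground-truth ordering.

```python
import numpy as np


def ordinal_depth_relations(depths, tol=0.0):
    """Pairwise ordinal relations between depth values (illustrative only).

    Returns an integer matrix R where R[i, j] is
      +1 if item i is farther than item j,
      -1 if item i is closer than item j,
       0 if the two depths differ by at most `tol`.
    """
    d = np.asarray(depths, dtype=float)
    diff = d[:, None] - d[None, :]      # diff[i, j] = d[i] - d[j], via broadcasting
    rel = np.sign(diff).astype(int)
    rel[np.abs(diff) <= tol] = 0        # treat near-equal depths as "same"
    return rel


def ordinal_agreement(pred_depths, gt_depths, tol=0.0):
    """Fraction of ordered pairs (i != j) whose predicted relation matches ground truth.

    Assumes at least two items; with a single item there are no pairs to compare.
    """
    pred = ordinal_depth_relations(pred_depths, tol)
    gt = ordinal_depth_relations(gt_depths, tol)
    off_diag = ~np.eye(len(gt), dtype=bool)   # ignore self-comparisons
    return float((pred[off_diag] == gt[off_diag]).mean())
```

A relation of this kind depends only on the ordering of depths, not on their absolute values, so it can be evaluated across several people in one image even when each pose is predicted root-relative; this is one plausible reading of how ordinal relations provide a global view.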
