ECVA | European Computer Vision Association

VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment

Hanyue Tu, Chunyu Wang, Wenjun Zeng

Abstract

We present VoxelPose to estimate 3D poses of multiple people from multiple camera views. In contrast to previous efforts, which require establishing cross-view correspondence based on noisy and incomplete 2D pose estimates, VoxelPose operates directly in 3D space and therefore avoids making incorrect decisions in each camera view. To achieve this goal, features from all camera views are aggregated in the 3D voxel space and fed into the Cuboid Proposal Network (CPN) to localize all people. We then propose the Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal. The approach is robust to occlusion, which occurs frequently in practice. Without bells and whistles, it outperforms previous methods on several public datasets.
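
The abstract does not say how per-view features are aggregated in the voxel space. One plausible reading, sketched below in plain Python, is to project each voxel center into every camera view and average the 2D heatmap values found there. Everything in this sketch is an assumption made for illustration: the function names `project` and `aggregate_voxel_features`, the 3x4 projection matrices, nearest-pixel sampling, and the toy data are hypothetical and are not taken from the paper.

```python
def project(P, point):
    """Project a 3D point with a 3x4 camera matrix P (assumed convention).

    Returns pixel coordinates (u, v), or None if the point is behind the camera.
    """
    x, y, z = point
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    if h[2] <= 0:
        return None
    return h[0] / h[2], h[1] / h[2]


def aggregate_voxel_features(heatmaps, cameras, voxel_centers):
    """Average heatmap values over the views in which each voxel is visible.

    heatmaps: one 2D list (rows x cols) per camera.
    cameras: one 3x4 projection matrix per camera.
    voxel_centers: list of (x, y, z) points.
    """
    volume = []
    for center in voxel_centers:
        total, count = 0.0, 0
        for heatmap, P in zip(heatmaps, cameras):
            uv = project(P, center)
            if uv is None:
                continue  # voxel lies behind this camera
            col, row = int(round(uv[0])), int(round(uv[1]))
            # Nearest-pixel sampling; skip projections outside the image.
            if 0 <= row < len(heatmap) and 0 <= col < len(heatmap[0]):
                total += heatmap[row][col]
                count += 1
        volume.append(total / count if count else 0.0)
    return volume


# Toy setup: two hypothetical cameras, each with a 3x3 heatmap.
cam_a = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
cam_b = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]]  # shifted along x
hm_a = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
hm_b = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]

voxels = [(1, 1, 1), (0, 0, -1), (5, 5, 1)]
volume = aggregate_voxel_features([hm_a, hm_b], [cam_a, cam_b], voxels)
```

In this toy setup the first voxel is visible in both views and receives the mean of the two sampled heatmap values, while the other two voxels (one behind the cameras, one projecting outside both images) receive zero. Because no view has to commit to a 2D detection before fusion, a person occluded in one view can still produce a strong response in the volume from the remaining views.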
