ECVA | European Computer Vision Association

I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image

Gyeongsik Moon, Kyoung Mu Lee

Abstract

Most previous image-based 3D human pose and mesh estimation methods estimate the parameters of a human mesh model from an input image. However, directly regressing the parameters from the input image is a highly non-linear mapping because it breaks the spatial relationship between pixels in the input image. In addition, it cannot model the prediction uncertainty, which can make training harder. To resolve these issues, we propose I2L-MeshNet, an image-to-lixel (line+pixel) prediction network. The proposed I2L-MeshNet predicts the per-lixel likelihood on 1D heatmaps for each mesh vertex coordinate instead of directly regressing the parameters. Our lixel-based 1D heatmap preserves the spatial relationship in the input image and models the prediction uncertainty. We show that the proposed I2L-MeshNet significantly outperforms previous methods while providing visually pleasing mesh estimation results. The code is publicly available at https://github.com/mks0601/I2L-MeshNet_RELEASE.
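To make the idea of a per-lixel likelihood concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the network emits one raw score per lixel along a single axis, that a softmax turns those scores into likelihoods, and that a continuous coordinate is read out as the likelihood-weighted mean position (a soft-argmax). The abstract does not state the readout, so that step is an assumption, and the function names `lixel_likelihoods`, `expected_coordinate`, and `spread` are hypothetical.

```python
import math


def lixel_likelihoods(scores):
    """Softmax over one 1D heatmap: raw per-lixel scores -> likelihoods."""
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def expected_coordinate(likelihoods):
    """Assumed readout: likelihood-weighted mean lixel index (soft-argmax)."""
    return sum(i * p for i, p in enumerate(likelihoods))


def spread(likelihoods):
    """Standard deviation of the heatmap, used here as a simple uncertainty proxy."""
    mean = expected_coordinate(likelihoods)
    variance = sum(p * (i - mean) ** 2 for i, p in enumerate(likelihoods))
    return math.sqrt(variance)


# Made-up scores for one vertex coordinate on an 8-lixel axis.
sharp = lixel_likelihoods([-10.0, -10.0, 10.0, -10.0, -10.0, -10.0, -10.0, -10.0])
ambiguous = lixel_likelihoods([0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 0.0])

print(expected_coordinate(sharp), spread(sharp))
print(expected_coordinate(ambiguous), spread(ambiguous))
```

Under these assumptions, a sharply peaked heatmap gives a coordinate near its peak with a small spread, while a flatter heatmap gives a larger spread. This is one way a heatmap can carry the prediction uncertainty that a single regressed value cannot.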
