ECVA | European Computer Vision Association

Regularizing Vector Embedding in Bottom-Up Human Pose Estimation

Haixin Wang, Lu Zhou, Yingying Chen, Ming Tang, Jinqiao Wang ;

Abstract

"The embedding-based method such as Associative Embedding is popular in bottom-up human pose estimation. Methods under this framework group candidate keypoints according to the predicted identity embeddings. However, the identity embeddings of different instances are likely to be linearly inseparable in some complex scenes, such as crowded scene or when the number of instances in the image is large. To reduce the impact of this phenomenon on keypoint grouping, we try to learn a sparse multidimensional embedding for each keypoint. We observe that the different dimensions of embeddings are highly linearly correlated. To address this issue, we impose an additional constraint on the embeddings during training phase. Based on the fact that the scales of instances usually have significant variations, we uilize the scales of instances to regularize the embeddings, which effectively reduces the linear correlation of embeddings and makes embeddings being sparse. We evaluate our model on CrowdPose Test and COCO Test-dev. Compared to vanilla Associative Embedding, our method has an impressive superiority in keypoint grouping, especially in crowded scenes with a large number of instances. Furthermore, our method achieves state-of-the-art results on CrowdPose Test (74.5 AP) and COCO Test-dev (72.8 AP), outperforming other bottom-up methods. Our code and pretrained models are available at https://github.com/CR320/CoupledEmbedding."

Related Material

[pdf] [DOI]