WeLSA: Learning to Predict 6D Pose from Weakly Labeled Data Using Shape Alignment
"Object pose estimation is a crucial task in computer vision and augmented reality. One of its key challenges is the difficulty of annotating real training data, together with the lack of textured CAD models. Pipelines that do not require CAD models and that can be trained with few labeled images are therefore desirable. We propose a weakly supervised approach for object pose estimation from RGB-D data. Its training sets consist of very few images with pose annotations along with weakly labeled images that have ground-truth segmentation masks but no pose labels. We achieve this by learning to annotate the weakly labeled training data through shape alignment while simultaneously training a pose prediction network. Point cloud alignment is performed using structure and rotation-invariant feature-based losses. We further learn an implicit shape representation, which allows the method to work without a known CAD model and also contributes to pose alignment and pose refinement during training on weakly labeled images. The experimental evaluation shows that our method achieves state-of-the-art results on LineMOD, Occlusion-LineMOD and T-LESS, despite being trained using relative poses and on only a fraction of the labeled data used by other methods. We also achieve results comparable to state-of-the-art RGB-D based pose estimation approaches even when the amount of unlabeled training data is further reduced. In addition, our method works even if relative camera poses, which are typically easier to obtain, are given instead of object pose annotations."
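To make the idea of "annotating weakly labeled data through shape alignment" concrete, the sketch below estimates a pose label for an unlabeled observation by aligning a reference point cloud to it. This is not the method described above, which uses learned rotation-invariant feature losses and an implicit shape representation. It is a minimal illustration under stated assumptions: a symmetric Chamfer distance is swapped in as the alignment loss, rotation is restricted to a brute-force search about the z-axis, and translation comes from centroid matching. All function names (`rotate_z`, `chamfer`, `align_yaw`) are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate_z(p: Point, theta: float) -> Point:
    """Rotate a 3D point about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)


def centroid(points: List[Point]) -> Point:
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer(src: List[Point], dst: List[Point]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance in both directions.
    Hypothetical stand-in for the paper's feature-based alignment losses."""
    fwd = sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)
    bwd = sum(min(sq_dist(q, p) for p in src) for q in dst) / len(dst)
    return fwd + bwd


def align_yaw(model: List[Point], observed: List[Point], steps: int = 72):
    """Brute-force search over yaw angles (assumption: rotation only about z).
    Returns (theta, translation, loss) such that R(theta) @ p + t maps model onto observed."""
    cm, co = centroid(model), centroid(observed)
    centered = [tuple(a - b for a, b in zip(p, cm)) for p in model]
    best = None
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        # Rotate the centered model, then place it at the observed centroid.
        candidate = [tuple(a + b for a, b in zip(rotate_z(p, theta), co)) for p in centered]
        loss = chamfer(candidate, observed)
        if best is None or loss < best[2]:
            # Express the translation in the original model frame: t = co - R(theta) cm.
            rcm = rotate_z(cm, theta)
            best = (theta, tuple(a - b for a, b in zip(co, rcm)), loss)
    return best


# Usage: synthesize an "observation" from a known pose, then recover that pose.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 0.5)]
true_theta, true_t = math.radians(40), (0.3, -0.2, 0.1)
observed = [tuple(a + b for a, b in zip(rotate_z(p, true_theta), true_t)) for p in model]
theta, t, loss = align_yaw(model, observed)
print(math.degrees(theta), t, loss)
```

The recovered pose could then serve as a pseudo-label for training a pose network. A real pipeline would search over full 3D rotations, cope with partial and noisy observations, and use a more robust loss; the grid search here only keeps the example short and dependency-free.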