Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization
Blind Perspective-n-Point (PnP) is the problem of estimating the position and orientation of a camera relative to a scene, given 2D image points and 3D scene points, without prior knowledge of the 2D-3D correspondences. Solving for pose and correspondences simultaneously is extremely challenging since the search space is very large. Fortunately it is a coupled problem: the pose can be found easily given the correspondences and vice versa. Existing approaches assume that noisy correspondences are provided, that a good pose prior is available, or that the problem size is small. We instead propose the first fully end-to-end trainable network for solving the blind PnP problem efficiently and globally, that is, without the need for pose priors. We make use of recent results in differentiating optimization problems to incorporate geometric model fitting into an end-to-end learning framework, including Sinkhorn, RANSAC and PnP algorithms. Our proposed approach significantly outperforms other methods on synthetic and real data.