GTCaR: Graph Transformer for Camera Re-Localization
"Camera re-localization or absolute pose regression is the centerpiece in numerous computer vision tasks such as visual odometry, structure from motion (SfM) and SLAM. In this paper we propose a neural network approach with a graph Transformer backbone, namely GTCaR (Graph Transformer for Camera Re-localization), to address the multi-view camera re-localization problem. In contrast with prior work where the pose regression is mainly guided by photometric consistency, GTCaR effectively fuses the image features, camera pose information and inter-frame relative camera motions into encoded graph attributes and is trained towards the graph consistency and pose accuracy combined instead, yielding significantly higher computational efficiency. By leveraging graph Transformer layers with edge features and enabling the adjacency tensor, GTCaR dynamically captures the global attention and thus endows the pose graph with evolving structures to achieve improved robustness and accuracy. In addition, optional temporal Transformer layers actively enhance the spatiotemporal inter-frame relation for sequential inputs. Evaluation of the proposed network on various public benchmarks demonstrates that GTCaR outperforms state-of-the-art approaches."