Cross-Modality Transformer for Visible-Infrared Person Re-identification
Visible-infrared person re-identification (VI-ReID) is a challenging task due to large cross-modality discrepancies and intra-class variations. Existing works mainly focus on learning modality-shared representations by embedding the different modalities into a common feature space. However, these methods often damage the modality-specific information and the identity information contained in the features. To alleviate these issues, we propose a novel Cross-Modality Transformer (CMT) that jointly explores a modality-level alignment module and an instance-level alignment module for VI-ReID. The proposed CMT has two main merits. First, the modality-level alignment module compensates for the missing modality-specific information via a Transformer encoder-decoder architecture. Second, the instance-level alignment module adaptively adjusts the sample features through query-adaptive feature modulation. To the best of our knowledge, this is the first work to exploit a cross-modality Transformer to achieve modality compensation for VI-ReID. Extensive experiments on two standard benchmarks demonstrate that CMT performs favorably against state-of-the-art methods.
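The abstract does not give the exact formulation of either module. The sketch below is therefore only an illustration of two generic building blocks it alludes to, written in stdlib-only Python. The first is scaled dot-product cross-attention, the core operation of a Transformer decoder. Here we *assume* that modality compensation lets tokens from the available modality attend to tokens from the other modality. The second is a simple sigmoid channel gate, a *hypothetical* stand-in for "query-adaptive feature modulation". The function names `cross_attention` and `query_adaptive_gate` and the gating rule are illustrative assumptions, not the CMT method.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product cross-attention (single head, no learned projections).

    In the assumed compensation setting, `queries` are tokens from the available
    modality and `keys`/`values` are tokens from the other modality, so each
    output row is a context vector built from the other modality's tokens.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    out: Matrix = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        # Weighted sum of value rows: one output vector per query token.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def query_adaptive_gate(query_feat: Vector, gallery_feat: Vector) -> Vector:
    """Hypothetical modulation: scale each gallery channel by sigmoid(query channel)."""
    return [g / (1.0 + math.exp(-q)) for q, g in zip(query_feat, gallery_feat)]


if __name__ == "__main__":
    visible_tokens = [[1.0, 0.0], [0.0, 1.0]]    # toy visible-modality tokens
    infrared_tokens = [[0.5, 0.5], [1.0, -1.0]]  # toy infrared-modality tokens
    context = cross_attention(visible_tokens, infrared_tokens, infrared_tokens)
    print(context)
    print(query_adaptive_gate([0.0, 2.0], [2.0, 4.0]))
```

A real Transformer decoder would add learned projections for queries, keys, and values, multiple heads, residual connections, and normalization. These are omitted here so that the attention arithmetic stays visible. Likewise, the gate uses the raw query feature only to keep the idea of "modulating a sample's features conditioned on the query" concrete. A learned modulation network would typically replace it.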