Data Association between Event Streams and Intensity Frames under Diverse Baselines
"This paper proposes a learning-based framework to associate event streams and intensity frames under diverse camera baselines, to simultaneously benefit to camera pose estimation under large baseline and depth estimation under small baseline. Based on the observation that event streams are globally sparse (a small percentage of pixels in global frames are triggered with events) and locally dense (a large percentage of pixels in local patches are triggered with events) in the spatial domain, we put forward a two-stage architecture for matching feature maps. LSparse-Net uses a large receptive field to find sparse matches while SDense-Net uses a small receptive field to find dense matches. Both two stages apply transformer modules with self-attention layers and cross-attention layers to effectively process multi-resolution features from the feature pyramid network backbone. Experimental results on public datasets show systematic performance improvement for both tasks compared to state-of-the-art methods."