RegionCL: Exploring Contrastive Region Pairs for Self-Supervised Representation Learning

Yufei Xu, Qiming Zhang, Jing Zhang, Dacheng Tao


"Self-supervised learning (SSL) methods have achieved significant success by maximizing the mutual information between two augmented views, where cropping is a popular augmentation technique. Cropped regions are widely used to construct positive pairs, while the regions remaining after cropping have rarely been explored in existing methods, although the two together constitute the same image instance and both contribute to the description of the category. In this paper, we make the first attempt to demonstrate the importance of both regions in cropping from a complete perspective, and the effectiveness of using both regions, by designing a simple yet effective pretext task called Region Contrastive Learning (RegionCL). Technically, to construct the two kinds of regions, we randomly crop a region (called the paste view) of the same size from each input image and swap these regions between different images, composing new images together with the remaining regions (called the canvas view). Then, instead of taking the new images as a whole for positive or negative samples, contrastive pairs are efficiently constructed from the regional perspective based on the following simple criteria, i.e., each view is (1) positive with views augmented from the same original image and (2) negative with views augmented from other images. With minor modifications to popular SSL methods, RegionCL exploits those abundant pairs and helps the model distinguish the region features from both canvas and paste views, thereby learning better visual representations. Experiments on ImageNet, MS COCO, and Cityscapes demonstrate that RegionCL improves MoCov2, DenseCL, and SimSiam by large margins and achieves state-of-the-art performance on classification, detection, and segmentation tasks."
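
The region swap and the pairing criteria described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `region_swap` and `pair_labels` are hypothetical, the sketch assumes a single crop box shared by the whole batch (the abstract only states that the crops have the same size), and it pairs images by a cyclic shift so that no image receives its own region.

```python
import numpy as np


def region_swap(images, crop_h, crop_w, rng):
    """Compose new images by swapping a same-size region between images.

    images: array of shape (B, H, W, C).
    Assumption: one random box is shared by the whole batch.
    """
    b, h, w, _ = images.shape
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    # Cyclic shift: image i receives the region of image perm[i] != i (for B > 1).
    perm = np.roll(np.arange(b), 1)
    composed = images.copy()
    # Paste view: region taken from another image; canvas view: the rest.
    composed[:, top:top + crop_h, left:left + crop_w] = (
        images[perm, top:top + crop_h, left:left + crop_w]
    )
    return composed, (top, left), perm


def pair_labels(perm):
    """Boolean (2B, 2B) matrix: True where two views share a source image.

    Rows/columns 0..B-1 are canvas views, B..2B-1 are paste views.
    """
    b = len(perm)
    source = np.concatenate([np.arange(b), perm])
    return source[:, None] == source[None, :]


rng = np.random.default_rng(0)
batch = np.stack([np.full((8, 8, 3), i, dtype=np.float32) for i in range(4)])
composed, box, perm = region_swap(batch, crop_h=4, crop_w=4, rng=rng)
labels = pair_labels(perm)
```

In this sketch, `labels[i, j]` is true exactly when views `i` and `j` originate from the same input image, which corresponds to criterion (1); all other entries are negatives under criterion (2). How these pairs enter the loss of MoCov2, DenseCL, or SimSiam is not specified in the abstract and is left out here.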
