Unifying Visual Contrastive Learning for Object Recognition from a Graph Perspective

Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Chenyu Wang, Wanli Ouyang


"Recent contrastive-based unsupervised object recognition methods leverage a Siamese architecture with two branches, each composed of a backbone, a projector layer, and an optional predictor layer. To learn the parameters of the backbone, existing methods share a similar projector design; the major difference among them lies in the predictor layer. In this paper, we propose to Unify existing unsupervised Visual Contrastive Learning methods by using a GCN layer as the predictor layer (UniVCL), which brings two benefits to unsupervised object recognition. First, by treating the different predictor designs of existing methods as its special cases, our fair and comprehensive experiments reveal the critical importance of neighborhood aggregation in the GCN predictor. Second, viewing the predictor from a graph perspective bridges visual self-supervised learning with graph representation learning, which allows us to transfer augmentations from graph representation learning to unsupervised object recognition and further improve its accuracy. Extensive experiments on linear evaluation and semi-supervised learning tasks demonstrate the effectiveness of UniVCL and the introduced graph augmentations. Code will be released upon acceptance."
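The key idea above is that the predictor layer can be expressed as neighborhood aggregation over a graph built from the batch embeddings. A minimal sketch of such a GCN-style predictor is below; the kNN-graph construction, the function name, and the absence of a learned weight matrix are all illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def gcn_predictor(z, k=2):
    """Illustrative GCN-style predictor (not the paper's exact design):
    build a kNN graph over the batch embeddings z (n x d), then apply
    one round of symmetrically normalized neighborhood aggregation."""
    # cosine similarity between all pairs of embeddings in the batch
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = zn @ zn.T
    n = z.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        # each node keeps itself plus its k nearest neighbours
        nbrs = np.argsort(-sim[i])[:k + 1]
        adj[i, nbrs] = 1.0
    adj = np.maximum(adj, adj.T)  # symmetrize the kNN graph
    # symmetric normalization: D^{-1/2} A D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    adj_norm = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # neighborhood aggregation; a real GCN layer would also multiply
    # by a learned weight matrix and apply a nonlinearity
    return adj_norm @ z

# toy batch of 8 embeddings of dimension 4
z = np.random.RandomState(0).randn(8, 4)
p = gcn_predictor(z, k=2)
```

Special cases follow by changing the graph: with `adj` set to the identity, the predictor reduces to a plain (linear) predictor acting on each sample independently, which is how existing methods appear as instances of this formulation.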

Related Material

[pdf] [supplementary material] [DOI]