Improving Self-Supervised Lightweight Model Learning via Hard-Aware Metric Distillation
The performance of self-supervised learning (SSL) models is limited by network scale: existing SSL methods suffer a precipitous performance drop when applied to lightweight models, which are important for many mobile devices. To address this problem, we propose a method that improves a lightweight network (the student) by distilling the metric knowledge of a larger SSL model (the teacher). We exploit the relation between teacher and student to mine positive and negative supervision from unlabeled data, which yields more accurate supervision signals. To adaptively handle the uncertainty in positive and negative sample pairs, we incorporate a dynamic weighting strategy into the metric relation between embeddings. Unlike previous self-supervised distillers, our solution optimizes the network directly from a metric-transfer perspective, using the relationships between samples and networks, without additional SSL constraints. Our method significantly boosts the performance of lightweight networks and outperforms existing distillers on large-scale ImageNet while using fewer training epochs. Interestingly, the student's SSL performance even surpasses that of the teacher network in several settings.
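The abstract does not give the exact loss. To make the general idea concrete, the sketch below shows one plausible form of hard-aware metric distillation. It is an illustrative assumption, not the authors' formulation.

The sketch makes these choices:

* **Cross-network contrastive loss.** For each image, the teacher embedding of that same image serves as the student's positive. Teacher embeddings of the other images in the batch serve as negatives.
* **Swapped-in weighting.** The "dynamic weighting" is stood in for by a common contrastive-learning technique: softmax-based hardness reweighting of negatives, rescaled so the weights average to 1.

The function name `hard_aware_distill_loss` and the parameters `tau` (temperature) and `beta` (hardness concentration) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(a: Sequence[Vector], b: Sequence[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities: result[i][j] = cos(a[i], b[j])."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hard_aware_distill_loss(student: Sequence[Vector], teacher: Sequence[Vector],
                            tau: float = 0.2, beta: float = 1.0) -> float:
    """Hypothetical hard-aware metric distillation loss (a sketch, not the paper's).

    Positive for anchor i: teacher[i] (same image, other network).
    Negatives for anchor i: teacher[j] for j != i.
    Negative weights: n_neg * softmax(beta * logits), so they average to 1;
    beta = 0 recovers an unweighted InfoNCE loss.
    """
    n = len(student)
    if n < 2 or len(teacher) != n:
        raise ValueError("need >= 2 paired student/teacher embeddings")
    sim = cosine_matrix(student, teacher)
    total = 0.0
    for i in range(n):
        pos = sim[i][i] / tau
        neg = [sim[i][j] / tau for j in range(n) if j != i]
        # Log of hardness weights: harder (more similar) negatives weigh more.
        log_norm = logsumexp([beta * s for s in neg])
        log_w = [math.log(len(neg)) + beta * s - log_norm for s in neg]
        # -log(exp(pos) / (exp(pos) + sum_j w_j * exp(neg_j))), in log space.
        logits = [pos] + [s + lw for s, lw in zip(neg, log_w)]
        total += logsumexp(logits) - pos
    return total / n
```

With orthogonal, perfectly aligned embeddings and `tau=0.5`, each anchor has a positive logit of 2 and a single negative logit of 0. The loss is therefore `log(1 + e^-2)`, about 0.127.

Raising `beta` concentrates weight on the negatives the student currently confuses with the positive. This is one way to make the supervision "hard-aware". A real implementation would typically use a tensor library and treat the weights as constants during backpropagation.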