A Fast Knowledge Distillation Framework for Visual Recognition

Zhiqiang Shen, Eric Xing ;


"While Knowledge Distillation (KD) has been recognized as a useful tool in many visual tasks, such as supervised classification and self-supervised representation learning, the main drawback of a vanilla KD framework is its mechanism that consumes the majority of the computational overhead on forwarding through the giant teacher networks, making the entire learning procedure inefficient and costly. The recently proposed solution ReLabel suggests creating a label map for the entire image. During training, it receives the cropped region-level label by RoI aligning on a pre-generated entire label map, which allows for efficient supervision generation without having to pass through the teachers repeatedly. However, as the pre-trained teacher employed in ReLabel is from the conventional multi-crop scheme, there are various mismatches between the global label-map and region-level labels in this technique, resulting in performance deterioration compared to the vanilla KD. In this study, we present a Fast Knowledge Distillation (FKD) framework that replicates the distillation training phase and generates soft labels using the multi-crop KD approach, meanwhile training faster than ReLabel since no post-processes such as RoI align and softmax operations are used. When conducting multi-crop in the same image for data loading, our FKD is even more efficient than the traditional image classification framework. On ImageNet-1K, we obtain 80.1% Top-1 accuracy on ResNet-50, outperforming ReLabel by 1.2% while being faster in training and more flexible to use. On the distillation-based self-supervised learning task, we also show that FKD has an efficiency advantage. Code and models are available at: https://github.com/szq0214/FKD."

Related Material

[pdf] [supplementary material] [DOI]