Self-Regulated Feature Learning via Teacher-Free Feature Distillation
"Knowledge distillation conditioned on intermediate feature representations always leads to significant performance improvements. Conventional feature distillation framework demands extra selecting/training budgets of teachers and complex transformations to align the features between teacher-student models. To address the problem, we analyze teacher roles in feature distillation and have an intriguing observation: additional teacher architectures are not always necessary. Then we propose Tf-FD, a simple yet effective Teacher-free Feature Distillation framework, reusing channel-wise and layer-wise meaningful features within the student to provide teacher-like knowledge without an additional model. In particular, our framework is subdivided into intra-layer and inter-layer distillation. The intra-layer Tf-FD performs feature salience ranking and transfers the knowledge from salient feature to redundant feature within the same layer. For inter-layer Tf-FD, we deal with distilling high-level semantic knowledge embedded in the deeper layer representations to guide the training of shallow layers. Benefiting from the small gap between these self-features, Tf-FD simply needs to optimize extra feature mimicking losses without complex transformations. Furthermore, we provide insightful discussions to shed light on Tf-FD from feature regularization perspectives. Our experiments conducted on classification and object detection tasks demonstrate that our technique achieves state-of-the-art results on different models with fast training speeds. Code is available at https://lilujunai.github.io/Teacher-free-Distillation/."