MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation
Knowledge Distillation (KD) is one of the most popular methods for learning a compact model. However, it still demands considerable time and computational resources because of its sequential training pipeline. Furthermore, the soft targets from deeper models do not often serve as good cues for shallower models because of a compatibility gap. In this work, we address these two problems at the same time. Specifically, we propose that better soft targets with higher compatibility can be generated by using a label generator to fuse the feature maps from deeper stages in a top-down manner, and that meta-learning can be employed to optimize this label generator. Utilizing the soft targets learned from the intermediate feature maps of the model, we achieve better self-boosting of the network than the state of the art. The experiments are conducted on two standard classification benchmarks, namely CIFAR-100 and ILSVRC2012. We test various network architectures to show the generalizability of our MetaDistiller. The experimental results on the two datasets demonstrate the effectiveness of our method.
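As a point of reference for the term "soft targets", the following is a minimal sketch of the generic temperature-scaled distillation loss commonly used in KD. It is not the MetaDistiller label generator or its meta-learning procedure; the function names `softmax` and `soft_target_loss` and the default temperature of 4.0 are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: the teacher logits stand in for whatever produces the
    soft targets (here, simply a second set of logits).
    """
    p = softmax(teacher_logits, temperature)  # soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [1.5, 1.2, 0.3]
    print(soft_target_loss(student, teacher))
```

A higher temperature flattens both distributions, so the loss places more weight on the relative ordering of the non-target classes; the loss is zero when student and teacher logits coincide.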