Meta-GF: Training Dynamic-Depth Neural Networks Harmoniously
"Most state-of-the-art deep neural networks use static inference graphs, which makes it impossible for such networks to dynamically adjust the depth or width of the network according to the complexity of the input data. Different from these static models, depth-adaptive neural networks, e.g. the multi-exit networks, aim at improving the computation efficiency by conducting adaptive inference conditioned on the input. To achieve adaptive inference, multiple output exits are attached at different depths of the multi-exit networks. Unfortunately, these exits usually interfere with each other in the training stage. The interference would reduce performance of the models and cause negative influences on the convergence speed. To address this problem, we investigate the gradient conflict of these multi-exit networks, and propose a novel meta-learning based training paradigm namely Meta-GF(meta gradient fusion) to harmoniously train these exits. Different from existing approaches, Meta-GF takes account of the importances of the shared parameters to each exit, and fuses the gradients of each exit by the meta-learned weights. Experimental results on CIFAR and ImageNet verify the effectiveness of the proposed method. Furthermore, the proposed Meta-GF requires no modification on the network structures and can be directly combined with previous training techniques. The code is available at https://github.com/SYVAE/MetaGF."