Disentangled Differentiable Network Pruning
"In this paper, we propose a novel channel pruning method for compression and acceleration of Convolutional Neural Networks (CNNs). Many existing channel pruning works try to discover compact sub-networks by optimizing a regularized loss function through differentiable operations. Usually, a learnable parameter is used to characterize each channel, which entangles the width and channel importance. In this setting, the FLOPs or parameter constraints implicitly restrict the search space of the pruned model. To solve the aforementioned problems, we propose optimizing each layer’s width by relaxing the hard equality constraint used in previous works. The relaxation is inspired by the definition of the top-$k$ operation. By doing so, we partially disentangle the learning of width and channel importance, which enables independent parametrization for width and importance and makes pruning more flexible. We also introduce soft top-$k$ to improve the learning of width. Moreover, to make pruning more efficient, we use two neural networks to parameterize the importance and width. The importance generation network considers both inter-channel and inter-layer relationships. The width generation network has similar functions. In addition, our method can be easily optimized by popular SGD methods, which enjoys the benefits of differentiable pruning. Extensive experiments on CIFAR-10 and ImageNet show that our method is competitive with state-of-the-art methods."