Resolution Switchable Networks for Runtime Efficient Image Recognition
We propose a general method to train a single convolutional neural network that can switch image resolutions at inference, so that the running speed can be selected to meet various computational resource limits. Networks trained with the proposed method are named Resolution Switchable Networks (RS-Nets). The basic training framework shares network parameters across images of different resolutions, yet keeps separate batch normalization layers for each resolution. Though parameter-efficient in design, this framework leads to inconsistent accuracy variations at different resolutions, for which we provide a detailed analysis from the aspect of the train-test recognition discrepancy. A multi-resolution ensemble distillation is further designed, where a teacher is learnt on the fly as a weighted ensemble over resolutions. Thanks to the ensemble and knowledge distillation, RS-Nets enjoy accuracy improvements at a wide range of resolutions compared with individually trained models. Extensive experiments on the ImageNet dataset are provided, and we additionally consider quantization problems. Code and models are available at https://github.com/yikaiw/RS-Nets
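The multi-resolution ensemble distillation described above can be illustrated with a minimal sketch in plain Python. It is not the released implementation: the function names, the softmax normalisation of the ensemble weights, the cross-entropy form of the distillation loss, and the example logits and resolutions are all assumptions made for illustration. The abstract states only that the teacher is a weighted ensemble over resolutions learnt on the fly.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_teacher(logits_per_res, weight_scores):
    """Combine per-resolution logits into one teacher prediction.

    Assumption: the ensemble weights are the softmax of one scalar score
    per resolution, so they are positive and sum to 1.
    """
    alphas = softmax(weight_scores)
    num_classes = len(logits_per_res[0])
    return [
        sum(a * logits[c] for a, logits in zip(alphas, logits_per_res))
        for c in range(num_classes)
    ]


def distill_loss(student_logits, teacher_logits):
    """Cross-entropy of the student's prediction against the teacher's soft targets."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


# Hypothetical logits of the shared network for one image at three
# resolutions (for example 224, 160 and 96 pixels), with three classes.
logits = [
    [2.0, 0.5, -1.0],
    [1.5, 0.7, -0.5],
    [0.8, 0.9, -0.2],
]
scores = [0.0, 0.0, 0.0]  # equal scores give a uniform ensemble

teacher = ensemble_teacher(logits, scores)
losses = [distill_loss(student, teacher) for student in logits]
```

In a real training loop the weight scores would be trainable and the teacher would typically be treated as a fixed target when computing each resolution's distillation loss; both details are left out of this sketch.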