An Embedded Feature Whitening Approach to Deep Neural Network Optimization
"Compared with the feature normalization methods that are widely used in deep neural network (DNN) training, feature whitening methods take the correlation between features into account, which can help to learn more effective features. However, existing feature whitening methods have several limitations, such as high computation and memory cost, the inability to work with pre-trained DNN models, and the introduction of additional parameters, which make them impractical for optimizing DNNs. To overcome these drawbacks, we propose a novel Embedded Feature Whitening (EFW) approach to DNN optimization. EFW only adjusts the gradient of the weights by using the whitening matrix, without changing any part of the network, so it can be easily applied to optimize pre-trained and well-defined DNN architectures. We further develop the associated momentum, adaptive damping and gradient norm recovery techniques for EFW, which can be implemented efficiently with acceptable extra computation and memory cost. We apply EFW to the two most commonly used DNN optimizers, i.e., SGDM and Adam, and name the resulting optimizers W-SGDM and W-Adam. Extensive experimental results on various vision tasks, including image classification, object detection, segmentation and person ReID, demonstrate the superiority of W-SGDM and W-Adam over their original counterparts."
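
The abstract does not state the update rule, so the following is only a minimal sketch of the general idea of whitening a weight gradient instead of the features themselves. It assumes a single fully connected layer with input batch `x` and weight gradient `grad_w`, and it right-multiplies the gradient by the inverse of a damped, momentum-averaged input covariance before rescaling it to its original norm. The function name `whitened_gradient`, the parameters `beta` and `damping`, and the fixed (rather than adaptive) damping value are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def whitened_gradient(grad_w, x, cov_ema=None, beta=0.9, damping=1e-3):
    """Precondition a weight gradient with the inverse input covariance.

    grad_w  : (out_features, in_features) gradient of the loss w.r.t. the weight
    x       : (batch, in_features) inputs seen by the layer in this step
    cov_ema : running estimate of the input covariance, or None on the first step
    beta    : momentum of the running covariance (hypothetical default)
    damping : constant added to the diagonal for invertibility (assumed fixed here)
    """
    n, d = x.shape

    # Covariance of the centered input features for this mini-batch.
    xc = x - x.mean(axis=0, keepdims=True)
    cov = xc.T @ xc / n

    # Momentum (exponential moving average) over mini-batch covariances.
    cov_ema = cov if cov_ema is None else beta * cov_ema + (1.0 - beta) * cov

    # Compute grad_w @ inv(cov_ema + damping * I) without forming the inverse.
    # The damped covariance is symmetric, so solving against grad_w.T and
    # transposing the result gives the right-multiplication.
    damped = cov_ema + damping * np.eye(d)
    adjusted = np.linalg.solve(damped, grad_w.T).T

    # Gradient norm recovery: rescale so the adjusted gradient keeps the
    # Frobenius norm of the original one and the learning rate stays meaningful.
    scale = np.linalg.norm(grad_w) / (np.linalg.norm(adjusted) + 1e-12)
    return adjusted * scale, cov_ema
```

Under these assumptions, the returned gradient would simply replace `grad_w` before the usual SGDM or Adam step, and `cov_ema` would be passed back in on the next iteration. The network's forward pass is left untouched, which is why an approach of this kind can be applied to pre-trained models.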