Toward Faster and Simpler Matrix Normalization via Rank-1 Update
Bilinear pooling has achieved impressive improvements in many computer vision tasks. Recent studies show that matrix normalization is vital for improving the performance of bilinear pooling. Nevertheless, traditional matrix square-root or logarithm normalization requires singular value decomposition (SVD), which is not well suited to GPU platforms, limiting efficiency in both training and inference. To improve efficiency on GPU platforms, recent methods rely on Newton-Schulz (NS) iteration to approximate the matrix square root. Although NS iteration is GPU-friendly, it requires several matrix-matrix multiplications, which remain computationally expensive. Meanwhile, NS iteration cannot properly support the compact bilinear features obtained from tensor sketch or random projection. To overcome these limitations, we propose rank-1 update normalization (RUN), which needs only matrix-vector multiplications and is therefore significantly more efficient than NS iteration with its matrix-matrix multiplications. Besides, the proposed RUN is much simpler than NS iteration and thus much easier to implement in practice. Moreover, RUN readily supports normalization of compact bilinear features, making it more flexible to deploy than NS iteration. The proposed RUN is differentiable and can be plugged into a neural network for end-to-end training. Comprehensive experiments on four public benchmarks show that, for full bilinear pooling, the proposed RUN achieves comparable accuracy with a $330\times$ speedup over NS iteration. For compact bilinear pooling, our RUN achieves comparable accuracy with a $5400\times$ speedup over SVD-based normalization.
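To make the two baselines concrete, the following is a minimal NumPy sketch of the matrix square-root normalization the abstract contrasts: the exact SVD-based computation and its Newton-Schulz approximation, which uses only matrix-matrix products. The function names and the test matrix are illustrative; RUN itself is not sketched here, since its rank-1 update rule is not specified in the abstract.

```python
import numpy as np

def sqrtm_svd(a):
    """Exact square root of an SPD matrix via SVD (for SPD input, U == V)."""
    u, s, vt = np.linalg.svd(a)
    return (u * np.sqrt(s)) @ vt

def sqrtm_ns(a, n_iter=10):
    """Newton-Schulz approximation of the matrix square root.

    GPU-friendly (only matrix-matrix multiplications, no SVD), but each
    iteration costs several m x m products -- the expense RUN avoids by
    using matrix-vector multiplications instead.
    """
    m = a.shape[0]
    norm = np.linalg.norm(a, ord='fro')
    y = a / norm                      # pre-normalize so the iteration converges
    z = np.eye(m)
    eye = np.eye(m)
    for _ in range(n_iter):
        t = 0.5 * (3.0 * eye - z @ y)
        y = y @ t                     # y -> (a / norm)^{1/2}
        z = t @ z                     # z -> (a / norm)^{-1/2}
    return y * np.sqrt(norm)          # undo the pre-normalization

# Illustrative check on a small well-conditioned SPD matrix.
rng = np.random.default_rng(0)
b = rng.standard_normal((8, 8))
a = b @ b.T + 8.0 * np.eye(8)
s_exact = sqrtm_svd(a)
s_approx = sqrtm_ns(a)
print(np.allclose(s_exact @ s_exact, a))            # exact root squares back to a
print(np.linalg.norm(s_approx - s_exact) < 1e-3)    # NS closely approximates it
```

Both routines return an approximation of $A^{1/2}$; the difference the abstract emphasizes is the cost profile, since SVD is poorly suited to GPUs while NS iteration trades that for repeated matrix-matrix multiplications.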