BASQ: Branch-Wise Activation-Clipping Search Quantization for Sub-4-Bit Neural Networks
"In this paper, we propose Branch-wise Activation-clipping Search Quantization (BASQ), which is a novel quantization method for low-bit activation. BASQ optimizes clip value in continuous search space while simultaneously searching L2 decay weight factor for updating clip value in discrete search space. We also propose a novel block structure for low precision that works properly on both MobileNet and ResNet structures with branch-wise searching. We evaluate the proposed methods by quantizing both weights and activations to 4-bit or lower. Contrary to the existing methods which are effective only for redundant networks, e.g., ResNet-18, or highly optimized networks, e.g., MobileNet-v2, our proposed method offers constant competitiveness on both types of networks across low precisions from 2 to 4-bits. Specifically, our 2-bit MobileNet-v2 offers top-1 accuracy of 64.71% on ImageNet, outperforming the existing method by a large margin (2.8%), and our 4-bit MobileNet-v2 gives 71.98% which is comparable to the full-precision accuracy 71.88% while our uniform quantization method offers comparable accuracy of 2-bit ResNet-18 to the state-of-the-art non-uniform quantization method."