Scale Aggregation Network for Accurate and Efficient Crowd Counting

Xinkun Cao, Zhipeng Wang, Yanyun Zhao, Fei Su; The European Conference on Computer Vision (ECCV), 2018, pp. 734-750


In this paper, we propose a novel encoder-decoder network, called Scale Aggregation Network (SANet), for accurate and efficient crowd counting. The encoder extracts multi-scale features with scale aggregation modules, and the decoder generates high-resolution density maps using a set of transposed convolutions. Moreover, we find that most existing works use only the Euclidean loss, which assumes pixel-wise independence and ignores the local correlation in density maps. Therefore, we propose a novel training loss that combines the Euclidean loss with a local pattern consistency loss, which improves the performance of the model in our experiments. In addition, we use normalization layers to ease the training process and apply a patch-based test scheme to reduce the impact of the statistic shift problem. To demonstrate the effectiveness of the proposed method, we conduct extensive experiments on four major crowd counting datasets, on which our method outperforms state-of-the-art methods while using far fewer parameters.
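The combined training objective described above can be sketched in NumPy. This is a minimal illustration, not the authors' code. It assumes the local pattern consistency term is an SSIM-style comparison of local means, variances and covariances computed under a Gaussian window. The function names (`gaussian_window`, `local_mean`, `euclidean_loss`, `local_pattern_loss`, `combined_loss`), the window size and `sigma`, the stabilizing constants `c1`/`c2`, and the weight `alpha` are all hypothetical choices made for this sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_window(size=16, sigma=1.5):
    """Hypothetical 2-D Gaussian window whose weights sum to 1."""
    x = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)


def local_mean(img, window):
    """Weighted mean at every fully contained window position ('valid' mode)."""
    patches = sliding_window_view(img, window.shape)  # (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,kl->ij", patches, window)


def euclidean_loss(pred, gt):
    """Pixel-wise term: half the per-image sum of squared errors, averaged over the batch."""
    diff = pred - gt
    return 0.5 * np.mean(np.sum(diff ** 2, axis=(1, 2)))


def local_pattern_loss(pred, gt, window, c1=1e-4, c2=9e-4):
    """SSIM-style term (assumed): 1 minus the mean local similarity of each map pair."""
    scores = []
    for x, y in zip(pred, gt):
        mu_x, mu_y = local_mean(x, window), local_mean(y, window)
        var_x = local_mean(x * x, window) - mu_x ** 2
        var_y = local_mean(y * y, window) - mu_y ** 2
        cov_xy = local_mean(x * y, window) - mu_x * mu_y
        ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
            (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
        )
        scores.append(ssim_map.mean())
    return 1.0 - float(np.mean(scores))


def combined_loss(pred, gt, alpha=1e-3, window=None):
    """Euclidean loss plus a weighted local pattern consistency term (alpha is hypothetical)."""
    if window is None:
        window = gaussian_window()
    return euclidean_loss(pred, gt) + alpha * local_pattern_loss(pred, gt, window)


# Usage: a batch of two 32x32 density maps and a slightly perturbed prediction.
rng = np.random.default_rng(0)
gt = rng.random((2, 32, 32))
pred = gt + 0.1 * rng.random((2, 32, 32))
print(combined_loss(pred, gt))
```

The Euclidean term scores every pixel independently, whereas the second term compares local statistics over a neighborhood, so it penalizes predictions whose local structure differs from the ground truth even when per-pixel errors are small. The small hypothetical `alpha` reflects that the two terms live on very different scales: the sum of squared errors grows with map size, while the SSIM-style term stays within a fixed range.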

Related Material

author = {Cao, Xinkun and Wang, Zhipeng and Zhao, Yanyun and Su, Fei},
title = {Scale Aggregation Network for Accurate and Efficient Crowd Counting},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}