Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation

Hoyong Kwon, Jaeseok Jeong, Sung-Hoon Yoon, Kuk-Jin Yoon*

Abstract


Weakly Supervised Semantic Segmentation (WSSS) with image-level supervision typically acquires object localization information from Class Activation Maps (CAMs). While Vision Transformers (ViTs) have been increasingly explored in WSSS for their superior performance in understanding global context, CAMs from ViTs still show imprecise localization in boundary areas and false-positive activations. This paper proposes a novel WSSS framework that targets these issues using information from the frequency domain. In our framework, we introduce the Magnitude-mixing-based Phase Concentration (MPC) module, which guides the classifier to prioritize phase information, which contains high-level semantic details. By perturbing and mixing the magnitude, MPC guides the classifier to accentuate and concentrate on the shape information in the phase, thereby leading to finer distinctions in CAM boundary regions. Additionally, inspired by empirical observations that a classification "shortcut" in the frequency domain can induce false positives in CAMs, we introduce a Frequency Shortcut Suppression (FSS) module. This module aims to discourage the formation of such shortcuts, thereby mitigating false positives. The effectiveness of our approach is demonstrated by achieving new state-of-the-art performance on both the PASCAL VOC 2012 and MS COCO 2014 datasets. The code is available at https://github.com/kwonhoyong3/PCSS-WSSS.
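
To make the idea of mixing the magnitude while preserving the phase concrete, the following is a minimal NumPy sketch, not the authors' MPC module. The function name `mix_magnitude_keep_phase`, the linear blend, and the `alpha` parameter are assumptions made for illustration; the abstract does not specify how the magnitude is perturbed or mixed.

```python
import numpy as np

def mix_magnitude_keep_phase(image, other, alpha=0.5):
    """Blend the Fourier magnitude of `image` with that of `other`,
    keeping the phase of `image` unchanged.

    Hypothetical helper: the linear blend and `alpha` are illustrative
    assumptions, not the formulation used in the paper.
    """
    # 2D FFT over the two spatial axes (works for H x W or H x W x C arrays).
    f_image = np.fft.fft2(image, axes=(0, 1))
    f_other = np.fft.fft2(other, axes=(0, 1))

    # Split the spectrum of `image` into magnitude and phase.
    mag_image = np.abs(f_image)
    phase_image = np.angle(f_image)
    mag_other = np.abs(f_other)

    # Mix only the magnitudes; the phase of `image` is left untouched.
    mag_mixed = (1.0 - alpha) * mag_image + alpha * mag_other

    # Recombine and return to the spatial domain.
    f_mixed = mag_mixed * np.exp(1j * phase_image)
    return np.real(np.fft.ifft2(f_mixed, axes=(0, 1)))
```

With `alpha=0` the input is returned unchanged (up to floating-point error), and with `alpha=1` the output carries the magnitude spectrum of `other` together with the phase of `image`. A classifier trained on such inputs sees the same phase under varying magnitudes, which is the intuition the abstract gives for concentrating on shape information.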
