ECVA | European Computer Vision Association

Adaptive Agent Transformer for Few-Shot Segmentation

Yuan Wang, Rui Sun, Zhe Zhang, Tianzhu Zhang

Abstract

"Few-shot segmentation (FSS) aims to segment objects in a given query image with only a few labelled support images. The limited support information makes it an extremely challenging task. Most previous best-performing methods adopt prototypical learning or affinity learning. Nevertheless, they either neglect to further utilize support pixels for facilitating segmentation and lose spatial information, or are not robust to noisy pixels and computationally expensive. In this work, we propose a novel end-to-end adaptive agent transformer (AAFormer) to integrate prototypical and affinity learning to exploit the complementarity between them via a transformer encoder-decoder architecture, including a representation encoder, an agent learning decoder and an agent matching decoder. The proposed AAFormer enjoys several merits. First, to learn agent tokens well without any explicit supervision, and to make agent tokens capable of dividing different objects into diverse parts in an adaptive manner, we customize the agent learning decoder according to the three characteristics of context awareness, spatial awareness and diversity. Second, the proposed agent matching decoder is responsible for decomposing the direct pixel-level matching matrix into two more computationally-friendly matrices to suppress the noisy pixels. Extensive experimental results on two standard benchmarks demonstrate that our AAFormer performs favorably against state-of-the-art FSS methods."
