VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network

Zhixue Fang, Yuzhi Liu, Huisi Wu*, Jing Qin

Abstract


We propose a novel model (VP-SAM), adapted from the Segment Anything Model (SAM), for video polyp segmentation (VPS), a challenging task due to (1) the low contrast between polyps and background and (2) the large frame-to-frame variations in polyp size, position, and shape. Our aim is to take advantage of the powerful representation capability of SAM while enabling it to effectively harness the temporal information of colonoscopic videos and to disentangle polyps from backgrounds with similar appearance. To achieve this, we propose two new techniques. First, we propose a new semantic disentanglement adapter (SDA) that exploits the amplitude information of the Fourier spectrum to help SAM differentiate polyps from background more effectively. Second, we propose an innovative spatio-temporal side network (STSN) that provides SAM with spatio-temporal information from videos, helping it track the motion status of polyps. Extensive experiments on SUN-SEG, CVC-612, and CVC-300 demonstrate that our method outperforms state-of-the-art methods. While this work focuses on colonoscopic videos, the proposed method is general enough to be used to analyze other medical videos with similar challenges. Code is available at https://github.com/zhixue-fang/VPSAM.
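
The abstract mentions amplitude information of the Fourier spectrum without giving details. As a rough illustration only, the sketch below splits a single-channel frame into amplitude and phase with NumPy, modulates the amplitude, and recombines the two. It is not the authors' SDA: the function names and the `gain` parameter are hypothetical, and the random array stands in for a real colonoscopy frame or feature map.

```python
import numpy as np


def split_amplitude_phase(frame):
    """Return (amplitude, phase) of the 2D Fourier spectrum of a 2D array."""
    spectrum = np.fft.fft2(frame)
    return np.abs(spectrum), np.angle(spectrum)


def recombine(amplitude, phase):
    """Rebuild a spatial-domain array from an amplitude and a phase map."""
    spectrum = amplitude * np.exp(1j * phase)
    # The imaginary part is numerical noise when the spectrum keeps the
    # conjugate symmetry of a real-valued input, so only the real part is kept.
    return np.fft.ifft2(spectrum).real


def modulate_amplitude(frame, gain):
    """Scale the amplitude by `gain` (hypothetical stand-in for a learned
    adapter) while leaving the phase untouched."""
    amplitude, phase = split_amplitude_phase(frame)
    return recombine(gain * amplitude, phase)


# Toy stand-in for one grayscale frame (assumption: real inputs would be
# video frames or intermediate feature maps).
rng = np.random.default_rng(0)
frame = rng.random((8, 8))

amplitude, phase = split_amplitude_phase(frame)
restored = recombine(amplitude, phase)
boosted = modulate_amplitude(frame, 2.0)
```

Because the Fourier transform is linear, a uniform gain simply rescales the frame; an adapter of the kind the abstract describes would presumably apply a non-uniform, learned modulation, which this sketch does not attempt to reproduce.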
