Video Object Segmentation with Episodic Graph Memory Networks
How to make a segmentation model efficiently adapt to a specific video as well as online target appearance variations is a fundamental issue in the field of video object segmentation. In this work, a graph memory network is developed to address the novel idea of “learning to update the segmentation model”. Specifically, we exploit an episodic memory network, organized as a fully connected graph, to store frames as nodes and capture cross-frame correlations by edges. Further, learnable controllers are embedded to ease memory reading and writing, as well as maintain a fixed memory scale. The structured, external memory design enables our model to comprehensively mine and quickly store new knowledge, even with limited visual information, and the differentiable memory controllers slowly learn an abstract method for storing useful representations in the memory and how to later use these representations for prediction, via gradient descent. In addition, the proposed graph memory network yields a neat yet principled framework, which can generalize well to both one-shot and zero-shot video object segmentation tasks. Extensive experiments on four challenging benchmark datasets verify that our graph memory network is able to facilitate the adaptation of the segmentation network for case-by-case video object segmentation.
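The memory organization described above (frames stored as nodes of a fully connected graph, edges carrying cross-frame correlations, a read operation and a write operation, and a fixed number of nodes) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names `read` and `update`, the dot-product similarity, the softmax weighting, and the constant `gate` are hypothetical, and the fixed blending rule stands in for the learnable controllers, which are trained by gradient descent in the actual model. It is not the authors' implementation.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    """Combine vectors using the given weights."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def read(memory, query):
    """Attention read (assumed form): weight each memory node by its
    dot-product similarity to the query and return the weighted summary."""
    weights = softmax([dot(query, node) for node in memory])
    return weighted_sum(weights, memory)


def update(memory, gate=0.5):
    """One message-passing step over a fully connected graph.

    Each node receives a similarity-weighted message from all other nodes
    and blends it with its previous state. The constant `gate` is a
    hypothetical stand-in for a learned write controller. The number of
    nodes never changes, so the memory keeps a fixed scale.
    Requires at least two nodes.
    """
    new_memory = []
    for i, node in enumerate(memory):
        others = [m for j, m in enumerate(memory) if j != i]
        weights = softmax([dot(node, other) for other in others])
        message = weighted_sum(weights, others)
        new_memory.append(
            [(1 - gate) * n + gate * m for n, m in zip(node, message)]
        )
    return new_memory


# Toy usage: three "frame" nodes with 2-D features and one query feature.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
summary = read(memory, query)   # query-conditioned memory summary
memory = update(memory)         # nodes exchange information; size stays 3
```

In this sketch the read result is a convex combination of the stored nodes, and repeated calls to `update` let every node absorb information from every other node while the node count stays constant.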