Compound Prototype Matching for Few-Shot Action Recognition
"Few-shot action recognition aims to recognize novel action classes using only a small number of labeled training samples. In this work, we propose a novel approach that first summarizes each video into compound prototypes consisting of a group of global prototypes and a group of focused prototypes, and then compares video similarity based on the prototypes. Each global prototype is encouraged to summarize a specific aspect from the entire video, for example, the start of the action or the evolution of the action. Since no clear annotation is provided for the global prototypes, we use a group of focused prototypes and to focus on certain timestamps in the video. We compare similarity by matching the compound prototypes between the support and query videos. The global prototypes are directly matched so that the actions can be compared from the same perspective, for example, whether two actions start similarly. For the focused prototypes, since actions have various temporal shifts in the videos, we apply bipartite matching to allow comparison of the same action on different timestamps. Extensive experiments demonstrate that our proposed method achieves state-of-the-art results on multiple benchmarks by a large margin. A detailed ablation study analyzes the importance of each group of prototypes in capturing different aspects of the video."