CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition
"Zero-Shot action recognition is the task of recognizing action classes without visual examples. The problem can be seen as learning a representation on seen classes that generalizes well to instances of unseen classes without losing discriminability between classes. Neural networks can model highly complex boundaries between visual classes, which explains their success as supervised models. In Zero-Shot learning, however, these highly specialized class boundaries may overfit to the seen classes and transfer poorly to unseen classes. We propose a novel cluster-based representation that regularizes the learning process, yielding a representation that generalizes well to instances from unseen classes. We optimize the clustering using reinforcement learning, which we observe is critical. We call the proposed method CLASTER and observe that it consistently outperforms the state of the art on all standard Zero-Shot video datasets, including UCF101, HMDB51 and Olympic Sports, both in the standard Zero-Shot evaluation and in generalized Zero-Shot learning. We see improvements of up to 11.9% over the state of the art."