A Unified Framework for Shot Type Classification Based on Subject Centric Lens
In filmmaking, the choice of shot has a profound influence on how a story is delivered and how it resonates with the audience. Because shots of different scales and movement types express different emotions and content, recognizing shots and their attributes is important for understanding movies as well as general videos. Classifying shot type is challenging because it requires information beyond the video content itself, such as the spatial composition of a frame and the movement of the camera. To address these issues, we propose a learning framework, Subject Guidance Network (SGNet), for shot type recognition. SGNet separates the subject and the background of a shot into two streams, which serve as maps to guide scale and movement type classification, respectively. To facilitate shot type analysis and model evaluation, we build a large-scale dataset, MovieShots, which contains 46K shots from 7K movie trailers, annotated with their scale and movement types. Experiments show that our framework recognizes these two shot attributes accurately, outperforming all previous methods.