Clustering Driven Deep Autoencoder for Video Anomaly Detection
Because the definition of an anomaly is ambiguous and real data are complex, anomaly detection in videos is one of the most challenging problems in intelligent video surveillance. Since abnormal events usually differ from normal events in appearance and/or motion behavior, we address this issue by designing a novel convolutional autoencoder architecture that captures informative spatial and temporal representations separately. The spatial part reconstructs the last individual frame (LIF), and the temporal part generates the RGB difference between the remaining video frames and the LIF; the RGB difference is a quickly computed cue from which useful motion features can be learned. The two sub-modules independently learn regularity in the appearance and motion feature spaces, so abnormal events that are irregular in appearance or in motion behavior lead to a large reconstruction error. In addition, we design a deep k-means cluster constraint that forces both the appearance encoder and the motion encoder to extract common factors of variation within the dataset by penalizing the distance of each data representation to the cluster centers. Experiments on several publicly available datasets demonstrate the effectiveness of our method, which detects abnormal events in videos with competitive performance.
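To make the quantities named above concrete, the following minimal Python sketch computes an RGB-difference motion target, a mean-squared reconstruction error, and a k-means-style penalty on encoder outputs. It is an illustration under stated assumptions, not the exact formulation of the method: all function names are hypothetical, frames are represented as flat lists of pixel values, the sign convention (frame minus LIF) is assumed, and each representation is assigned to its nearest center, which may differ from the assignment actually used in the deep k-means constraint.

```python
# Illustrative sketch only; every name below is hypothetical.

def rgb_difference(frames):
    """Difference between each earlier frame and the last individual frame (LIF).

    frames: list of frames, each a flat list of pixel values.
    The sign convention (frame minus LIF) is an assumption.
    """
    lif = frames[-1]
    return [[p - q for p, q in zip(frame, lif)] for frame in frames[:-1]]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reconstruction_error(target, output):
    """Mean squared error between a target and its reconstruction."""
    return squared_distance(target, output) / len(target)


def kmeans_penalty(codes, centers):
    """Mean squared distance of each representation to its nearest center.

    Nearest-center (hard) assignment is an assumption of this sketch.
    """
    nearest = [min(squared_distance(c, m) for m in centers) for c in codes]
    return sum(nearest) / len(codes)


# Toy clip of three frames with three pixel values each; the last is the LIF.
frames = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]
motion_target = rgb_difference(frames)

# Toy encoder outputs and cluster centers in a 2-D feature space.
codes = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
centers = [[0.0, 0.0], [1.0, 1.0]]
penalty = kmeans_penalty(codes, centers)
```

In this reading, a clip would be flagged as abnormal when the reconstruction error of either the appearance output or the motion output is large, while the penalty term is added to the training loss to pull representations toward the cluster centers. How the two errors and the penalty are weighted and combined is not specified here and is left open in the sketch.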