Hierarchical Contrastive Inconsistency Learning for Deepfake Video Detection

Zhihao Gu, Taiping Yao, Yang Chen, Shouhong Ding, Lizhuang Ma

"With the rapid development of Deepfake techniques, the capacity to generate hyper-realistic faces has raised public concern in recent years. The temporal inconsistency, which derives from the contrast in facial movements between pristine and forged videos, can serve as an efficient cue for identifying Deepfakes. However, most existing approaches impose binary supervision to model it, which restricts them to focusing only on category-level discrepancies. In this paper, we propose a novel Hierarchical Contrastive Inconsistency Learning framework (HCIL) with a two-level contrastive paradigm. Specifically, by sampling multiple snippets to form the input, HCIL performs contrastive learning from both local and global perspectives to capture more general and intrinsic temporal inconsistency between real and fake videos. Moreover, we incorporate a region-adaptive module for intra-snippet inconsistency mining and an inter-snippet fusion module for cross-snippet information fusion, which further facilitate inconsistency learning. Extensive experiments and visualizations demonstrate the effectiveness of our method against state-of-the-art competitors on four Deepfake video datasets, i.e., FaceForensics++, Celeb-DF, DFDC, and Wild-Deepfake."

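The two-level contrastive idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss functions, so the code below assumes a generic supervised contrastive (InfoNCE-style) loss, treats same-label items as positives at both levels, and uses mean pooling as a stand-in for the inter-snippet fusion module. The function names `supervised_contrastive_loss` and `two_level_loss`, the `temperature` value, and the tensor layout are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (an assumption, not the paper's loss).

    features: (n, dim) array, n >= 2. labels: (n,) array of class ids.
    Items sharing a label are positives; all other items are negatives.
    """
    z = l2_normalize(np.asarray(features, dtype=np.float64))
    labels = np.asarray(labels)
    n = z.shape[0]
    sim = z @ z.T / temperature
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax over every other item, computed stably.
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Average log-probability of the positives for each anchor that has any.
    pos = (labels[:, None] == labels[None, :]) & not_self
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    pos_sum = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(-(pos_sum[valid] / pos_count[valid]).mean())


def two_level_loss(snippet_feats, video_labels, temperature=0.1):
    """Return (local, global) contrastive losses.

    snippet_feats: (num_videos, num_snippets, dim) snippet embeddings.
    video_labels: (num_videos,) with 0 = real, 1 = fake (hypothetical encoding).
    """
    snippet_feats = np.asarray(snippet_feats, dtype=np.float64)
    video_labels = np.asarray(video_labels)
    v, s, d = snippet_feats.shape
    # Local level: contrast individual snippets; each inherits its video's label.
    local = supervised_contrastive_loss(
        snippet_feats.reshape(v * s, d), np.repeat(video_labels, s), temperature
    )
    # Global level: contrast whole videos; mean pooling stands in for fusion.
    global_ = supervised_contrastive_loss(
        snippet_feats.mean(axis=1), video_labels, temperature
    )
    return local, global_
```

In a training loop, such terms would typically be added to a standard classification loss with weighting coefficients; how HCIL combines them is not stated in the abstract.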