StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN

Fei Yin, Yong Zhang, Xiaodong Cun, Mingdeng Cao, Yanbo Fan, Xuan Wang, Qingyan Bai, Baoyuan Wu, Jue Wang, Yujiu Yang ;


"One-shot talking face generation aims at synthesizing a high-quality talking face video from an arbitrary portrait image, driven by a video or an audio segment. In this work, we provide a solution from a novel perspective that differs from existing frameworks. We first investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties. Upon the observation, we propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities, i.e., high-resolution video generation, disentangled control by driving video or audio, and flexible face editing. Our framework elevates the resolution of the synthesized talking face to 1024×1024 for the first time, even though the training dataset has a lower resolution. Moreover, our framework allows two types of facial editing, i.e., global editing via GAN inversion and intuitive editing via 3D morphable models. Comprehensive experiments show superior video quality and flexible controllability over state-of-the-art methods. Code is available at"

Related Material

[pdf] [supplementary material] [DOI]