DVI: Depth Guided Video Inpainting for Autonomous Driving
To obtain clear street views and photo-realistic simulation in autonomous driving, we present an automatic video inpainting algorithm that removes traffic agents from videos and synthesizes the missing regions with the guidance of depth/point clouds. By building a dense 3D map from stitched point clouds, the frames within a video are geometrically correlated via this common 3D map. To fill a target inpainting area in a frame, it is then straightforward to transform pixels from other frames into the current one with correct occlusion. Furthermore, we are able to fuse multiple videos through 3D point cloud registration, making it possible to inpaint a target video with multiple source videos. The motivation is to solve the long-time occlusion problem, in which an occluded area is never visible anywhere in the video. This happens quite often in street-view videos when a vehicle is directly in front of the camera. In this case, we can capture the scene a second time, when the desired area is visible, and perform video fusion inpainting. To our knowledge, we are the first to fuse multiple videos for video inpainting. To evaluate our method, we collected 5 hours of synchronized lidar and camera data from autonomous driving cars on urban roads. To address the long-time occlusion problem, we collected data for the same roads on different days and at different times. The experimental results show that the proposed approach outperforms state-of-the-art approaches on all criteria; in particular, the RMSE (Root Mean Squared Error) is reduced by about 13%.
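To illustrate the geometric idea of transferring pixels through a common 3D map with correct occlusion, the following is a minimal sketch, not the authors' implementation. It assumes a pinhole camera with intrinsics `K`, a world-to-camera pose `(R, t)` for the target frame, and map points whose colors were already sampled from other frames. The function name `inpaint_from_points` and all of its parameters are hypothetical. A per-pixel nearest-depth selection (a simple z-buffer) decides which point is visible before any masked pixel is filled.

```python
import numpy as np

def inpaint_from_points(image, mask, points_world, colors, K, R, t):
    """Fill masked pixels of `image` with colors of projected 3D points.

    image:        (H, W, 3) target frame
    mask:         (H, W) bool, True where pixels must be inpainted
    points_world: (N, 3) map points in world coordinates
    colors:       (N, 3) per-point colors sampled from other frames (assumed given)
    K, R, t:      pinhole intrinsics and world-to-camera rotation/translation
    Returns the filled image and a bool map of the pixels that were filled.
    """
    h, w = mask.shape
    out = image.copy()
    filled = np.zeros((h, w), dtype=bool)

    # World -> camera coordinates; discard points behind the camera.
    cam = points_world @ R.T + t
    front = cam[:, 2] > 1e-6
    cam, cols = cam[front], colors[front]

    # Perspective projection to integer pixel coordinates.
    uvw = cam @ K.T
    u = np.rint(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.rint(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, cols = u[inside], v[inside], cam[inside, 2], cols[inside]

    # Z-buffer: sort by pixel, then by depth, and keep the nearest point per pixel.
    flat = v * w + u
    order = np.lexsort((z, flat))          # primary key: flat, secondary: z
    flat_sorted = flat[order]
    first = np.ones(flat_sorted.shape[0], dtype=bool)
    first[1:] = flat_sorted[1:] != flat_sorted[:-1]
    nearest = order[first]
    u, v, cols = u[nearest], v[nearest], cols[nearest]

    # Only write into the region that needs inpainting.
    keep = mask[v, u]
    out[v[keep], u[keep]] = cols[keep]
    filled[v[keep], u[keep]] = True
    return out, filled
```

Visibility is resolved over all projected points before the mask is applied, so a far point cannot leak into a masked pixel that a nearer surface covers. Masked pixels that no point reaches stay unfilled in the returned map; under the same assumptions, those are the pixels that would need a second source video.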