Open-World Stereo Video Matching with Deep RNN

Yiran Zhong, Hongdong Li, Yuchao Dai; The European Conference on Computer Vision (ECCV), 2018, pp. 101-116


In this paper, we propose a novel deep Recurrent Neural Network (RNN) that takes a continuous (possibly previously unseen) stereo video as input and directly predicts a depth map without any pre-training process. The quality and accuracy of the obtained depth map improve over time as new stereo frames are fed in. Thanks to its recurrent nature (based on two convolutional LSTM blocks), the network is able to memorize and learn from its past experience, and it gradually adapts its parameters (interconnection weights) to achieve better stereo matching results on the current stereo input. In this sense, our new method is an unsupervised network that does not rely on any labeled ground-truth depth map, and it is able to work in previously unseen or unfamiliar environments, suggesting remarkable generalizability: it is applicable in an open-world setting, adapting its network parameters to generic stereo video inputs so that it is robust to changes in scene content, image statistics, lighting, and season. Through extensive experiments, we demonstrate that the method is able to adapt seamlessly between different scenarios. Equally important, in terms of absolute stereo matching performance, it even outperforms state-of-the-art stereo algorithms on several standard benchmark datasets such as KITTI and Middlebury stereo.
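The abstract does not state which training signal lets the network adapt without ground-truth depth. As an illustration only, the sketch below shows one signal commonly used in self-supervised stereo: a photometric reconstruction error, where the right image is warped by a candidate disparity map and compared against the left image. The function names `warp_right_to_left` and `photometric_loss`, the rectified grayscale inputs, and the L1 error are all assumptions made for this example, not details taken from the paper.

```python
import math


def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d(x, y).

    Assumes rectified grayscale images stored as lists of rows, so that
    corresponding pixels lie on the same row (hypothetical setup).
    """
    height, width = len(right), len(right[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            # Source column in the right image, clamped to the image border.
            src = min(max(x - disparity[y][x], 0.0), width - 1.0)
            x0 = int(math.floor(src))
            x1 = min(x0 + 1, width - 1)
            frac = src - x0
            # Linear interpolation between the two neighbouring columns.
            row.append((1.0 - frac) * right[y][x0] + frac * right[y][x1])
        warped.append(row)
    return warped


def photometric_loss(left, right, disparity):
    """Mean absolute difference between the left image and the warped right image."""
    warped = warp_right_to_left(right, disparity)
    total = sum(
        abs(l - w)
        for left_row, warped_row in zip(left, warped)
        for l, w in zip(left_row, warped_row)
    )
    return total / (len(left) * len(left[0]))
```

A loss of this kind needs only the stereo pair itself, which is why it can in principle be evaluated on every incoming frame of a video. In an online-adaptation setting, a network's weights would be updated to reduce it as new frames arrive. How the paper combines such a signal with its convolutional LSTM blocks is not described in the abstract.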

Related Material

author = {Zhong, Yiran and Li, Hongdong and Dai, Yuchao},
title = {Open-World Stereo Video Matching with Deep RNN},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}