Folded Recurrent Neural Networks for Future Video Prediction

Marc Oliu, Javier Selva, Sergio Escalera; The European Conference on Computer Vision (ECCV), 2018, pp. 716-731


This work introduces double-mapping Gated Recurrent Units (dGRU), an extension of standard GRUs where the input is considered as a recurrent state. An extra set of logic gates is added to update the input given the output. Stacking multiple such layers results in a recurrent auto-encoder: the operators updating the outputs comprise the encoder, while the ones updating the inputs form the decoder. Since the states are shared between corresponding encoder and decoder layers, the representation is stratified during learning: some information is not passed to the next layers. We test our model on future video prediction. The main challenges for this task include high variability in videos, temporal propagation of errors, and non-specificity of future frames. We show that only the encoder needs to be applied for encoding and only the decoder for prediction. This reduces the computational cost and avoids re-encoding predictions when generating multiple frames, mitigating error propagation. Furthermore, it is possible to remove layers from a trained model, giving insight into the role of each layer. Our approach improves on state-of-the-art results on MMNIST and UCF101, and is competitive on KTH with 2 and 3 times less memory usage and computational cost, respectively, than the best-scoring approach.
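
The folding idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes fully connected layers rather than the convolutional layers one would likely use on video, standard GRU equations for both mappings, untrained random weights, and hypothetical names (`init_gru`, `gru_step`, `DGRULayer`, `encode`, `predict`) and layer sizes chosen only for illustration. Each layer holds two GRU mappings over a shared pair of states: a forward one that updates the upper state given the lower one (encoder), and a backward one that updates the lower state given the upper one (decoder).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(n_in, n_state, rng):
    """Random weights for one GRU mapping an n_in vector onto an n_state state."""
    def w(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))
    return {
        "Wz": w(n_state, n_in), "Uz": w(n_state, n_state), "bz": np.zeros(n_state),
        "Wr": w(n_state, n_in), "Ur": w(n_state, n_state), "br": np.zeros(n_state),
        "Wh": w(n_state, n_in), "Uh": w(n_state, n_state), "bh": np.zeros(n_state),
    }

def gru_step(p, x, h):
    """Standard GRU update of state h given input x (assumed form, see lead-in)."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])             # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])             # reset gate
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_cand

class DGRULayer:
    """Hypothetical double-mapping layer: two GRU mappings sharing two states."""
    def __init__(self, n_lower, n_upper, rng):
        self.fwd = init_gru(n_lower, n_upper, rng)  # updates upper state from lower
        self.bwd = init_gru(n_upper, n_lower, rng)  # updates lower state from upper

def encode(layers, states, frame):
    """Encoder pass: write the observed frame, then update states bottom-up."""
    new = list(states)
    new[0] = frame
    for l, layer in enumerate(layers):
        new[l + 1] = gru_step(layer.fwd, new[l], states[l + 1])
    return new

def predict(layers, states):
    """Decoder pass: update states top-down; states[0] becomes the prediction."""
    new = list(states)
    for l in reversed(range(len(layers))):
        new[l] = gru_step(layers[l].bwd, new[l + 1], states[l])
    return new

rng = np.random.default_rng(0)
sizes = [16, 8, 4]  # illustrative sizes: frame vector, then two hidden states
layers = [DGRULayer(sizes[i], sizes[i + 1], rng) for i in range(len(sizes) - 1)]
states = [np.zeros(n) for n in sizes]

# Encode three observed (random stand-in) frames using only the forward mappings.
for _ in range(3):
    states = encode(layers, states, rng.random(sizes[0]))

# Predict two frames using only the backward mappings, without re-encoding.
predictions = []
for _ in range(2):
    states = predict(layers, states)
    predictions.append(states[0].copy())
```

In this sketch the prediction loop never calls `encode`, which mirrors the abstract's point that predicted frames do not need to be re-encoded when generating several frames in a row. Note also that `predict` leaves the topmost state untouched, since no layer above it provides an update.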

Related Material

author = {Oliu, Marc and Selva, Javier and Escalera, Sergio},
title = {Folded Recurrent Neural Networks for Future Video Prediction},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}