AtlantaNet: Inferring the 3D Indoor Layout from a Single 360(∘) Image beyond the Manhattan World Assumption
We introduce a novel end-to-end approach to predict a 3D room layout from a single panoramic image. Compared to recent state-of-the-art works, our method is not limited to Manhattan World environments, and can reconstruct rooms bounded by vertical walls that do not form right angles or are curved -- i.e., Atlanta World models. In our approach, we project the original gravity-aligned panoramic image on two horizontal planes, one above and one below the camera. This representation encodes all the information needed to recover the Atlanta World 3D bounding surfaces of the room in the form of a 2D room footprint on the floor plan and a room height. To predict the 3D layout, we propose an encoder-decoder neural network architecture, leveraging Recurrent Neural Networks (RNNs) to capture long-range geometric patterns, and exploiting a customized training strategy based on domain-specific knowledge. The experimental results demonstrate that our method outperforms state-of-the-art solutions in prediction accuracy, in particular in cases of complex wall layouts or curved wall footprints.