Deep Learning for Roof Detection in Aerial Images in 3 minutes

Alexander Usoltsev · namR
3 min read · Apr 9, 2019

At nam.R we are working hard to build a Digital Twin of the French territory: we aggregate, clean and reorganise large quantities of data in different formats, coming from many different providers. Among them, aerial images (photographs taken from an airplane) play a crucial role and represent one of the richest sources of information about the area they describe. Extracting this information, though, is often a challenging (and fun!) task.

Let us show you how we did it.

Using deep learning we were able to infer the number of roof slopes that appear in an image, as well as the coordinates of their geometry. In Computer Vision jargon this task is called “object segmentation” and refers to detecting the pixels that represent the desired object. We did this with a current state-of-the-art segmentation architecture, Mask R-CNN, a model published by researchers at Facebook, using an implementation built on the Keras framework.

Building a Training Set

Training a deep learning model to detect any kind of object is a “supervised learning” task, so we need not only the images of the roofs we want to extract the information from, but also a (large) number of labels. Labels, in this particular case, should tell us which parts of the images contain roof slopes. Unfortunately a reliable personalised label shop doesn’t exist, so we had to make them ourselves. We did this by projecting the roof coordinates, found in CityGML 3D reconstructions of several cities, onto the 2D plane of the images. So we used two types of data to train the machine learning model: images of roofs and labels for the roof slope coordinates. An example of a training instance is presented in the figure below.

Example of a training sample.
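To give an idea of what the label-building step looks like in practice, here is a minimal sketch of rasterising roof-slope polygons into per-image masks. It assumes the CityGML roof polygons have already been reprojected into the coordinate reference system of the aerial image; the function name and workflow are illustrative, not our exact internal pipeline.

```python
# Sketch: rasterise roof-slope polygons into per-slope binary masks
# aligned with the aerial image (names are illustrative).
import numpy as np
import rasterio
from rasterio import features
from shapely.geometry import Polygon

def slopes_to_masks(image_path, slope_polygons):
    """Turn a list of shapely Polygons (one per roof slope) into a stack of
    binary masks with the same shape as the aerial image, one channel per slope."""
    with rasterio.open(image_path) as src:
        height, width = src.height, src.width
        transform = src.transform  # maps world coordinates to pixel indices

    masks = np.zeros((height, width, len(slope_polygons)), dtype=np.uint8)
    for i, poly in enumerate(slope_polygons):
        masks[:, :, i] = features.rasterize(
            [(poly, 1)],
            out_shape=(height, width),
            transform=transform,
            fill=0,
            dtype="uint8",
        )
    return masks
```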

As you probably know, deep learning algorithms are very greedy and need a lot of data: the more, the better. To increase the amount of training data we could either have labelled more roofs by hand, or “augmented” our data by generating new samples from simple transformations of the original images and labels (rotations, vertical or horizontal flips, etc.). Given that labelling by hand is a very slow and often inefficient process, we opted for the latter.
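As a rough sketch of what such an augmentation step can look like, the function below generates rotated and flipped copies of an image together with its masks. A real pipeline might instead rely on a library such as imgaug or albumentations; this is just an illustration.

```python
# Sketch: simple augmentation of an image / mask pair with rotations and flips.
import numpy as np

def augment(image, masks):
    """Yield the original sample plus rotated and flipped copies.
    `image` is (H, W, 3); `masks` is (H, W, N), one channel per slope."""
    yield image, masks
    for k in (1, 2, 3):                       # 90, 180, 270 degree rotations
        yield np.rot90(image, k), np.rot90(masks, k)
    yield np.fliplr(image), np.fliplr(masks)  # horizontal flip
    yield np.flipud(image), np.flipud(masks)  # vertical flip
```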

In the last few years, many high-performance deep neural networks have been developed and have achieved impressive results on object detection tasks. As mentioned, we chose Mask R-CNN, a high-performance object segmentation network released in 2017. We adapted Matterport’s implementation to make it compatible with our aerial images and label data.
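Plugging a custom data source into Matterport’s implementation essentially means subclassing its `Config` and `Dataset` classes. The sketch below follows that package’s API (the `mrcnn` module), but the configuration values and data-loading details are illustrative assumptions, not our exact setup; it reuses the hypothetical `slopes_to_masks` helper from the rasterisation sketch above.

```python
# Sketch: adapting Matterport's Mask R-CNN to a roof-slope dataset.
import numpy as np
from mrcnn.config import Config
from mrcnn import utils

class RoofConfig(Config):
    NAME = "roof_slopes"
    NUM_CLASSES = 1 + 1          # background + roof slope
    IMAGES_PER_GPU = 2
    STEPS_PER_EPOCH = 500        # hypothetical value

class RoofDataset(utils.Dataset):
    def load_roofs(self, samples):
        """`samples` is a list of (image_path, slope_polygons) pairs."""
        self.add_class("roofs", 1, "roof_slope")
        for i, (path, polygons) in enumerate(samples):
            self.add_image("roofs", image_id=i, path=path, polygons=polygons)

    def load_mask(self, image_id):
        info = self.image_info[image_id]
        masks = slopes_to_masks(info["path"], info["polygons"])  # see sketch above
        class_ids = np.ones(masks.shape[-1], dtype=np.int32)     # single class
        return masks.astype(bool), class_ids
```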

During training, the model takes images and their corresponding labels and learns its internal parameters so that it can detect roof slopes in any new image. Given the relatively small size of the images, training lasted just over 2 hours. The loss function values below reassured us that training was proceeding smoothly: the loss goes down and our model’s precision goes up!
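For reference, launching such a training run with Matterport’s API looks roughly like the sketch below. The COCO starting weights, epoch count and layer selection are illustrative choices, not the exact settings we used; `train_samples` and `val_samples` stand for the hypothetical (image, polygons) lists described earlier.

```python
# Sketch: training the adapted Mask R-CNN with Matterport's API.
from mrcnn import model as modellib

dataset_train = RoofDataset(); dataset_train.load_roofs(train_samples); dataset_train.prepare()
dataset_val = RoofDataset(); dataset_val.load_roofs(val_samples); dataset_val.prepare()

config = RoofConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs/")
model.load_weights("mask_rcnn_coco.h5", by_name=True,          # start from COCO weights
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=30, layers="heads")                          # fine-tune the head layers
```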

The Predictions

During the prediction phase, the model takes new aerial images as input and outputs the contours of the roof slopes in each image. The image below shows some examples of aerial images, where the red lines indicate the detected roof slopes.

As we can see, the predictions are quite accurate, but roof slope detection does not work as well for a less common material: metal. Perhaps we need more labels of metal roofs, or we should try to generate some ourselves. But we leave this to future work.

Finally, knowing the number, the coordinates and the orientation of the slopes that form a roof helps us understand the solar energy potential of that roof, and ultimately contributes to nam.R’s goal of facilitating the ecological transition of our cities.
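As a simple illustration of how the detected geometry can feed such an estimate (not nam.R’s actual method), one can derive slope areas from the predicted masks; the ground sampling distance and tilt correction below are hypothetical example values.

```python
# Sketch: deriving roof-slope areas from a predicted mask, as one input
# to a solar-potential estimate. Values are illustrative.
import numpy as np

def slope_ground_area_m2(mask, gsd_m=0.2):
    """Projected (horizontal) area of one detected slope, from its pixel mask
    and the image's ground sampling distance in metres per pixel."""
    return mask.sum() * gsd_m ** 2

def slope_surface_area_m2(mask, tilt_deg, gsd_m=0.2):
    """Actual roof surface area, correcting the projected area for the slope tilt."""
    return slope_ground_area_m2(mask, gsd_m) / np.cos(np.radians(tilt_deg))
```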
