Estimating Vegetated Surfaces with Computer Vision: how we improved our model and scaled up

Bastien Hell
Feb 5, 2020 · 7 min read


Here at namR, we're working hard on building a Digital Twin of the French territory. We aggregate, clean and re-organize large quantities of data in different formats, coming from many different providers. Among them are aerial images (geo-referenced photographs taken from an airplane), which play a crucial role and are among the richest sources of information about the areas they describe.

Today, we want to take you through a very meaningful case for us: the detection of vegetated surfaces around buildings. In this project, we'll show you how we built a dataset suited to our application and what we went through along the way. The detection of vegetation is a simple yet powerful tool to help understand the quality of life around populated areas.

We decided to frame the problem as segmentation of vegetated zones: for each pixel, we predict one of two classes, "vegetation" or "not vegetation".

Simple ideas don’t always lead to good results

Our first idea for detecting vegetated areas consisted of a very simple threshold in HSV color space:

HSV wheel and hues selected for vegetation detection

We could consider as vegetation the pixels with a hue between 40 and 160, for example (values we chose empirically). This encompasses the yellowish greens as well as more blue-green hues. We would need to take saturation and value (luminance) into account as well, in order to handle the scene luminosity and the image's saturation. This was our first model, which was basically a filter on 3 dimensions. To smooth out the pixel classification, we ran standard morphological operators, closing and opening, to help regularize the detections spatially.
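The filter described above can be sketched in a few lines. This is a minimal illustration, not our production code: the 40-160 hue range comes from the article, while the saturation and value floors (0.25 and 0.2) and the 3x3 structuring element are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def rgb_to_hsv(rgb):
    """Convert an (H, W, 3) float RGB image in [0, 1] to hue (degrees,
    [0, 360)), saturation and value (both in [0, 1])."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)
    c = v - rgb.min(axis=-1)                      # chroma
    s = np.where(v > 0, c / np.maximum(v, 1e-12), 0.0)
    cc = np.maximum(c, 1e-12)
    m = c > 1e-12
    rmax = m & (v == r)
    gmax = m & (v == g) & ~rmax
    bmax = m & ~rmax & ~gmax
    h = np.zeros_like(v)
    h = np.where(rmax, ((g - b) / cc) % 6, h)     # piecewise hue formula
    h = np.where(gmax, (b - r) / cc + 2, h)
    h = np.where(bmax, (r - g) / cc + 4, h)
    return 60.0 * h, s, v

def vegetation_mask(rgb, hue_lo=40, hue_hi=160, s_min=0.25, v_min=0.2):
    """Threshold on hue, with saturation/value floors, then smooth the mask."""
    h, s, v = rgb_to_hsv(rgb)
    mask = (h >= hue_lo) & (h <= hue_hi) & (s >= s_min) & (v >= v_min)
    struct = np.ones((3, 3), dtype=bool)
    mask = ndimage.binary_closing(mask, structure=struct)  # fill small holes
    mask = ndimage.binary_opening(mask, structure=struct)  # drop small specks
    return mask
```

Green pixels (hue around 120 degrees) pass the filter, while gray or desaturated pixels are rejected by the saturation floor before the morphological smoothing is applied.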

Test image, filtered hue, smoothed out mask, image and detection overlap

This is a very cheap approach that would make it easy to process the whole territory. However, this cheapness also makes it very unreliable. Indeed, shadows were poorly handled, and the pixels classified as vegetation by such an algorithm could just as well belong to a green-colored roof, green shades of water or even simple chromatic aberrations in our data!

Vegetation detection through the hue value doesn’t always work

Leveraging data creatively to make the model development easier

Working with aerial and satellite imagery, we know that there are many different modalities for imaging the Earth [1], and one of them is multispectral imagery with infrared information. Multispectral images carry more information than simple RGB images: infrared bands are useful because they help us compute a value known as the NDVI (Normalized Difference Vegetation Index) [2]. The NDVI is a simple yet effective way to detect vegetation on multispectral images:

NDVI = (NIR - R) / (NIR + R)

with NIR and R being the near-infrared and red spectral reflectance respectively.

This value is maximal when the spectral reflectance of the incoming radiation is low in the red band and high in the near infrared. It's a good indicator of the presence of vegetation. Since NIR and R range between 0 and 1 (they are ratios of the incoming radiation), the NDVI takes values between -1 and 1.
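The formula translates directly into a per-pixel computation. A minimal sketch (the `eps` guard against division by zero is our addition, not part of the NDVI definition):

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """NDVI = (NIR - R) / (NIR + R), computed per pixel.
    nir, red: reflectance arrays with values in [0, 1]."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    return (nir - red) / (nir + red + eps)
```

For a lush-vegetation pixel with NIR = 0.6 and R = 0.1, the NDVI is 0.5 / 0.7, about 0.71, while a surface reflecting both bands equally yields an NDVI of 0.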

Profiles of reflectance for different spectral ranges for vegetation and non-vegetation

For green, lush vegetation, R is low and NIR is high (vegetation reflects little red while it reflects infrared pretty well), so the NDVI will be very close to 1. For dry or naturally reddish vegetation, R is higher, but we hope that NIR is high enough to keep the NDVI value high.

Once again we were presented with a challenge: we had no access to high-resolution multispectral imagery covering the whole territory. But since we did have access to a limited scope of multispectral data, we came up with a more creative idea: leverage this limited data to automatically label the aerial imagery we have access to, thus creating a training set for a deep learning algorithm at very low cost.

Training and inference pipeline
Image, NDVI mask and overlay

Working regularly with U-Net architectures [3], we decided to stick with this family of models for this task. What motivated our decision most was:

  • U-Net models are fast to train and run inference with. Moreover, the FastAI framework already has a good implementation that synergizes well with its one-cycle fitting function
  • our task doesn't seem too complicated. Vegetation is quite recognizable, and we hoped a simple deep learning model could perform well enough

We won't dwell on the training details, but using the fast convergence [4] and cyclical training [5] policies let us build a model in a short time.

We obtained an overall accuracy of 85% and a Dice score of 73% [6].
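For reference, the Dice score [6] measures overlap between the predicted and target masks: twice the intersection divided by the sum of the two mask sizes. A minimal implementation:

```python
import numpy as np

def dice_score(pred, target, eps=1e-12):
    """Dice = 2 * |A intersect B| / (|A| + |B|) over binary masks.
    Returns 1.0 for a perfect match, 0.0 for disjoint masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)
```

Unlike plain accuracy, it ignores the (often dominant) true-negative background pixels, which is why it reads lower than accuracy on the same predictions.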

While these scores seem low for the kind of algorithm we're developing, they actually reveal issues in the dataset itself. First, the training data isn't perfect: some pixels carry the wrong label. After all, it's a semi-automatically generated dataset that we didn't control as tightly as a hand-labeled one, so some mislabeling was to be expected. Some vegetation pixels weren't properly tagged because their NDVI was too low, for example because they represented dry grass, but also because the aerial and satellite images weren't all acquired on the same date. Second, and most importantly, the evaluation data is itself flawed: being a subset of the dataset we created, it contains mislabeled pixels too, which makes the evaluation tricky. Nevertheless, the results are very promising for a model we developed this fast with data we generated automatically!

Other things we learned

Another big part of our project was to detect vegetated surfaces for very different types of parcels, with variable sizes and shapes. This meant that predicting on a crop the size of a parcel would produce very large (or very small) images that would have to be resized into a square image of a different size. Because our images have a fixed resolution, we want to avoid resizing and reshaping them as much as possible: a pixel should always represent the same surface on the ground (0.04 m²). Thus, we decided to transform the zones we want to predict on into geospatial tiles. We used the Slippy Map format [7] to split the territory into fixed-size squares.
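The Slippy Map scheme [7] maps any WGS84 coordinate to integer tile indices at a given zoom level, using a well-known formula (this is the standard conversion, not code from the article):

```python
import math

def deg2tile(lat_deg, lon_deg, zoom):
    """Slippy Map tile indices (x, y) for a WGS84 point at a zoom level.
    x grows eastward from longitude -180, y grows southward from the
    Web Mercator latitude limit (about 85.05 degrees)."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom                                   # tiles per axis
    xtile = int((lon_deg + 180.0) / 360.0 * n)
    ytile = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return xtile, ytile
```

Every point falls into exactly one tile per zoom level, so tiling the territory this way yields non-overlapping, fixed-size squares to predict on.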

After running predictions on all tiles, we fuse the detected polygons into conjoined vegetated surfaces. We're really proud of the results: the vegetation is segmented precisely despite the smoothing of the detections.
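Fusing per-tile polygons into conjoined surfaces is the kind of operation a geometry library handles directly. A sketch using Shapely's `unary_union` on hypothetical detections (the article does not say which library was used):

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Hypothetical per-tile detections as polygons in a shared coordinate system:
# the first two share an edge (same vegetated surface split by a tile border),
# the third is isolated.
detections = [box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)]

# unary_union dissolves touching or overlapping polygons into single surfaces
merged = unary_union(detections)
```

The two adjacent squares merge into one polygon while the isolated one stays separate, so the result is a two-part multi-polygon covering the same total area.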

Example of our final vegetation detection on a set of parcels

We then integrated those detections into our Digital Twin, in order to associate to each French parcel its vegetated surface. For example, it's an important indicator of a school's greenness, which we analyse in our tRees project alongside many other properties (like roof surfaces) related to the ecological transition of French schools.


When we started this project, we thought the objectives were quite clear: develop a model that takes any image of a whole parcel as input and delivers a set of polygons encompassing all vegetation on that parcel.

To solve this task, we had to acknowledge and answer many different questions, from data sourcing to deployment. We ended up building a model with very reproducible results, deployable virtually anywhere we have access to aerial images. This model was built upon different data sources: RGB aerial and multispectral satellite imagery. Mixing data types creatively is easier than it seems, give it a try!

We also realized that scaling to France's 643,801 km² while keeping predictions stable meant dividing space into regular tiles, which helped us keep object sizes constant while ensuring we ran predictions only once over dense territories.

The detection of parcel vegetation was a very complete subject for us and we learned a lot solving it. Here at namR, we're glad to challenge ourselves with projects like these in order to understand our territory.

List of references

[1] 15 Free Satellite Imagery Data Sources

[2] Monitoring corn and soybean crop development with hand-held radiometer spectral data

[3] U-Net: Convolutional Networks for Biomedical Image Segmentation

[4] Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates

[5] Cyclical Learning Rates for Training Neural Networks

[6] F-scores, Dice, and Jaccard set similarity

[7] Slippy map tilenames



Bastien Hell, Computer Vision Scientist @namR