Two ways of extracting points of interest from Sentinel-2A data
Lately I’ve read about a competition organized by the European Space Agency, which asks participants to find insights about societal, urban and natural changes during the current pandemic using the freely available imaging data from the Sentinel satellites. I’m no remote sensing expert, but I thought it was a good opportunity to educate myself a bit on the topic.
First things first — what is Sentinel-2 imaging data anyway?
Sentinels are a constellation of satellites launched by ESA over the past several years. Every five days they capture images of all landmasses and coastal areas. The pictures are taken at different wavelengths: Sentinel-1 carries a microwave radar that penetrates clouds and essentially lets you build a 3D map of an area, while Sentinel-2 takes pictures in 13 wavelengths ranging from just beyond blue (443 nm) to short-wave infrared (2190 nm). There are other satellites too, serving different purposes.
The best thing is that the images from these satellites are publicly available for free — for everyone to do their research or even build a commercial product. A whole ecosystem has already grown around this data.
If you have some free time, it’s definitely worth checking them out. So that is what I did.
Here we are dealing with geographical data (these images are essentially maps) and with spectral data (different objects reflect light differently at different wavelengths). So if you do your own research, you may want to drill down to particular geographic objects, or to objects with a particular spectrum, that is, objects made of a particular material.
In this post I will share two ways to do just that. And if you already know how to download and preprocess Sentinel-2 images, you can skip the following two sections.
Get the images
Let’s download the images first. There are two ways to do that: through the interactive web interface on Copernicus Hub, or through the API available via the sentinelsat Python package. The API is definitely more convenient, but you should visit the web interface at least once to understand what exactly you need to download.
The reason is that Sentinel-2 takes pictures of 100×100 km areas which slightly overlap. I wanted images of my home city, Tallinn, and it turned out to lie exactly in such an overlap. Every image is ~1 GB of data, so I didn’t want to download unnecessary duplicates.
Now, knowing that the area I am interested in has the code T35VLF, I can prefilter what comes out of the API.
You need to install the sentinelsat and geopandas packages and register an account on Copernicus Hub. The shapely package won’t hurt either, as you will be working with geographical polygons.
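A minimal query sketch with sentinelsat might look like this; the credentials, date range and product level are placeholders, with only the tile code and the cloud-cover limit taken from this post.

```python
from sentinelsat import SentinelAPI

# Credentials and dates are placeholders; use your own Copernicus Hub account.
api = SentinelAPI('my_user', 'my_password', 'https://scihub.copernicus.eu/dhus')

products = api.query(
    date=('20200401', '20200630'),
    platformname='Sentinel-2',
    processinglevel='Level-2A',
    cloudcoverpercentage=(0, 10),   # whole-tile cloud cover, see the caveats at the end
    filename='*T35VLF*',            # prefilter by the tile code found on the web interface
)
products_df = api.to_geodataframe(products)   # one row per downloadable snapshot
```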
Now, to download a particular snapshot (about 1 GB!), you use the dataframe index as a reference, or use .download_all() to fetch everything at once.
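Roughly like this, where the index picked is simply the first product that matched the query:

```python
# Download one product by its ID (the dataframe index), or everything at once.
api.download(products_df.index[0])   # about 1 GB per snapshot
# api.download_all(products)         # fetches every product returned by the query
```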
Unzip the downloaded archives and you have your data.
Combine spectral bands into a single image
After unpacking the archives, you will get a bunch of directories with complex names and a seemingly obscure structure. Inside are images in JPEG2000 format, one for each of the 13 frequency bands. We would like to read and combine them into a single TIFF file that is convenient for further work.
Speaking of bands, which ones do we need? Depending on the task you might need one, three, or all of them; I chose seven, which represent most of the spectrum quite nicely for initial exploration.
Images at different wavelengths have different resolutions: the bands we use have pixels with a side of 10 or 20 meters. We would like to upsample the lower-resolution bands to match the higher-resolution ones.
Next come the pieces of code that deal with the directory structure and file processing. First, define the parameters we are working with.
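Something along these lines. Which seven bands to take is up to you; the selection below is my assumption, ordered so that B11 lands at index 5, the index used later in this post.

```python
# Band name -> native pixel size in metres. The exact seven-band selection is
# an assumption; with this ordering B11 (1610 nm) ends up at index 5.
BANDS = {
    'B02': 10,   # blue
    'B03': 10,   # green
    'B04': 10,   # red
    'B05': 20,   # red edge
    'B08': 10,   # near-infrared
    'B11': 20,   # short-wave infrared
    'B12': 20,   # short-wave infrared
}
TARGET_RESOLUTION = 10   # upsample every band to 10 m pixels
```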
Next, a bivariate spline interpolation function that keeps the transitions between the upsampled pixels smooth.
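A sketch of such a function built on SciPy's RectBivariateSpline; the original implementation may differ, but the idea is the same.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

def upsample_band(band: np.ndarray, factor: int) -> np.ndarray:
    """Upsample a 2-D band (e.g. from 20 m to 10 m pixels) with a bivariate
    spline, so transitions between the original pixels stay smooth."""
    rows = np.arange(band.shape[0])
    cols = np.arange(band.shape[1])
    spline = RectBivariateSpline(rows, cols, band.astype(np.float32))
    new_rows = np.linspace(0, band.shape[0] - 1, band.shape[0] * factor)
    new_cols = np.linspace(0, band.shape[1] - 1, band.shape[1] * factor)
    return spline(new_rows, new_cols)
```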
Let’s now find the directories containing the actual images and combine them into TIFF files, upsampling where necessary.
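Here is a condensed sketch of that step. It assumes the Level-2A directory layout and file naming, reuses the BANDS and upsample_band definitions from above, and expects get_tallinn_polygon() (the helper mentioned below) to return a shapely polygon of Tallinn in lng/lat order.

```python
import glob
import os

import numpy as np
import rasterio
from rasterio.io import MemoryFile
from rasterio.mask import mask
from rasterio.warp import transform_geom
from shapely.geometry import mapping

def combine_bands(safe_dir, polygon, out_path,
                  bands=BANDS, target_res=TARGET_RESOLUTION):
    """Read the selected band .jp2 files from an unpacked .SAFE directory,
    upsample 20 m bands to 10 m, crop to `polygon` (lng/lat) and write a
    single multi-band TIFF. File name patterns assume a Level-2A product."""
    layers, profile = [], None
    for band, res in bands.items():
        pattern = os.path.join(safe_dir, 'GRANULE', '*', 'IMG_DATA',
                               f'R{res}m', f'*_{band}_{res}m.jp2')
        with rasterio.open(glob.glob(pattern)[0]) as src:
            data = src.read(1)
            if profile is None and res == target_res:
                profile = src.profile.copy()   # a 10 m band serves as the template
        if res != target_res:
            data = upsample_band(data, res // target_res)   # 20 m -> 10 m
        layers.append(data)

    stack = np.stack(layers).astype(np.float32)
    profile.update(driver='GTiff', dtype='float32', count=len(layers))

    # Convert the polygon from lng/lat to the raster's CRS (EPSG:32635 here),
    # crop in memory through a MemoryFile and keep only the cropped area.
    geom = transform_geom('EPSG:4326', profile['crs'], mapping(polygon))
    with MemoryFile() as memfile:
        with memfile.open(**profile) as mem:
            mem.write(stack)
        with memfile.open() as mem:
            cropped, cropped_transform = mask(mem, [geom], crop=True)

    profile.update(height=cropped.shape[1], width=cropped.shape[2],
                   transform=cropped_transform)
    with rasterio.open(out_path, 'w', **profile) as dst:
        dst.write(cropped)

# The .SAFE directory name below is made up; use one of your unpacked products.
combine_bands('S2A_MSIL2A_20200515_T35VLF.SAFE', get_tallinn_polygon(), 'tallinn.tif')
```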
What does it do, in short?
- The images are located in the GRANULE directory, IMG_DATA subdirectory. They are split into R10m, R20m and R60m subdirectories, which denote the pixel resolution. Inside are .jp2 files corresponding to each frequency band.
- We load them using the rasterio library.
- Each read object contains a profile attribute, which holds the width, height, coordinate reference system (CRS), data type (dtype) and the coordinates of the polygon (the transform attribute) in that CRS. When doing transformations, such as cropping an image, we need to update it.
- As we are interested only in Tallinn (get_tallinn_polygon()), we want to crop the frequency band layers right from the start and save only the cropped area to TIFF. The mask function from rasterio does just that, but first the Tallinn polygon coordinates need to be converted from the usual lat/lng format to the EPSG:32635 (mem_file.crs) coordinate reference system. The mask function operates only on rasterio file handles, and we don’t want to first write full-sized TIFFs to disk, then reopen them and only then crop; we want to do it immediately. So we use a MemoryFile for that.
Each full TIFF would have been over 1 GB, but the cropped version is only ~90 MB.
Extract geographical areas and objects using OpenStreetMap database
What is OpenStreetMap? It is a global map portal, similar to Google Maps, but crowdsourced and wiki-style: everyone can make changes and improve the maps, for example by adding new restaurants, building numbers, roads and everything else a proper map should have.
As it’s free and open, you can download the whole world (how does that sound!). The world is big, though, so there are data providers from which you can download individual countries. Such an archive contains everything in the country (roads, forests, houses, etc.) with polygon coordinates and metadata. I used one of these providers to download Estonia; the file is estonia-latest-free.shp.zip. After unpacking, the directory looks like this:
You probably guessed it: using the polygons and metadata from those files, we are going to crop the Sentinel data, just as we extracted the Tallinn polygon a little earlier.
I created a special function to do this conveniently, given a directory such as the one above; a sketch of it follows the list of inputs below.
It utilises the geopandas module, which has all the pandas capabilities but also provides functionality specific to working with geodata.
Inputs are:
- OSM data location
- Name of the .shp file to load (like “roads”, “buildings”, “traffic”)
- sub_polygon, which tells whether to subset the whole of Estonia to a specific polygon (yes, we want only Tallinn)
- Some shapes inside the file may not be closed polygons but rather LineStrings. We try to close them using the convex hull method (which computes the smallest convex polygon that encapsulates all the points), but some still cannot be closed, for example when the shape is a Point. So remove_non_polygons removes them from the dataframe.
- Again, cropping Sentinel maps requires lng/lat format, so swap_coordinates is needed. If you get a “windows do not intersect” error at any point, first check this and then the CRS.
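A sketch of such a function under those assumptions; the shapefile is located with a glob instead of hard-coding the provider's naming scheme, and the geometry handling is simplified.

```python
import glob
import os

import geopandas as gpd
from shapely.ops import transform

def load_osm_shapes(osm_dir, name, sub_polygon=None,
                    remove_non_polygons=True, swap_coordinates=False):
    """Load one OSM layer ('roads', 'buildings', 'landuse', ...) from the
    unpacked shapefile directory as a GeoDataFrame of polygons."""
    path = glob.glob(os.path.join(osm_dir, f'*{name}*.shp'))[0]
    df = gpd.read_file(path)
    if sub_polygon is not None:
        df = df[df.intersects(sub_polygon)].copy()   # e.g. keep only Tallinn
    # Close LineStrings into polygons via their convex hull; shapes that are
    # still not polygons afterwards (e.g. Points) can then be dropped.
    not_polygon = df.geometry.geom_type != 'Polygon'
    df.loc[not_polygon, 'geometry'] = df.loc[not_polygon, 'geometry'].convex_hull
    if remove_non_polygons:
        df = df[df.geometry.geom_type == 'Polygon']
    if swap_coordinates:
        # Swap the axis order if your polygons are stored as lat/lng rather
        # than the lng/lat order that rasterio's mask expects.
        df['geometry'] = df.geometry.apply(
            lambda g: transform(lambda x, y: (y, x), g))
    return df
```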
Let’s now load a TIFF file, extract parks and forests using the OSM database, and visualise the result in the band with index 5 (B11, 1610 nm, short-wave infrared).
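A sketch of that step, reusing the load_osm_shapes function above; the TIFF path, the directory name, the 'landuse' layer and the fclass values follow the usual schema of such OSM extracts and are assumptions.

```python
import matplotlib.pyplot as plt
import rasterio
from rasterio.mask import mask

tallinn = get_tallinn_polygon()                      # the same helper as before

with rasterio.open('tallinn.tif') as src:            # the cropped TIFF from earlier
    tiff_data = src.read()                           # shape: (bands, height, width)

    # Parks and forests sit in the landuse layer; reproject them to the raster CRS.
    landuse = load_osm_shapes('estonia-latest-free.shp', 'landuse',
                              sub_polygon=tallinn)
    green = landuse[landuse['fclass'].isin(['park', 'forest'])].to_crs(src.crs)

    # Keep only the pixels inside parks and forests, zero out everything else.
    masked, _ = mask(src, green.geometry, crop=False, nodata=0)

# Band index 5 is B11 (1610 nm, short-wave infrared) in our band ordering.
plt.imshow(masked[5])
plt.title('Parks and forests of Tallinn, B11 reflectance')
plt.show()
```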
What we see here is the reflectance of the given areas in short-wave infrared. The locations correspond well to the actual situation, which you can verify on Google Maps.
Find objects by their reflectance spectrum
There are plenty of articles and scientific papers with ready-made formulas and tips on how to locate swamps, forests, fields and other natural objects by their reflectance at different wavelengths, that is, by their spectrum. After all, the Sentinel satellites were created first and foremost for monitoring rural land use.
But what if I want to look for something that has a custom spectrum? What if I want to find man-made objects in an urban environment? I would need to find the spectrum of a particular material that interests me and try to separate those areas from the images. How would I do that?
It turns out there is a database of different spectra carefully compiled and curated by the U.S. Geological Survey. You can download the 5.1 GB archive that contains the spectra of many minerals, man-made materials and organic matter. They also provide files in a format directly applicable to working with Sentinel-2 bands.
Again, for the downloaded and unpacked archive I wrote a function that loads the material of interest. Let’s load “Sheet_Metal” to find buildings with roofs made of steel.
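Here is a rough sketch of such a loader. The glob pattern, the one-value-per-line file format and the band ordering inside the convolved files are all assumptions about the unpacked library; adjust them to the files you actually have.

```python
import glob
import os

import numpy as np

def load_material_spectrum(library_dir, material, band_names, normalize=True):
    """Load the reflectance of `material` from the unpacked USGS spectral
    library, resampled to Sentinel-2 bands, and pick out our bands."""
    # Find a file whose name mentions the material, e.g. *Sheet_Metal*.txt.
    path = glob.glob(os.path.join(library_dir, '**', f'*{material}*.txt'),
                     recursive=True)[0]
    values = np.loadtxt(path, skiprows=1)   # skip the title line

    # Assumed ordering of the 13 convolved Sentinel-2 band values in the file.
    file_bands = ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07',
                  'B08', 'B8A', 'B09', 'B10', 'B11', 'B12']
    spectrum = np.array([values[file_bands.index(b)] for b in band_names])

    if normalize:
        # Zero mean, unit variance: we only care about the shape of the spectrum.
        spectrum = (spectrum - spectrum.mean()) / spectrum.std()
    return spectrum

sheet_metal = load_material_spectrum('usgs_splib07', 'Sheet_Metal',
                                     list(BANDS.keys()))
```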
So we provide the unpacked library location, the name of the material, the frequency bands that we have, and also the normalize flag.
Why normalize the spectral data to zero mean and unit variance? Because the Sentinel image data is on one scale and this data is on another, while we simply want to compare the shapes of the spectra. We will normalize the Sentinel image data soon too. But first, let's look at the plot.
This is the normalized reflectance of sheet metal in the frequencies that correspond to our bands.
Now let’s work with our previously loaded tiff_data and find those metal roofs.
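A compact version of those three steps might look like this; tiff_data and sheet_metal come from the sketches above.

```python
import matplotlib.pyplot as plt
import numpy as np

# 1. Bring every pixel's spectrum to zero mean and unit variance (axis 0 = bands).
mean = tiff_data.mean(axis=0, keepdims=True)
std = tiff_data.std(axis=0, keepdims=True)
normalized = (tiff_data - mean) / (std + 1e-9)   # guard against flat pixels

# 2. Mean squared error between each pixel's spectrum and the sheet-metal spectrum.
errors = ((normalized - sheet_metal[:, None, None]) ** 2).mean(axis=0)

# 3. The error distribution hints where to put the threshold.
plt.hist(errors.ravel(), bins=100)
plt.xlabel('MSE against the sheet-metal spectrum')
plt.show()
```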
- First, we normalize the 3-dimensional tiff_data along axis 0, basically taking every pixel and bringing its spectrum to zero mean and unit variance.
- Then we take our target spectrum of sheet metal and calculate the mean squared error between it and every pixel, producing a heatmap of errors.
- Then we plot the distribution of those errors to get a clue where to cut so that we keep only the areas whose spectra match that of sheet metal.
Let’s take everything that has an MSE ≤ 0.2, display the result and visually compare it to Google Maps again.
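For instance, like this; the window coordinates are arbitrary, just pick a neighbourhood you can compare against Google Maps.

```python
# Keep only the pixels whose spectrum is close to sheet metal; NaNs are left blank.
matches = np.where(errors <= 0.2, errors, np.nan)
subset = matches[400:800, 400:800]   # an arbitrary window of the cropped city map
plt.imshow(subset)                   # the lower the MSE, the better the match
plt.colorbar(label='MSE against the sheet-metal spectrum')
plt.show()
```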
As the buildings are rather small compared to the whole city, we will take only a subset of the map. Blue marks the smallest MSE, i.e. the closest match to the spectrum of sheet metal.
So it seems this method actually works.
Are there any limitations?
A lot of interesting things can be done by combining these two approaches. There are limitations, though, that you should be aware of so you don't waste time on debugging in vain.
- The resolution of the images is 10×10 meters per pixel. That is enough for studying larger areas, such as natural landscapes, but in urban environments the finest detail you can resolve is a medium-sized building.
- Even though we selected a cloud cover of no more than 10%, that applies to the whole 100×100 km picture, and your particular area of interest may still be obscured by clouds.
- Depending on the season, the Sun sits at a different height above the city and shadows are cast differently. This changes the intensity of the reflectance and introduces variance into the results.
However, there are studies that appear to show the feasibility of sub-pixel analysis. For instance, if you have a parking lot which is “grey” and located in a place with no shadows, then the presence or absence of cars should change its reflectance in a predictable manner. This way you could theoretically analyse patterns of urban activity: whether a place is busy or not.