Global Earth Observation Service from your Laptop
Is it possible to develop a global EO solution that fits on your laptop?
Every one of us is increasingly exposed to satellite imagery, whether through TV documentaries, news in official media, or social media platforms. The main reason is the open data policy adopted first by NASA and later by the EU’s Copernicus program, which made satellite imagery accessible to nearly everyone. Being able to see our planet from a distance changes our view of the Earth and how it is changing. We can observe seasonal changes, the devastation caused by natural disasters, or the consequences of climate change.
Last week, on World Water Day (22 March), we were all reminded of the importance of water in our lives: water is essential not only to quench thirst and protect health, but also to create jobs and support economic, social, and human development. Today over 600 million people live without a safe water supply. Environmental damage, together with climate change, is driving the water-related crises we see around the world. Cape Town, a city with a population of almost 4 million, may become the first major city to run out of water as a result of a drought lasting several years. Water is supplied to the city largely from dams in nearby mountainous areas. The visualization below shows the extent of water in dams near Cape Town at the end of 2015, when the reservoirs were almost full, and a few weeks ago, when Theewaterskloof Dam was already almost empty.
Such simple yet strong visualizations have a profound impact on each and every one of us. They serve as visual aids that help convey a message or present the severity of a problem in a different way. Yet if you’re more data-centric and prefer quantitative assessments over qualitative ones, as I do, you can’t help but wonder:
Can water levels in reservoirs be monitored from space?
The answer to the above question is, of course, yes. Some have been doing it for years. The real question is thus:
How can water levels in reservoirs be monitored from space?
Once we know the answer to that question, the next one naturally follows:
Is it possible to monitor water levels of all reservoirs on Earth using very limited resources?
The last two questions prompted me to start a side project, Monitoring Water Levels From Space. As a novice in the fields of Earth observation and remote sensing, I still have a lot to learn, and undertaking such a project makes the learning process more enjoyable. Every side project, however, comes with a few constraints:
- limited amount of time: the aim is good results obtained quickly. Aiming for excellent or perfect results may prolong the project and increases the risk of never finishing it.
- limited resources: the solution should run on my laptop and should cost little or nothing. I can’t imagine climbing a steep learning curve if I can’t approach the problem at hand interactively.
The rest of the story guides you through my solution, which satisfies both the time and the resource constraints. It’s based on the Python ecosystem for data science. The code is available in my GitHub repository.
Monitoring Water Levels From Space
From an idea to a prototype
The goal of the Water Level Monitor project is to estimate the water levels of dams, lakes, or any other large manmade or natural water reservoirs using open and freely accessible satellite imagery. In the prototype stage, we’ll limit ourselves to imagery from the Sentinel-2 satellites operated by ESA. Before we even think about a water level monitor that runs globally, we have to find out how to determine the water level of a single reservoir. Although for specific reservoirs it might be possible to estimate the amount of water they hold, this is not generally the case. We’ll therefore estimate the water level by measuring the reservoir’s current water surface area and comparing it with the water surface area when full:
Water level = (Current Water Surface Area)/(Water Surface Area when Full).
We’ll use Theewaterskloof Dam for prototyping. As the time-lapse below shows, the water level in this dam has dropped from almost full to almost empty in the last two years.
Step 1: From satellite image to reservoir’s water level
Water bodies can be identified in satellite imagery in many different ways, from very elaborate and complex methods utilizing recent advances in computer vision, such as deep convolutional neural networks, to very simple methods invented by domain experts long ago, such as thresholding a grayscale Normalized Difference Water Index (NDWI) image. During prototyping we of course opt for the latter. Optical satellites, such as Sentinel-2 or Landsat 8, carry sensors that detect light reflected from the Earth’s surface not only in the visible part (red, green, and blue bands) but also in other parts of the EM spectrum: near infrared (NIR), short wave infrared (SWIR), and others. Water bodies reflect very little or no light in the infrared part of the spectrum, the opposite of vegetation or soil. The NDWI, defined as (GREEN-NIR)/(GREEN+NIR), will therefore have higher values for water surfaces and lower values for everything else. The figure below shows the NDWI image of Theewaterskloof dam on the left and, on the right, the binary water mask derived from it with Otsu’s method.
From the binary water mask we can extract the current water surface area of the Theewaterskloof dam that we need for the water level estimation. The only problem is that we can’t simply sum the surface areas of all water bodies seen in the binary water mask. The smaller water bodies seen in the lower left part of the image are not part of the dam. They are small ponds for irrigating the neighboring fields and have to be excluded from our calculation. This would be a very easy task if only we knew which pixels lie within the dam borders (the water extent of the dam when full). Well, we could do it by hand: draw a polygon surrounding the dam based on a true color image. But if we go down this path, we can say goodbye to a global Water Monitor. There must be another way. And there is! Instead of mapping the dams’ borders ourselves, we take the results of over one million contributors mapping water bodies, forests, roads, streets, cafés, and much more, all of which ends up in the most widely used map in the world: OpenStreetMap. OpenStreetMap provides the Overpass API, which allows us to query the OSM data based on name, location, type, and other properties. Once we get the hang of it, we are finally in a position to combine everything in a single image:
From here to the water level of Theewaterskloof dam on February 25, 2018, it takes only one line of code: (surface of the blue area within the red)/(surface of the red area), which yields 23.5%.
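That ratio can be sketched in a few lines of NumPy. The sketch assumes the OSM outline of the full reservoir has already been rasterized onto the same pixel grid as the water mask (the rasterization itself is not shown) and that all pixels cover the same ground area, so pixel counts can stand in for surface areas. The function name is hypothetical.

```python
import numpy as np


def water_level(water_mask, reservoir_mask):
    """Fraction of the reservoir's full extent currently classified as water."""
    water_mask = np.asarray(water_mask, dtype=bool)
    reservoir_mask = np.asarray(reservoir_mask, dtype=bool)
    full_pixels = reservoir_mask.sum()
    if full_pixels == 0:
        raise ValueError("reservoir mask is empty")
    # Water outside the reservoir outline (e.g. irrigation ponds) is ignored.
    return (water_mask & reservoir_mask).sum() / full_pixels
```

Restricting the count to the intersection of both masks is what keeps the nearby irrigation ponds out of the estimate.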
Step 2: Historic water levels
Being able to estimate the water level of a reservoir from a single NDWI image means that we can easily run our algorithm on all past images available in the archive. ESA’s Sentinel-2 satellites imaged Theewaterskloof dam 104 times between August 10, 2015, and March 22, 2018. The estimated water levels within this period are shown below.
At the beginning of the project we set our bar at good, not perfect. But in all honesty, however we may try to spin it, the above results are neither, to put it mildly. The inserted true color images, taken only a few days apart, show that the presence of clouds in the image leads to an underestimation of water levels. This is understandable, since the NDWI value of clouds is small and our simple water detector classifies them as it should: as not water. Clearly we have to mask out the acquisitions where clouds obscure the view of the reservoir. Luckily, there is a tool that detects the presence of clouds in Sentinel-2 imagery, and we can easily incorporate it into our workflow. Finally, we get a much more realistic evolution of the Theewaterskloof dam’s water levels, as shown below.
The ability to observe the evolution of a dam’s water level through time, instead of looking at a single number (the current water level), can help us better understand the current situation. The dams near Cape Town are recharged by rainfall during the winter months of May to August, and dam levels decline during the dry summer months of December to February, when urban and agricultural water use increases. From the above plot we can see that the Theewaterskloof dam was almost full at the end of the wet season in August 2015, that its water level then dropped during the dry season, and that it rose again during the wet season of 2016, but not enough. This is even more true for the following year, 2017. This study of annual rainfall also shows that the last two to four years are the driest since the mid-1920s in the province of Western Cape. The most recent news from Cape Town reports that Day Zero may have been deferred thanks to water saving measures and water supply augmentation. Hopefully, this year will bring at least normal amounts of rainfall in the wet season that is about to start, so that the water crisis doesn’t repeat next year.
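The cloud filtering described above could look roughly like the following sketch. It assumes a cloud mask is already available for each acquisition (for example from the cloud detector mentioned later in this post; the detector itself is not called here) and simply skips any date on which clouds cover more than a chosen share of the reservoir’s full extent. The function name and the 5% default are assumptions for illustration, not values from the project.

```python
import numpy as np


def cloud_free_levels(acquisitions, reservoir_mask, max_cloud_fraction=0.05):
    """Water level per date, keeping only acquisitions with a clear view.

    `acquisitions` is an iterable of (date, water_mask, cloud_mask) tuples,
    with all masks on the same pixel grid as `reservoir_mask`.
    """
    reservoir_mask = np.asarray(reservoir_mask, dtype=bool)
    full_pixels = reservoir_mask.sum()
    if full_pixels == 0:
        raise ValueError("reservoir mask is empty")
    levels = {}
    for date, water_mask, cloud_mask in acquisitions:
        clouds = np.asarray(cloud_mask, dtype=bool) & reservoir_mask
        if clouds.sum() / full_pixels > max_cloud_fraction:
            continue  # clouds hide too much of the reservoir; skip this date
        water = np.asarray(water_mask, dtype=bool) & reservoir_mask
        levels[date] = float(water.sum() / full_pixels)
    return levels
```

Dropping a cloudy acquisition entirely is the bluntest option. It costs some temporal resolution but avoids the underestimated water levels seen in the first plot.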
Step 3: Going global
OK, before going crazy and running the Water Level monitor globally, it makes sense to test it on some intermediate scale. How about all water bodies in South Africa? Sounds reasonable. We have the algorithm in place that extracts the water level of a single reservoir. We now have to create a list of all water reservoirs in South Africa, determine the bounding box that surrounds each and every one, and query OSM to get each reservoir’s extent when full. Uff, I don’t know about you, but I lost motivation at:
create a list of all water reservoirs.
Again, if we aim for a global service, we can’t rely on manual input. All information has to be extracted from existing sources automatically. As before, OSM comes to our rescue, but this time not in the form of the Overpass API that we used before. It’s probably just me, but I find it difficult to extract the polygons of all water bodies within a certain region, excluding rivers and the like, using the Overpass API. Luckily, there are companies like Geofabrik that provide excerpts and derived data from the OSM data set. For us the most important excerpt is the one containing all water areas (in the form of polygons in a so-called shapefile, which can be easily manipulated with the geopandas Python package). After some additional clean-up (removing polygons that are parts of rivers, water bodies that are too small, …) we end up with 349 water bodies (lakes, dams, reservoirs, …) in South Africa. We can now run our water detector over all reservoirs in our list and collect the results. In the meantime, while we wait for the results, we just have to figure out how to visualize them. After trying several different options, I found Mapbox GL to be the easiest to use. Without further ado, here are the current water levels (as of the beginning of March) of all water bodies in South Africa.
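As an aside, the clean-up step mentioned above can be illustrated with a small plain-Python stand-in. The project itself works on the shapefile with geopandas. This sketch instead takes each feature as a dictionary and computes polygon areas with the shoelace formula, assuming coordinates in a projected system measured in metres. The attribute name `fclass`, the class names `river` and `riverbank`, and the minimum area are assumptions for illustration and should be checked against the actual attribute table.

```python
def polygon_area(ring):
    """Area of a simple polygon via the shoelace formula.

    `ring` is a list of (x, y) vertices in projected coordinates (metres).
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def select_reservoirs(features, min_area_m2=100_000,
                      excluded=("river", "riverbank")):
    """Drop river polygons and water bodies smaller than `min_area_m2`."""
    return [
        feature for feature in features
        if feature["fclass"] not in excluded
        and polygon_area(feature["ring"]) >= min_area_m2
    ]
```

The same two conditions translate directly into a boolean filter on a GeoDataFrame once the polygons are reprojected to a metric coordinate system.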
Before I conclude I should mention the technology stack and data sources that made this project possible:
- I use Python 3.6 with all the usual data science and computer vision related packages (NumPy, SciPy, scikit-image, matplotlib, geopandas, and shapely in this particular project).
- All the development and experimentation is done in Jupyter Lab.
- All Sentinel-2 imagery is retrieved using Sentinel Hub (Python package). Being able to download and process data only for the region of interest is what makes running this project on a laptop possible.
- Cloud detection and masking is performed on a region of interest using Sentinel Hub’s cloud detector (Python package). This part is actually the most CPU-intensive and also requires a significant amount of data to be downloaded (significant in terms of this project). If these masks were pre-computed… Well, then you could run the global Water Monitor on your mobile phone.
- Identification of regions of interest and the water extent of full reservoirs depend on OpenStreetMap.
- The above is done almost effortlessly using Geofabrik’s excerpts of OSM data.
- Last but not least, a picture is worth a thousand words. The same can be said for Mapbox’s visualizations (using the mapboxgl-jupyter version).
Now it’s your turn
This blog post is here to show you that not all solutions or applications that use satellite imagery require huge computing resources. Many use cases can be prototyped, or even run in production, using resources similar to those of your laptop. Of course, being familiar with the most appropriate tools and services helps. Hopefully, this post will put you on the right path and motivate you to start experimenting with satellite imagery, especially if you’re already familiar with the Python ecosystem for data analysis, computer vision, machine learning, or whatever else extracts valuable information from imagery.
You’re welcome to leave a comment or question regarding this project below. Even better, add a link to a blog post, GitHub repository, or anything that showcases your projects, solutions, etc… You’ll definitely have one more fan 🤓.