Can we locate dams from space?
Millions of dams around the world have tremendous impacts on people and the environment. While trying to understand and manage these impacts, water managers and researchers struggle to find datasets of dams. Now that we have daily satellite images of everywhere on the planet, how hard would it be to map the world’s dams?
The need for datasets of georeferenced dams
Millions of dams worldwide have been constructed to manage water resources for human benefit. They also have immense environmental impacts: on freshwater biodiversity (e.g., dams modify habitats and block fish migration), on hydrological ecosystem services, and on human health. Despite this, only the largest dams have been mapped, and they are far outnumbered by smaller dams whose cumulative impacts are tremendous.
Efforts by researchers, NGOs and governments to understand and mitigate the impacts of dams and manage hydrological ecosystems smartly are restricted by the available datasets of georeferenced dams.
Indeed, Lisa Mandle, Senior Scientist at the Natural Capital Project at Stanford University, studies how Myanmar’s forests contribute to the country’s water supplies by keeping dams free of sediment. “We needed to know where these dams were located. This data turned out to be hard to get. It was spread across different government offices, and not publicly available in any comprehensive, digital form”, she reports. “The best information we could find included only 15 dams in Myanmar. By only considering the largest handful of dams, we know our analysis underestimates the importance of nature.”
Halfway around the world, Marcia Macedo, Assistant Scientist at the Woods Hole Research Center, ran into a similar challenge while studying the impacts of small agricultural dams in the Brazilian Amazon. “Our field studies clearly show that small reservoirs have had a big impact on freshwater ecosystems,” she explains. “A single reservoir can increase stream water temperatures by up to 4°C. It also fundamentally changes aquatic habitats, nutrient cycles, and food webs.” Macedo has already mapped over 10,000 small dams in Brazil. “We know there are millions more of these dams out there, and that they have a massive cumulative impact. But we can’t manage them if we can’t map them — inexpensively, at high resolution, and over very large areas,” she adds.
The solution: satellite imagery + deep learning magic
With satellite imagery increasingly accessible (thanks to open data policies, e.g., from NASA and ESA, and to tools such as Google Earth Engine), plus the magic of machine learning, it sounds like you can do anything! But seriously, deep learning algorithms distinguish cats from dogs, detect obstacles in front of cars and sort cucumbers, among many other image classification applications. So it seems reasonable to ask:
Can we detect dams from satellite imagery?
In the span of a four-hour hackathon at GeoForGood 2018, we approached this question — and prototyped a pipeline using Google Earth Engine to create the training data and TensorFlow for the deep learning dam classification model.
Sounds like yes, we can locate dams from satellite imagery. So now, the question is:
What would it take to locate every dam on the planet with satellite imagery?
That all sounds wonderful, but really, this crazy 94% accuracy mostly demonstrated that the pipeline worked; it doesn’t represent a reliable result. So let’s dive in and discuss what we’ve done, some challenges, and how we could do it better.
1. Training data
For dam images, we used the GRanD dataset to export raster images of the areas surrounding dam outlets as TFRecords. These include five bands: RGB and NDWI from Sentinel-2 (15 m), as well as elevation (DEM: ALOS DSM, 30 m).
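NDWI (Normalized Difference Water Index) is not a sensor band itself but a ratio computed from the green and near-infrared bands; water absorbs near-infrared light strongly, so water pixels score high. A minimal pure-Python sketch (the reflectance values below are made up for illustration — the real pipeline computes this per pixel in Earth Engine):

```python
def ndwi(green, nir):
    """Normalized Difference Water Index: high over water, low over land."""
    if green + nir == 0:
        return 0.0
    return (green - nir) / (green + nir)

# Water reflects green but absorbs NIR; vegetation/land does the opposite.
water_pixel = ndwi(green=0.30, nir=0.05)  # illustrative reflectances
land_pixel = ndwi(green=0.10, nir=0.40)
print(round(water_pixel, 2), round(land_pixel, 2))
```

Adding NDWI as an explicit band spares the network from having to rediscover the water/land contrast on its own.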
For non-dam images, we prioritized areas at the edges of water bodies, which are easily confused with dam reservoirs. Otherwise the algorithm would learn to separate edge-of-water from everything else, instead of locating dam outlets (hence the 94.4%!). Using the JRC Global Surface Water dataset in GEE, we randomly sampled points on the edges of water bodies, along with non-water points (so the algorithm wouldn’t get confused by, for example, the straight line of a road), and followed the same procedure to export 300 m × 300 m rasters as TFRecords.
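The idea of deliberately sampling hard negatives at the water’s edge can be sketched in a few lines. This toy version works on a binary water mask (the real sampling ran on the JRC dataset inside Earth Engine; the function and mask here are hypothetical):

```python
import random

def sample_negatives(mask, n_edge, n_land, seed=0):
    """Collect 'hard' negatives (water pixels touching land) and easy
    negatives (land pixels) from a 2D 0/1 water mask, then sample each."""
    rows, cols = len(mask), len(mask[0])
    edge, land = [], []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1:
                # A water pixel with a land neighbour is an edge-of-water point.
                neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                if any(0 <= rr < rows and 0 <= cc < cols and mask[rr][cc] == 0
                       for rr, cc in neighbours):
                    edge.append((r, c))
            else:
                land.append((r, c))
    rng = random.Random(seed)
    return (rng.sample(edge, min(n_edge, len(edge))),
            rng.sample(land, min(n_land, len(land))))

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
edge_pts, land_pts = sample_negatives(mask, n_edge=2, n_land=2)
```

Balancing these two negative classes is what forces the model to learn “dam outlet” rather than just “shoreline”.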
2. Deep learning classification
If you were wondering what TFRecords are, it’s all about to make sense: a TFRecord is a binary storage format optimized for use with TensorFlow, an open-source library particularly handy for deep learning applications such as … image classification!
We were very lucky to come across a notebook developed by Chris Brown that trained a fully convolutional neural network (FCNN) to detect cars in images of parking lots. An FCNN can make predictions on images of any dimensions, so we can train the model on our 300 m × 300 m images and later make predictions on images of any size (yay!)
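Why does an FCNN work on any image size? Because convolution weights are tied to the kernel, not to the image dimensions: the same filter simply slides over whatever input it is given. A toy pure-Python convolution (not the actual TensorFlow model, just an illustration of the size-agnostic property):

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (no padding): the same kernel weights slide
    over an image of any size, which is why an FCNN is size-agnostic."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

kernel = [[1, 0], [0, -1]]           # one fixed 2x2 filter
small = [[1] * 4 for _ in range(4)]  # "train" on 4x4 patches...
large = [[1] * 7 for _ in range(6)]  # ...then run on a 6x7 scene, unchanged
print(len(conv2d(small, kernel)), len(conv2d(large, kernel)))  # 3 5
```

A network built only from such layers (no fixed-size dense layer at the end) inherits this property end to end, so a model trained on 300 m × 300 m chips can sweep across arbitrarily large scenes.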
The latest iteration was trained for 50,000 steps on a combined set of about a thousand training points, and reached 92% accuracy on a 600-point test set.
The results are very promising, yet the approach still needs tuning. Nathan Pavlovic, who ably served as the machine learning expert for this hackathon, raises his eyes from the computer, seemingly satisfied. But he notes a couple of issues: for example, the algorithm still sometimes confuses dense forest with water, as seen in the lower center of the image. None of these issues are insurmountable. In fact, this one is probably due to dense forest being under-represented in the non-dam training set we quickly put together. With more time, a number of improvements to the training data could increase performance.
So it does seem very possible to locate every dam on the planet, with satellite imagery!
What about commercial solutions for object recognition in satellite images? So far, they’re not much help on this question (and we love open source anyway). Descartes Labs has an impressive geovisual search tool; although limited to a fixed search area and a single global satellite dataset, its featured applications seem quite promising. But it does a pretty terrible job at locating dams, likely because it uses only RGB bands, whereas elevation and NDWI are key here. Let’s see what Planet’s Queryable Earth will offer in the future, with its mission to “index and make accessible what’s on the Earth, just as Google indexed and made accessible what’s on the internet”… For the moment, these algorithms remain proprietary anyway.
A couple of challenges identified so far include the extreme variability in dam size and reservoir water storage. To address them, the temporal scale and spatial resolution of the input data would need to be adapted: for example, we may need images from several points in time, across different seasons or years, to capture dams that dry out seasonally or during droughts. And what about the very small dams hidden in a forest?
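The seasonal-drying problem suggests compositing water detections across dates rather than relying on a single image: a reservoir that is dry today may still show up as water part of the year. A toy per-pixel occurrence computation (conceptually similar to the occurrence band the JRC Global Surface Water dataset provides; the masks here are hypothetical):

```python
def water_occurrence(masks):
    """Fraction of dates each pixel was flagged as water, across a stack
    of binary masks (one 2D 0/1 grid per acquisition date)."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[sum(m[r][c] for m in masks) / n for c in range(cols)]
            for r in range(rows)]

# Three dates over a 1x2 scene: the left pixel is a reservoir that is
# full in the wet season but dry by the third date.
dates = [
    [[1, 0]],
    [[1, 0]],
    [[0, 0]],
]
occ = water_occurrence(dates)  # occ[0][0] == 2/3, occ[0][1] == 0.0
```

Feeding such a multi-date occurrence band to the classifier, instead of one snapshot, would give it a chance at reservoirs that a single dry-season image would miss.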
If we were to do it better, the training data could be augmented and improved, finely tuned to avoid the artefacts we identified: more training points, but also higher-resolution imagery, especially for smaller dams. For example, PlanetScope captures 3 m imagery, but it isn’t freely available at scale, nor (yet?) easily integrated into Google Earth Engine.