AI For Good — Disaster Response

Shairoz Sohail · Published in GeoAI · Jan 27, 2020

With all the discussions about the dangers and ethics around emerging artificial intelligence technologies, we sometimes forget all the good that AI is doing in the world. For this article, I’ll outline some of the work we are doing at Esri using AI for improving disaster response.

We will ultimately train and utilize three separate deep learning models, use parallel processing to run inference against aerial imagery, apply some advanced routing algorithms through the ArcGIS ecosystem, and finally utilize web apps to share and keep track of response progress. The result is a system that can consume post-disaster aerial imagery, detect damaged structures, detect obstructed roads, and construct a response route in a timely manner. We will be able to not only generate the intelligence needed for a quick and accurate response, but also deliver it directly to the individuals who need it most: first responders.

Strap in, this is cutting edge tech coming together to save real lives.

Table of Contents

  • Hurricane Michael
  • The Challenges of Coordinating a Response
  • The Fuss About Aerial Imagery
  • Mask RCNN: Building Footprints
  • Transfer Learning: Damage Classification
  • U-Net Segmentation: Road Debris
  • Crowd-sourcing AI
  • Optimizing Routing
  • Where the Rubber Meets the Road
  • Cutting edge: 3D Reconstruction

Hurricane Michael

Hurricane Michael was one of the most devastating hurricanes in recent memory. On the afternoon of Wednesday, October 10, 2018, Hurricane Michael made landfall near Tyndall Air Force Base in Mexico Beach, Florida. With maximum sustained winds of 155 mph (250 km/h), the Category 4 storm nearly leveled the entire base in minutes.

Tyndall AFB after Hurricane Michael makes landfall (NBC News)

The storm continued on, causing major damage all over Mexico Beach. With a storm surge of 14 ft, dozens of homes were completely washed away along with large sections of U.S. Route 98. Many structures were reduced to bare foundation slabs. Most trees were snapped and flattened, and the few left standing had their bark completely stripped by the heavy winds. The path of carnage extended all the way inland to Marianna, in Jackson County, where downtown storefronts were torn off their foundations and three people were killed.

Marianna, Jackson County — Jeff Burle (Tallahassee Democrat)

The eye of the hurricane rapidly moved towards Panama City, where massive structural damage was reported, including an entire freight train being derailed and multiple television and radio stations being knocked out. By the time the storm left Florida, it had caused over $5 billion in property damage and killed 47 people.

When the storm moved into Georgia, it caused massive damage to farms and agricultural land, amounting to over $2 billion in agricultural loss. Many farms were completely washed away at a time when farmers were still trying to recover from the damage caused by Hurricane Irma. An 11-year-old girl was killed when a tree fell on her home.

The storm continued its rampage through North Carolina, Virginia, and Maryland. Hundreds of roads were either washed away or heavily blocked by debris, causing dozens of fatalities from road accidents. Four people (including a firefighter) were swept away in floodwaters in Virginia.

After the storm finally dissipated on October 16, 2018 (nine days after forming), it had caused a total of over $25 billion USD in damage and 72 fatalities (57 in the United States and 15 in Central America).

The Challenges of Coordinating a Response

Responding to a major disaster such as Hurricane Michael involves multiple state and federal agencies, multiple first responder and emergency departments, and even commercial entities such as utility companies, all of which must be organized into a cohesive unit in an extremely time-sensitive environment. This is obviously no joke, and the task is made much harder by the unique challenges of each disaster (fire, earthquake, hurricane, flood, etc.) and the time-consuming work of extracting and organizing priorities.

The first part of integrating AI into such a complex and sensitive workflow was to identify areas where repetitive human judgments were being made (preferably at scale). This is a good proxy for identifying where AI might be useful in general: find where intuition is being applied at scale, or where there is a need for new insights/discoveries from data. We quickly found one in the hugely important task of analyzing post-disaster aerial imagery.

This was a natural use case not only because of the ubiquitous use of aerial imagery post-disaster, but also because of how well the Esri ecosystem is set up for the sometimes arduous tasks of pre-processing, viewing, and preparing imagery data.

The Fuss About Aerial Imagery

Good aerial imagery is a beautiful thing. Whether high-resolution, multi-band, or hyperspectral, aerial imagery can capture everything from changes in land cover to changes in the polar ice caps. Unfortunately, good aerial imagery can be expensive, hard to come by publicly, and hard to work with without the proper tools. The ArcGIS Pro software provided by Esri allows us to seamlessly ingest, label, and prepare our imagery data for eventual training of complex AI models. We can even use the related ecosystem to publish our results in a way that first responders can immediately access to help organize a response (many of these organizations already have Esri software set up as well).

For this project, we utilized imagery captured by NOAA following Hurricane Michael.

Imagery collected after Hurricane Michael (NOAA)

The first two AI models we will go over will accomplish one of the most important and time-sensitive tasks that image analysts engage in post-disaster: identifying damaged structures where people may be trapped.

Mask RCNN: Building Footprints

The very basic idea of how deep learning models for computer vision work is pretty simple. Consider the problem of finding a white square on a black background. While this is (hopefully) a tremendously easy task for humans, teaching it to a computer might not be immediately obvious. The key idea is to convert the image into a matrix of pixel values and feed it into a computer (along with labels for the object of interest), so the computer can learn how groups of pixel values indicate the presence of a certain object.

Converting a single band image (left) to a matrix of pixel values (right)

Once the computer has recognized how a cluster of pixel values (a single 1 in the above example) indicates an object, it can generalize this information to new images by converting them to the same kind of pixel value array and looking for similar clusters of pixel values.
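As a toy illustration (not part of the actual pipeline), here is that white-square problem expressed as an array of pixel values:

```python
import numpy as np

# An 8x8 single-band "image": black background (0) with a small white square (1)
image = np.zeros((8, 8), dtype=np.uint8)
image[2:4, 3:5] = 1  # the white square

# The "object" is simply the cluster of high pixel values
rows, cols = np.where(image == 1)
print(image)
print("Square spans rows", rows.min(), "to", rows.max(),
      "and columns", cols.min(), "to", cols.max())
```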

It is not too difficult to see how this can be generalized to predict bounding boxes around objects — we can describe the relevant “cluster” of pixels in our training data (corresponding to the object of interest) with the coordinates of the corners of a box that encloses the cluster. For example, consider the popular YOLO architecture, pictured below:

Credit: https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b

At a broad level, this model is learning to describe objects of interest (like a dog in the above example) with a vector consisting of the coordinates of the bounding box (xmin, xmax, ymin, ymax) along with an object label (such as “dog”, “cat”, “car” etc.) and associated confidence for that object (a score between 0 and 1).
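In code, a single detection from such a model can be thought of as nothing more than a small record like the one below (the field names here are illustrative, not any particular library’s API):

```python
# One detection: a bounding box, a class label, and a confidence score
detection = {
    "xmin": 48, "xmax": 195,   # horizontal extent of the box, in pixels
    "ymin": 240, "ymax": 562,  # vertical extent of the box, in pixels
    "label": "dog",
    "confidence": 0.92,        # between 0 and 1
}
```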

Internally, the model is searching across multiple scales of image representations and predicting these attributes for grid cells of varying sizes. However, it is not immediately clear how to do this for more complex objects such as building footprints — which are inherently polygons and can be described by anywhere from 3 to many dozens of points, depending on how complex the roof geometry is. Enter the Mask RCNN model:

The first few components of Mask RCNN, utilizing a feature extractor and convolutional filters to make an initial bounding box around a detected object

At its base, the Mask RCNN architecture is simply a Faster RCNN model, an object detection model similar to the YOLO architecture described above. However, the important distinction is at the end (or “head”) of the model, where we predict not a bounding box but instead a segmentation mask for our object of interest. We can think of a segmentation mask as the exact cutout of our object along its outline, instead of simply a box around it. For example:

Bounding boxes vs. segmentation masks for vehicles (Mask RCNN)

Observe how the segmentation masks trace the exact shape of the vehicles in the image, whereas the bounding boxes simply indicate the presence and rough size of a vehicle.

Before we can understand how to teach a model to predict such segmentation masks, we must first create a metric that we can use to guide a model towards a more “correct” segmentation mask. The key is representing the mask as a binary image (pixels are only 0 or 1): areas that belong to the instance mask have a pixel value of 1, whereas areas outside it have a pixel value of 0.

In the Mask RCNN framework, the class label and bounding box predictions are done separately (“decoupled”) from the mask prediction. Hence, the model can predict a bounding box and label as above, then predict where inside that bounding box the segmentation mask lies by denoting each pixel as 1 or 0. Now, utilizing simple binary cross-entropy on the pixels, we can compare the ground truth segmentation mask inside that bounding box to our predicted one, understand the quality of our prediction, and optimize it.

A binary mask of a balloon (Credit: Matterport Engineering)
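As a rough sketch (assuming the predicted mask is expressed as per-pixel probabilities), the pixel-wise binary cross-entropy used to score a predicted mask against a ground truth mask looks like this:

```python
import numpy as np

def mask_bce(pred_probs, true_mask, eps=1e-7):
    """Average binary cross-entropy over mask pixels.

    pred_probs: predicted probability (0..1) that each pixel belongs to the mask
    true_mask:  ground truth binary mask (0 or 1) of the same shape
    """
    p = np.clip(pred_probs, eps, 1 - eps)  # avoid log(0)
    return -np.mean(true_mask * np.log(p) + (1 - true_mask) * np.log(1 - p))

# Toy example: a 4x4 region inside a predicted bounding box
truth = np.zeros((4, 4)); truth[1:3, 1:3] = 1
pred = np.full((4, 4), 0.1); pred[1:3, 1:3] = 0.9
print(mask_bce(pred, truth))  # low loss: the prediction closely matches the ground truth
```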

After training our model, we also need a way to understand its overall performance. Given that we’re predicting a segmentation mask and we have a ground truth segmentation mask, we can measure how much overlap there is between the two. Of course, we also need to take into account how big the actual masks are. This leads us to the IoU (Intersection over Union) metric: the area of overlap between the predicted and ground truth masks divided by the area of their union.

Observe this metric is 1 when the area of overlap and the area of union are the same — this happens when the predicted and ground truth segmentation masks are the same size and perfectly overlap one another. The metric is 0 when there is no overlap between the ground truth and predicted masks, and it grows towards 1 as we gain more overlap and the mask size becomes consistent with the ground truth (this is the reason we divide by the union, to prevent degenerate cases like labeling the entire image as the mask to get 100% overlap).

IoU values roughly greater than 0.65 correspond to “good” predictions, whereas values at 0.75 and above are nearly human-quality annotations. Below is a visual comparison using bounding boxes:

Prediction (red) vs. ground truth (green) in terms of IoU (https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/)
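A minimal version of the IoU computation for two binary masks (a sketch, not the exact evaluation code used in this project):

```python
import numpy as np

def iou(pred_mask, true_mask):
    """Intersection over Union for two binary masks (arrays of 0s and 1s)."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return intersection / union if union > 0 else 1.0

# A prediction covering the whole image overlaps the ground truth 100%,
# but its huge union keeps the IoU low — exactly the case the metric guards against.
```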

Now, back to disaster response. Recall, we are trying to find the footprints of buildings so we can classify them as damaged or undamaged. We are now equipped with the knowledge of the problem type (instance segmentation) and a good model architecture to handle it (Mask RCNN).

The first actual AI model in our pipeline is trained to look for buildings, and so it takes in an input image and returns the boundaries of a polygon that encircles the building footprint (for each building in the image). This model was trained on over 90,000 building footprints and the proprietary Esri world imagery dataset. Architecturally, the model is a Mask RCNN (the same one from above, implemented with the Matterport library).

Here’s a screenshot of the first AI model detecting some structures after being run on post Hurricane Michael imagery (note it has never seen this imagery before):

Our AI model detecting building footprints

The content of each of the purple footprints will be fed into a secondary AI model that we will teach to look for damage.

Transfer Learning: Damage Classification

Now that we have an AI model to identify buildings present in an image, the second task is judging whether or not a building is damaged. We trained the first model only to detect building footprints because of the wealth of footprint data available and because this decouples detection from damage classification (allowing the user to provide their own footprint layer if available). We can now train a second, lightweight model to serve as a judge for the buildings extracted by the first (or within existing footprints provided by the user). An additional benefit of this design is that the whole system is modular: the second model can easily be re-trained to judge other attributes of a building (such as age, architectural style, roof type, and anything else visible in an aerial image) and adapt to different disasters without having to retrain the first model.

Data on damaged structures is not as readily available or easy to acquire as the building footprints. We get around this problem in two ways: by creating our own small training set, and by using transfer learning.

First, we create our own training set. So, on a particularly slow weekend afternoon, we load the imagery into ArcGIS Pro and start drawing some labels.

Fortunately, creating a set of labeled examples for imagery using Pro is super simple. We can add a Feature Class to our map, then use the Edit tab to literally draw polygons directly onto the image (if we have a ground truth footprints layer, we can also simply add an attribute for damaged). For this project, we opt to create two separate feature classes, one for damaged structures and one for undamaged structures.

Drawing polygons for damaged (red) and undamaged (blue) structures using ArcGIS Pro for training

When we are done drawing polygons, we can use the Export Training Data for Deep Learning tool to export image chips corresponding to our labeled Feature Class. After the tool finishes, we end up with a folder full of image chips (and associated geo-reference information).

A folder of image chips output from the Export Training Data for Deep Learning tool

In total, we labeled over 800 structures (about half damaged and half undamaged) and applied some image augmentation techniques (such as rotations, reflections, and blurs) to increase our training dataset by several thousand examples.
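The augmentation step can be as simple as writing out rotated, mirrored, and blurred copies of each chip (each copy keeps the label of its source chip). Here is a minimal sketch using Pillow; the exact augmentation code used in the project is not shown here:

```python
from pathlib import Path
from PIL import Image, ImageFilter, ImageOps

def augment_chip(chip_path, out_dir):
    """Write rotated, mirrored, and blurred variants of a single image chip."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    img = Image.open(chip_path)
    stem = Path(chip_path).stem
    img.rotate(90).save(out_dir / f"{stem}_rot90.png")
    img.rotate(180).save(out_dir / f"{stem}_rot180.png")
    ImageOps.mirror(img).save(out_dir / f"{stem}_mirror.png")
    img.filter(ImageFilter.GaussianBlur(radius=2)).save(out_dir / f"{stem}_blur.png")
```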

Now, onto the transfer learning magic. You may have noticed that the deep learning models we’re utilizing seem much larger than they need to be. We’re also using words like the “head” of a network and learning from different “image representations”. This is because it is usually the downstream layers of a deep neural network that actually perform the task of interest (bounding box prediction, image classification, predicting a segmentation mask, etc.), while the earlier layers learn to extract and represent the image as a set of feature maps, essentially array representations of the most salient portions of an image.

These feature maps may capture everything from outlines, contours, and color gradients to more sophisticated features such as roads or trees that give hints about the context of the image.

There exist networks trained on absolutely giant datasets (such as ImageNet) that have already learned to extract such salient features, and we can simply take these trained networks and change the last few layers for our purpose: damage classification. In this case, we take a VGG-16 model that was trained on the large ImageNet dataset and retrain the last few layers to perform classification of structures.
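A minimal Keras sketch of this kind of transfer learning (the head layer sizes and training settings here are illustrative, not the exact configuration we used): load VGG-16 with ImageNet weights, freeze the convolutional base, and attach a small head that predicts damaged vs. undamaged.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Convolutional base pre-trained on ImageNet; we keep its learned feature extractors
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the early layers

# New "head": a small classifier for damaged vs. undamaged structures
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),  # [undamaged, damaged]
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```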

Once this model is trained, we can combine it with the footprint detection model into a single inference function that takes in an image, predicts all the footprint masks, crops each footprint mask (adding some padding to give the classifier more context), and feeds each cropped footprint into the classifier to predict a label.

The main inference loop has roughly the following shape:
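The sketch below is a simplified stand-in for the actual code: the crop helper and the model-wrapper methods (`predict_masks`, `predict_label`) are illustrative placeholders, assuming each model is wrapped behind a simple prediction call.

```python
import numpy as np

PAD = 15  # extra pixels around each footprint to give the classifier more context

def crop_footprint(image, mask, pad=PAD):
    """Crop the image to the bounding box of a footprint mask, with padding."""
    rows, cols = np.where(mask)
    y0, y1 = max(rows.min() - pad, 0), min(rows.max() + pad, image.shape[0])
    x0, x1 = max(cols.min() - pad, 0), min(cols.max() + pad, image.shape[1])
    return image[y0:y1, x0:x1]

def detect_damaged_structures(image, footprint_model, damage_classifier):
    """Run both models: find footprints, then classify each one as damaged or not."""
    results = []
    for mask in footprint_model.predict_masks(image):     # hypothetical Mask RCNN wrapper
        chip = crop_footprint(image, mask)
        label = damage_classifier.predict_label(chip)      # hypothetical classifier wrapper
        results.append((mask, label))                       # e.g. ("damaged" / "undamaged")
    return results
```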

Doing this purely in Python is highly cumbersome and involves some separate methods to handle the geo-referencing and GPU usage. Instead, we can place the model weights into a folder and create an associated model description file — this allows us to run the entire damaged structure detection workflow via the “Detect Objects Using Deep Learning” tool in ArcGIS Pro. This tool will take in the imagery and model description file and produce a feature layer of detected structures with associated labels. It will handle the GPU usage and geo-referencing natively, and has a corresponding server tool to massively scale this workflow.

The Detect Objects Using Deep Learning tool running in ArcGIS Pro
The result after Detect Objects Using Deep Learning finishes

U-Net Segmentation: Road Debris

Roads are the second part of the disaster response picture. While having an idea of the scale and location of damaged structures is important, its value is diminished without an idea of how transportation to and from those areas would be achieved. Obviously this is important not only for first responders, but also for the area’s residents and even utility and insurance companies after initial search and rescue operations have finished.

The difficult portion of this is figuring out the proper way to even feed this information into a deep learning model to train with. Do we label debris and teach a model to detect it as an instance segmentation problem? Do we create image chips of road and teach a model to classify each chip as blocked/unblocked? Do we feed the entire road network into a model and train it to label each pixel of road as debris/clear?

In the end, we opted for the third approach. Such a task (classifying each pixel of an image into a class) is called semantic segmentation. It is similar to instance segmentation, except we produce a label for every pixel of the image rather than only for the pixels inside the bounding boxes we’ve detected. From the finished model, we expect an output image with each pixel colored according to a label of road, debris, or background. The model we utilize to do this is called a U-Net (so called because of the U shape of its architecture diagram, which cleverly combines a series of downsampling and upsampling convolutions to capture objects at different scales).
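A compact Keras sketch of the U-Net idea (far fewer filters and levels than a production model, purely to show the downsampling/upsampling structure with skip connections):

```python
from tensorflow.keras import layers, Model

def tiny_unet(input_shape=(256, 256, 3), num_classes=3):
    """Minimal U-Net: encoder, bottleneck, and decoder with skip connections."""
    inputs = layers.Input(input_shape)

    # Encoder (downsampling): capture context at coarser scales
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D()(c2)

    # Bottleneck
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(p2)

    # Decoder (upsampling): recover resolution, reusing encoder features via skips
    u2 = layers.UpSampling2D()(b)
    u2 = layers.Concatenate()([u2, c2])
    c3 = layers.Conv2D(32, 3, padding="same", activation="relu")(u2)
    u1 = layers.UpSampling2D()(c3)
    u1 = layers.Concatenate()([u1, c1])
    c4 = layers.Conv2D(16, 3, padding="same", activation="relu")(u1)

    # One probability per pixel per class: background, road, debris
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(c4)
    return Model(inputs, outputs)
```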

At this point we had the road layer itself, the underlying imagery, and a plan for what kind of model for road damage we were going to build. But something was still missing, and after a few coffees we realized we didn’t have any corresponding labels for the roads.

We could do the same thing we did for the structures and manually label road debris (sigh). But instead we decided to utilize some of the great shareable apps and the huge variety of GIS post-processing tools available within the Esri ecosystem, along with our vast network of colleagues who had explicitly or inadvertently volunteered to help us out!

Crowd-sourcing AI

One of the hardest parts of AI is getting hold of good, labeled data. This still involves large amounts of manual and repetitive human labor, the kind that would drive a single person insane. We can utilize the power of crowdsourcing and GIS to obtain labels for our data in a quick and efficient way: divide the task of labeling among a large group of individuals via an easy-to-use web labeling app, then feed these labels into geoprocessing tools to get our training data into the proper format.

A simple ArcGIS web app to draw polygons around road debris for training

We set up the above web app, which allows users to simply draw a polygon around road debris, in only a few minutes using the web app builder. The app can be accessed directly via a URL, so it was easy to share with everyone who wanted to help.

We even added a “scoreboard” to encourage some healthy competition!

A leader board for labeling damaged structures

We ended up labeling over 1,000 areas of debris in just a single day (with me being responsible for less than 20%)!

We can now easily pull this labeled data back into ArcGIS Pro as a feature class. We intersect the road debris polygons with a buffered road network using the Intersect GP tool, then utilize the Feature to Raster GP tool to convert the polygons into a raster for use with our downstream image segmentation model. Finally, we use the Export Training Data for Deep Learning tool again to acquire image chips labeled at the pixel level to train our semantic segmentation deep learning model. For the model itself we utilize the U-Net architecture from above, modified with a weighted loss function to handle the class imbalance (there’s much more road than there is road debris).
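As a sketch of the idea (the weight values shown are illustrative, not the exact ones we used), each pixel’s cross-entropy is scaled by a per-class weight, with the background class weighted to zero and debris up-weighted because it is so much rarer than road:

```python
import tensorflow as tf

# Per-class weights: background is ignored entirely, debris is up-weighted.
CLASS_WEIGHTS = tf.constant([0.0, 1.0, 5.0])  # [background, road, debris]

def weighted_pixel_loss(y_true, y_pred):
    """Categorical cross-entropy per pixel, scaled by the weight of the true class."""
    ce = tf.keras.losses.categorical_crossentropy(y_true, y_pred)   # shape: (batch, H, W)
    weights = tf.reduce_sum(y_true * CLASS_WEIGHTS, axis=-1)        # weight of each pixel's class
    return tf.reduce_mean(ce * weights)
```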

Notice the 0 weight for the first class, which was background (the classes were background, road, and debris). Here’s a comparison of some of the model’s outputs for an area unseen during training (outlined polygons indicate detected debris):

Model predictions for blocked roads

To get to the result above, we had to do a similar process as during pre-processing to ensure our predicted debris lies on the road. Taking the model’s predicted debris, we apply the Clip Raster GP tool against the buffered road features to eliminate debris detected off the roads. Finally, we apply the Raster to Features GP tool to convert the model’s output raster into a familiar feature class that we can use for many other operations (such as clustering, routing, and measurement).

Optimizing Routing

Now that we have the locations of the damaged structures and blocked roads, we are faced with the task of turning this into actionable intelligence. This is an often overlooked part of building AI models — usability in the field. The natural next step in disaster relief after having the locations of damaged structures and blocked roads is actually having responders deployed to each of the structures and reporting back what they find. The critical part of this is being able to build an optimal route, the quickest way to reach every structure while avoiding roads that are impassable. Mathematically, this problem can be represented by a weighted graph — where the edge weights represent the distances from one point to another. We also need a start node, where the deployed responders will start and end their route.

An example of a weighted graph

You can try to find the shortest route from and back to the red node in the above example and quickly realize that this is quite a difficult problem. It becomes extremely time-consuming for large networks.
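To make this concrete (the distances below are made up), here is a brute-force search over every possible visiting order, which is fine for a handful of stops but hopeless for a real road network:

```python
from itertools import permutations

# Hypothetical symmetric distance matrix; node 0 is the start/end point (e.g. a fire station)
DIST = [
    [0, 4, 7, 3],
    [4, 0, 2, 5],
    [7, 2, 0, 6],
    [3, 5, 6, 0],
]

def best_route(dist, start=0):
    """Try every ordering of the remaining stops and keep the shortest round trip."""
    stops = [i for i in range(len(dist)) if i != start]
    best = None
    for order in permutations(stops):          # (n-1)! orderings, which explodes quickly
        route = (start, *order, start)
        length = sum(dist[a][b] for a, b in zip(route, route[1:]))
        if best is None or length < best[0]:
            best = (length, route)
    return best

print(best_route(DIST))  # (15, (0, 1, 2, 3, 0)) for this toy matrix
```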

Luckily for us, the frequency with which this problem is encountered in the real world has led to a number of very effective and practical algorithms for finding good routes. In fact, the Network Analyst extension in ArcGIS contains methods for solving this exact problem, honed through many decades of development and research (you can read about it here). This works well for us because we can simply feed in our predictions (which are now a feature class thanks to the Raster to Features GP tool described above) and directly apply the extension to find optimal routes (in addition to solving other important problems such as location-allocation, which is critical for transporting rescued people to shelters and hospitals).

The result of a simple routing search to identified damaged structures from a local fire station

Where the Rubber Meets the Road

At this point we have the post-disaster imagery, an underlying feature layer representing damaged structures (output from the first and second models), a feature layer representing blocked/damaged/passable roads (output from the third model), and a tool to build an optimal route to damaged structures from a designated start point while taking damaged roads into account. We can pull all this information into ArcGIS Pro and utilize a combination of ArcGIS Online and web apps to help disseminate it quickly.

First, we build a live operations dashboard showing the current count of blocked roads and damaged roads for an area. This dashboard also shows the damaged structures that need to be addressed (as red polygons).

Our operations dashboard for disaster response

Second, we build an integration with ArcGIS Workforce, so that first responders or workers can receive personalized routes and make updates to recovery status directly from their phones (this automatically updates the central dashboard showing the currently addressed structures and the remaining ones). You can even see a single worker utilizing Workforce in the dashboard above as a green person icon, currently addressing the structure near the center of the map.

Our Workforce app for disaster response

Cutting edge: 3D Reconstruction

There’s nothing quite like a fully browsable 3D scene for operational awareness. While the work we’ve developed so far tells us where damage is located (and this is enough to deploy first responders efficiently in most cases), we’d also like to understand what types of structures are damaged so we can better equip our first responders. This can really only be done with a full reconstruction of the area — normally a very time- and labor-intensive procedure.

Luckily, Esri’s Drone2Map provides amazing capabilities to build 3D reconstructions from collections of aerial imagery. These 3D reconstructions are fully integrated into the Esri platform via scene layers, allowing us to pull in our AI model outputs and give a truly informative view of the ground-level destruction caused by the hurricane.

Below is a reconstruction created from a Civil Air Patrol flyover of Mexico Beach following Hurricane Michael, with our inference results overlaid and extruded to be visible. The power of ArcGIS allowed us to do the inferencing against 2D imagery and display the results against a 3D layer — leading to a truly compelling, fully browsable and interactive visualization for an operational picture that would be impossible otherwise.

Concluding Remarks and Planned Work

This is an ongoing project with our team, and updates will be reflected here. There aren’t a ton of planned extensions at the moment; the workflow is currently being tested across different resolutions, areas, and disasters. If you have questions, would like clarifications/updates, or would like to engage with our team to pilot this work for your company/agency, feel free to reach out at ssohail@esri.com or post a comment here.

Thank you for reading. It’s a pleasure to be able to build meaningful AI solutions that help those in need and save lives, all while calling it work.

Shairoz Sohail
AI scientist developing methods for visual recognition and understanding