Project Prometheus — An AI-powered fire-finding solution
David Azcona (Ph.D. candidate, Dublin City University), Murong He (M.Sc. in Business Analytics, Arizona State University) and I have been working on a project called Prometheus: an early-stage fire detection solution that combines AI, computer vision, autopiloted drones and weather services to detect wildfires before they spread too far. In this post I will share some details about the project: how it works, the theory behind it and where you can find it. I am keeping this as technology-agnostic as possible, so if you want to see any technical detail about the implementation (CNTK, Faster R-CNN, Docker, Python, .NET Framework, and so on) you can check the GitHub repo. Here I will just mention the technologies used.
A short intro video:
In 2017, wildfires were estimated to have cost the US economy around 200 billion dollars in damages, somewhat less than hurricanes. The difference is that if wildfires are contained while they are small, the damage can actually be minimized. However, fire detection is a mundane task: it is difficult and it is manual. Most of the time it is people sitting on fire watch towers with binoculars, or overflying areas with helicopters or piloted UAVs. The idea behind Project Prometheus is to automate this mundane task with autopiloted drones and to detect fires while they are still small; otherwise, because of factors like wind, humidity or the terrain's geography, they can spread to the order of acres in a matter of minutes. What is more, many of these fires start in remote areas where few people live, so nobody is there to monitor and report them.
If we can just catch them early, if we can have these drones flying over remote areas, we can really minimize the damage.
Project Prometheus can be divided into three modules:

- The fire detection module, which uses a deep learning algorithm to detect small fires in the drone's RGB camera feed (implemented as a REST service published in the cloud).
- The flight planning module, which allows the user to select and plan the areas the drone should fly (implemented as a Windows application integrated with map and weather services).
- The alert system, which enables the user to notify the rapid response team about an ongoing fire (using Azure Functions with Twilio).

I will get into some details about each of these parts and how they work:
We use autopiloted drones to look for fires in remote areas. The RGB camera installed onboard the drone is used to map out the entire area by taking pictures of the terrain. Those images are then submitted to a machine learning model, which detects fire and notifies the user. End of story. Wait…
Lesson 1: size does matter
Detecting fire with computer vision at the scale we are interested in is a hard task, which is why we had to take a different approach to tackle it. You may be thinking that RGB cameras are a bad choice and that we would get better results with an infrared camera. Talking with firefighters, we found out that infrared cameras are not the sweet spot you might think: they will flag everything as fire in places where the air is hot, some surfaces will be detected as heat sources just because of their reflective properties and, if you care about money, they are more expensive. Firefighters use these cameras mainly at night, to check whether a fire has been completely extinguished.
[Back to our RGB camera] In general, if you want to classify an image into a certain category (fire, no-fire) using machine learning, you use image classification techniques. Easy peasy. However, it can happen that the characteristics required to perform the categorization are too small with respect to the full image. A typical picture of a small fire looks like this:
In such cases, and ours is one of them, you will achieve better performance with object detection, even if you are not interested in the exact location or count of the objects within the image. However, an object detection solution is usually a bit more complex, and there are multiple ways to build one. The one we took was a region-based convolutional neural network (R-CNN), specifically the Faster R-CNN implementation.
Other popular choices include YOLO (Joseph Redmon, Ali Farhadi) and Detectron (Facebook AI Research, recently open-sourced).
Basically, you do the following:
Localization: You need a way to generate (sample) areas inside the picture that may contain what you are looking for. Those areas are called regions of interest (ROIs). These region proposals are a large set of bounding boxes spanning the entire image. In our case, we generated the ROIs using the approach described in the paper "Segmentation as Selective Search for Object Recognition" by Koen E. A. van de Sande et al. There is an implementation in the dlib Python library.
Object classification: Visual features are then extracted from each of the bounding boxes and evaluated to determine whether, and which, objects are present in the proposals (more on this later).
Non-maximum suppression: Sampled bounding boxes often overlap, sometimes representing the same object. To avoid duplicates, overlapping boxes are combined into a single bounding box, keeping the most confident one. This can be a very compute-intensive task, but the Intel math libraries have some routines optimized for it.
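The combining step above is essentially greedy non-maximum suppression. A minimal NumPy sketch of the idea (the toy boxes, scores and threshold are illustrative; the project itself relies on the optimized library routines):

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Keep only boxes that do not overlap box i too much.
        order = order[1:][iou <= iou_threshold]
    return keep

# Two heavily overlapping boxes plus one separate box:
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 150, 150]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(non_max_suppression(boxes, scores))  # → [0, 2]: the two overlapping boxes collapse into one
```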
Lesson 2: your data may need a little help
In any machine learning problem you need data, but in deep learning (i.e. a highly dimensional input space) you need a lot of it in order to extract interesting visual features. It is not hard to see that getting a big dataset of the objects we are interested in (fire) is not easy. To solve this shortcoming we use a technique called transfer learning with a pretrained, general-purpose image classification model as a feature extractor, since the visual features such models learn tend to generalize well. Technically speaking, you take a model pre-trained for one task and "fine-tune" it with your own dataset. You are reusing the representations (visual features) learned for task A, typically a broad, general task, and applying them to task B, a more specific one. The degree of success at task B indicates how much of what the task A model learned transfers over. In our case, task A is the ImageNet object classification problem and task B is our fire detection problem.
You apply this technique by removing the last layer of the pre-trained network, replacing it with your own classifier, freezing the weights of all the other layers and training the network normally. Voilà!
The network we trained is not available in the GitHub repository because of its size (around 250 MB). Let me know if you need it.
There are a lot of pretrained models for ImageNet (AlexNet, VGG, Inception, ResNet, etc.), each with a different tradeoff between speed, accuracy and structure. We chose AlexNet because it is computationally cheaper and we did not see a big difference with the others (for this specific task).
Lesson 3: videos (especially their frames) are your best friend
Even with the help of transfer learning, we still needed a big enough dataset to solve the classification problem. What did we use? Videos. The cool thing about videos is that you get a lot of frames from each one, so you can end up with a nice, big dataset quickly. Videos have another interesting property: if the object or the camera is moving, you get images of the object under different lighting, angles and positions, which makes for a very robust dataset too.
The dataset is not uploaded to the repository because of its size, but ping me if you want to have a look at it.
We collected videos from drone video-blogging platforms and tagged them manually. There are a couple of tools out there to tag images in different formats, depending on the deep learning framework you are using. Nice choices are LabelImg for Linux/Windows users and RectLabel for Mac. Since we used CNTK, we went with Microsoft's own tool, VoTT, which exports to the CNTK format (as well as TensorFlow's) and is available for both Windows and Mac.
Lesson 4: buy yourself a GPU (or rent one from the cloud)
Training a big model like this one is really compute-intensive, but a GPU can cut training time dramatically. In our case, it takes around 15 minutes to train the network on an NVIDIA GeForce GTX 1050. However, parameter tuning is another big problem, even with GPU support. Microsoft has a nice tool called Azure Experimentation Service that lets you submit multiple training jobs with different parameters to the cloud in parallel and reports back the accuracy each one achieved. Amazon's AWS SageMaker is another good choice.
In the GitHub repo you will find a script called Sweep_parameters.py that sweeps the parameter space and submits the jobs automatically.
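Conceptually, a sweep script like that boils down to enumerating a grid and submitting one training job per combination. A sketch (the grid values are made up, and `submit_job` is a stand-in for whatever the experimentation service's client actually exposes):

```python
from itertools import product

# Hypothetical hyperparameter grid; the real sweep uses the project's own ranges.
grid = {
    "learning_rate": [1e-3, 1e-4],
    "l2_regularization": [0.0, 5e-4],
    "epochs": [15, 20],
}

def submit_job(params):
    # Placeholder: in the real script this would call the cloud service's job API.
    print("submitting job with", params)
    return params

# One job per point in the Cartesian product of the grid.
jobs = [submit_job(dict(zip(grid, values))) for values in product(*grid.values())]
print(len(jobs))  # 2 * 2 * 2 = 8 combinations
```

The service then reports the accuracy of each job, and you keep the best combination.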
So we have our R-CNN trained, featuring all the cool and trendy buzzwords around: transfer learning, deep learning, GPUs and so on. How do you push it somewhere people can actually use it? We first exposed a REST service to interact with the model. This API allows submitting images for scoring and returns the areas inside the image where it believes there is a fire, along with a confidence level. You can also report back to the API whether there actually is a fire in the image; those endpoints are used to collect feedback and improve the model over time.
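A minimal sketch of what such a service can look like, here with Flask (the route names, the payload shape and the `fake_score` stand-in are illustrative; the real service runs the trained Faster R-CNN behind the scoring endpoint):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def fake_score(image_bytes):
    # Stand-in for the model: returns bounding boxes [x1, y1, x2, y2] with confidences.
    return [{"box": [120, 80, 160, 110], "confidence": 0.67}]

@app.route("/score", methods=["POST"])
def score():
    # The client POSTs the raw image; we answer with the detected fire regions.
    detections = fake_score(request.get_data())
    return jsonify({"detections": detections})

@app.route("/feedback", methods=["POST"])
def feedback():
    # The caller reports whether a detection really was a fire; this feedback
    # is what lets the model improve over time.
    payload = request.get_json()
    return jsonify({"received": payload.get("is_fire")})
```

You can exercise it without a server via `app.test_client()`, POSTing bytes to `/score` and JSON to `/feedback`.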
The REST API service is packaged inside a Docker container and published to the cloud, where we can achieve scalability in a cost-effective way. The Docker image file can be found in the repository.
Red flag alerts
You may be wondering how Prometheus knows which areas to fly. Does it fly the entire world using a super secret battery that only my two teammates and I know about? Since it is a secret, I will say no. Prometheus integrates with country-specific weather services to detect something called red flag alerts: areas matching specific conditions of temperature, wind, humidity and pressure that make a region risky. Since they are country-specific, I'm afraid they are not available everywhere. Currently we can get them for the United States and for Argentina.
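For the US side, the National Weather Service exposes a public alerts API at api.weather.gov. A hedged sketch of pulling red flag warnings from it (the `event` filter and the GeoJSON response shape below are my assumption of how such a query looks, not the project's actual integration code; the demo parses a canned response so it runs offline):

```python
def red_flag_request():
    """URL and query parameters for active red flag warnings (assumed NWS API shape)."""
    return "https://api.weather.gov/alerts/active", {"event": "Red Flag Warning"}

def extract_areas(alerts_geojson):
    """Pull the affected-area descriptions out of an alerts response."""
    return [f["properties"]["areaDesc"] for f in alerts_geojson.get("features", [])]

# Offline demo on a canned response in the API's GeoJSON FeatureCollection shape:
sample = {
    "features": [
        {"properties": {"event": "Red Flag Warning", "areaDesc": "Tonto National Forest"}},
    ]
}
print(extract_areas(sample))  # → ['Tonto National Forest']
```

In a live setup you would fetch the URL with the given parameters and feed the JSON body to `extract_areas`.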
Map weather information
It is also really important to have ground truth about the weather conditions in the areas you are interested in; this was a feature the firefighters we worked with requested. To get the data, we query the weather stations through a map server published by the National Weather Service and render the results over our map. It is important to note that this is not a forecast. These are real readings:
If you are wondering how these map servers work… you are not alone. It took me some weeks to crack how they work and how they have to be consumed; there is not much documentation on the net. If you want to play with them yourself, you will need to get API keys from the providers, since we are not allowed to share ours.
Once a fire is detected by the platform, the operator is asked to confirm the fire. The UI looks like the following:
As you can see, we are pretty accurate even in really hard scenarios like this one. The small red box indicates the presence of a fire with a 67% confidence level. Prometheus actually struggles to detect big fires, and the way regions are computed becomes unstable in such scenarios. This is by design and it's totally fine: we are not interested in them.
The alert system sends SMS notifications, along with the GPS coordinates where the fire was found, to a configurable set of phone numbers. The messages are sent through Twilio, a cloud communications platform that allows you, among other things, to programmatically send and receive text messages using its web service APIs.
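A sketch of that alert path: format the SMS body, then hand it to Twilio's Python helper library. The message wording and the `send_alert` wiring are illustrative (in the project the send is triggered from an Azure Function, and real credentials are needed for the Twilio call):

```python
def format_fire_alert(latitude, longitude, confidence):
    """Build the SMS body with the detection's GPS coordinates and confidence."""
    return (
        f"Prometheus: possible fire detected at "
        f"({latitude:.5f}, {longitude:.5f}) with {confidence:.0%} confidence."
    )

def send_alert(body, to_numbers, from_number, account_sid, auth_token):
    # Twilio's helper library; this only runs with a real account SID and token.
    from twilio.rest import Client
    client = Client(account_sid, auth_token)
    for number in to_numbers:
        client.messages.create(body=body, from_=from_number, to=number)

print(format_fire_alert(33.42551, -111.94001, 0.67))
# → Prometheus: possible fire detected at (33.42551, -111.94001) with 67% confidence.
```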
Do you want to have a look?
Find the source code at https://github.com/santiagxf/prometheus
Prometheus was developed with the cooperation and help of the Tempe Fire Department (Arizona), Argentina's firefighters and Argentina's National Institute of Agricultural Technology (INTA). A big thanks to them for making this possible.