Predicting traffic accidents from above

How to scrape your own satellite images for deep learning

Sabatino Chen
5 min read · Jun 10, 2019

Problem

Each year about 1.25 million people die in road traffic accidents, and an additional 20–50 million are injured or disabled. If the locations of traffic accidents could be predicted, it could help reduce the number of accidents each year. For example, routing software could avoid the most dangerous areas, which matters particularly with the coming advent of driverless cars. Such predictions could also help insurers assess risk, and help governments and local road authorities plan road maintenance and improvements more efficiently.

Methodology

For this project, my classmate Laura Lewis and I didn’t want to just look at traditional structured data and machine learning models. Instead, we wanted to find out if satellite imagery could be combined with other datasets through deep learning in order to increase our ability to predict where traffic accidents are likely to occur. We broke our project down into four different models:

Model 1: Combines accident, population density, and traffic data from the UK, focusing on accidents in London. We built several machine learning models to see whether the level of accident severity could be predicted.

Model 2: Satellite images of London, scraped using the Google Maps Static API, are fed into a Convolutional Neural Network (CNN) to predict where traffic accidents are likely to occur.

Model 3: Uses the Keras functional API to combine the features from model one with image features extracted by a CNN (similar to model two), creating a mixed-input (or mixed data) model. Each data type is fed into its own deep learning model, and the two outputs are combined in the final layers to predict whether a given area is likely to have traffic accidents or not.

Model 4: Uses the same model architecture as model three, but applies it to the task of distinguishing between areas with no traffic accidents and areas with serious or fatal traffic accidents, in order to predict the locations of the worst traffic accidents.
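To make the mixed-input idea behind models three and four more concrete, here is a minimal sketch of a two-branch network built with the Keras functional API. It is not our actual architecture: the input sizes (10 structured features, 64x64 RGB images), the layer widths, and the names `tabular_in` and `image_in` are all assumptions chosen for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Branch 1: structured features (10 columns is an assumed size).
tabular_in = keras.Input(shape=(10,), name="structured")
x = layers.Dense(16, activation="relu")(tabular_in)

# Branch 2: a small CNN over satellite images (64x64 RGB is an assumed size).
image_in = keras.Input(shape=(64, 64, 3), name="image")
y = layers.Conv2D(16, 3, activation="relu")(image_in)
y = layers.MaxPooling2D()(y)
y = layers.Conv2D(32, 3, activation="relu")(y)
y = layers.MaxPooling2D()(y)
y = layers.Flatten()(y)
y = layers.Dense(16, activation="relu")(y)

# Combine the outputs of both branches in the final layers.
combined = layers.concatenate([x, y])
z = layers.Dense(8, activation="relu")(combined)

# One sigmoid unit: accident vs. no accident for a given area.
output = layers.Dense(1, activation="sigmoid", name="accident")(z)

model = keras.Model(inputs=[tabular_in, image_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Training such a model means passing both inputs together, for example `model.fit([structured_array, image_array], labels)`, where each row of the structured array lines up with the image of the same area.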

In this blog post I will focus on how we used the Google Maps Static API to scrape the images needed for models two, three, and four.

Step 1: Sign in to, or create, a Google Cloud Platform account.

Step 2: Get your API key.

The steps below show how you will navigate on the Google Cloud Platform to acquire your API key.

This key allows you to call the desired application programming interface (API). Google has many different APIs that can be seen here. Again, we used the Maps Static API.

Step 3: Use your API key to scrape satellite images in Python.

Once you have signed up and obtained your API key, you can scrape images in Python using just a few lines of code. Below are the packages you will need to import, along with a function that builds a URL for the desired image, fetches it, and stores it in a folder that you create.

Notice that our base variable holds the structure of the URL, in which you can change certain parameters. Since we wanted satellite images of the city of London, we set maptype=satellite. At the end of base is the center= parameter, which is followed by the location (in our case a latitude and longitude, coord) and then by your key variable. All that remains is to create a folder where you want the function to store your images, and to pass a list of coordinates into the function along with your key.
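As a rough illustration of the function described above, here is a minimal sketch. It is a reconstruction under assumptions rather than our exact code: it uses only the Python standard library, and the function names `build_url` and `save_images`, the `zoom` and `size` values, the default folder name, and the file naming scheme are all hypothetical. The `center`, `zoom`, `size`, `maptype`, and `key` parameters themselves are part of the Maps Static API.

```python
import os
import urllib.parse
import urllib.request

# Maps Static API endpoint.
BASE_URL = "https://maps.googleapis.com/maps/api/staticmap"


def build_url(coord, key, zoom=18, size="400x400"):
    """Return a Maps Static API URL for one satellite image.

    coord is a "latitude,longitude" string. The zoom and size defaults
    are placeholder values, not the ones used in the project.
    """
    params = {
        "zoom": zoom,
        "size": size,
        "maptype": "satellite",
        "center": coord,
        "key": key,
    }
    # Keep the comma in "lat,lng" unescaped so the URL stays readable.
    return BASE_URL + "?" + urllib.parse.urlencode(params, safe=",")


def save_images(coords, key, folder="satellite_images"):
    """Download one image per coordinate and return the saved file paths."""
    os.makedirs(folder, exist_ok=True)
    paths = []
    for coord in coords:
        # Hypothetical naming scheme: "51.5074_-0.1278.png".
        # The API returns PNG images unless a format is requested.
        path = os.path.join(folder, coord.replace(",", "_") + ".png")
        with urllib.request.urlopen(build_url(coord, key)) as response:
            image_bytes = response.read()
        with open(path, "wb") as f:
            f.write(image_bytes)
        paths.append(path)
    return paths
```

With a valid key, a call such as `save_images(["51.5074,-0.1278"], key="YOUR_API_KEY")` would request one satellite image centered on central London and write it into the folder. Splitting the URL construction into its own function makes it easy to print and inspect a URL before spending any API quota.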

This simple function allowed us to store all the satellite images we wanted of the areas around London with and without accidents.

Results

In the end, our findings showed that predicting accident severity in model one was quite difficult because of the imbalance between slight, serious, and fatal accidents (58% accuracy). I will go over this further in a future blog post.

Using satellite images with a CNN in model two yielded strong results in predicting traffic accident locations (77% accuracy). My classmate, Laura Lewis, discusses this model further in her blog post.

The mixed data architectures in models three and four had even higher accuracies (80% for no accidents vs. any accidents, and 82% for no accidents vs. serious or fatal accidents), showing that combining image features with structured data produced better results than either model one or model two alone.

Overall, we were able to demonstrate that combining satellite images with structured data can improve a model's ability to predict the location of traffic accidents. Encouragingly, there are several promising options for improving the models further, beyond this initial proof of concept.

For those interested, our notebooks going through all the steps of the project can be found here. Thanks for reading, and I hope you enjoyed it!

Sabatino Chen

Data scientist with a passion for solving problems through analytics. Academic background in mathematics along with a career in professional basketball.