How to use Artificial Intelligence & Data Science to improve safety on motorways?
Road safety is the main concern of motorway operators. In 2013, 1.25 million people died on the roads worldwide (Global Health Observatory (GHO), 2013). Another sobering figure is the life expectancy of a pedestrian on a motorway: less than 20 minutes in France. It is a matter of life or death that patrollers reach the scene of an incident as fast as possible whenever it happens. Highway operators are collecting more and more data from connected cars, sensors and machine-to-machine communication. So how can these large amounts of data and new technologies be used to improve road safety? Qucit built RoadPredict, a dedicated tool for operators that relies on machine learning to predict when and where incidents will happen, so that operators can optimize their patrols and save lives!
How does it work? First, this technology combines all available contextual data and digitizes a road:
- Static data: surfacing, location of the roads
- Dynamic data: calendar, events, weather forecast
- Sensor data: connected cars, traffic
- Calibration data: historical data on incidents (accidents, animals, breakdowns, wrong-way drivers…)
A tremendous amount of historical data is not needed to get highly reliable predictions. RoadPredict then automatically cleans the collected data and trains machine learning models that predict where and when the probability of an incident is highest. It is a truly data-driven digital tool that helps road operators and safety patrols anticipate incidents, be ready, and reach the incident scene faster.
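To make "digitizing a road" more concrete, here is a minimal sketch of how the calibration data could be turned into a prediction target. It assumes the highway is cut into 1 km segments, time into hourly slots, and that each incident is located by its kilometre point; the incident records and the function `build_label_grid` are hypothetical. Static, dynamic and sensor features would then be joined on the same (segment, hour) keys.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical incident records: (timestamp, kilometre point on the highway).
incidents: List[Tuple[datetime, float]] = [
    (datetime(2017, 1, 3, 8, 15), 12.4),
    (datetime(2017, 1, 3, 8, 40), 12.9),
    (datetime(2017, 1, 3, 17, 5), 48.0),
]

def build_label_grid(incidents, start: datetime, hours: int,
                     segment_km: float, n_segments: int) -> Dict[Tuple[int, datetime], int]:
    """Label each (segment, hourly slot) cell 1 if at least one incident occurred in it, else 0.

    `start` is assumed to be aligned on the hour.
    """
    grid = {(seg, start + timedelta(hours=h)): 0
            for h in range(hours) for seg in range(n_segments)}
    for ts, km in incidents:
        key = (int(km // segment_km), ts.replace(minute=0, second=0, microsecond=0))
        if key in grid:  # ignore incidents outside the studied window or road
            grid[key] = 1
    return grid

# 206 one-kilometre segments (as for a 206 km highway) over one day.
grid = build_label_grid(incidents, datetime(2017, 1, 3), hours=24, segment_km=1.0, n_segments=206)
print(len(grid), sum(grid.values()))  # 4944 2
```

The two morning incidents fall into the same cell, so a cell records whether an incident happened, not how many.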
With 11,100 kilometres of expressways in 2014 (NationMaster, 2015) and an average traffic of 57,031 vehicles per day (ASFA (Autoroutes & Ouvrages Concédés), 2017), France ranks 7th in the world in terms of road network. France has improved its road safety in recent years: the number of injury accidents decreased by 16% between 2010 and 2015 (Observatoire national Interministériel de la sécurité routière, 2016). However, in 2016 there were 450,000 interventions and 56,109 injury accidents (Observatoire national Interministériel de la sécurité routière, 2016), and 100 patrollers were struck during an intervention (Le site de la sécurité du personnel autoroutier, 2016).
Roads are closely tied to French history. In 1955, after the end of World War II, the government introduced a reform allowing highways to be operated by private companies. The breakthrough took place in 2000, when the government conceded 9,137 kilometres of motorways to private companies. One of them is ATLANDES, a private consortium of 6 companies operating the A63, a 206-kilometre stretch of highway in South-West France between Bordeaux and Bayonne. The motorway operator is Egis Exploitation Aquitaine (EEA).
Like every highway operator, EEA aims for an accident rate as close as possible to 0. To begin with, they wish “to optimize their responsiveness to improve the safety of customers and patrollers” (Lengrand, 2017). They have a lot of data available from their systems, and even more will come in the future thanks to M2M and IoT. The question, however, is how to use this data to predict incidents and improve road safety with their existing tools. Indeed, incidents are rare, and it is very difficult for a human to prevent or predict them from the existing data.
This paper demonstrates the usefulness of a predictive tool for improving road safety. We first present the scientific protocol used in RoadPredict. We then explain the different steps involved in building an accurate predictive model. Finally, we dive deeper into how artificial intelligence helps road operators improve overall road safety, based on the use case of EEA on the A63.
Using artificial intelligence to solve a complex problem such as predicting road incidents requires a robust, proven methodology. A protocol of 6 distinct phases has been set up to deal with such a complex phenomenon: data collection, data cleaning, feature interpolation, machine learning, prediction, and provision and visualization. The same protocol (described in the following sections) is used across the products developed by Qucit (BikePredict, ParkPredict, ComfortPredict and RoadPredict). This approach is formalized in many internal code libraries, which allows Qucit to iterate rapidly on new products.
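The six phases can be pictured as functions that each enrich a shared state and run in a fixed order. The sketch below is a toy illustration under that assumption, not Qucit's internal libraries: every phase (`collect`, `clean`, `interpolate`, `train`, `predict`, `provide`) is a hypothetical placeholder with deliberately trivial logic.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Phase = Tuple[str, Callable[[State], State]]

def collect(state: State) -> State:
    # Toy readings for 5 consecutive road segments; None marks a missing value.
    state["raw"] = [2.0, None, 4.0, 400.0, 6.0]
    return state

def clean(state: State) -> State:
    # Treat implausible readings (here, anything above 100) as missing.
    state["clean"] = [None if x is not None and x > 100 else x for x in state["raw"]]
    return state

def interpolate(state: State) -> State:
    # Fill each gap with the mean of its nearest known neighbours (assumes at least one known value).
    vals = state["clean"]
    filled = []
    for i, x in enumerate(vals):
        if x is None:
            left = next((v for v in reversed(vals[:i]) if v is not None), None)
            right = next((v for v in vals[i + 1:] if v is not None), None)
            known = [v for v in (left, right) if v is not None]
            x = sum(known) / len(known)
        filled.append(x)
    state["features"] = filled
    return state

def train(state: State) -> State:
    # Stand-in "model": the average feature level across segments.
    state["model"] = sum(state["features"]) / len(state["features"])
    return state

def predict(state: State) -> State:
    # Flag segments whose feature exceeds the learned average.
    state["high_risk"] = [i for i, x in enumerate(state["features"]) if x > state["model"]]
    return state

def provide(state: State) -> State:
    # Deliver the result as JSON, as an API would.
    state["output"] = json.dumps({"high_risk_segments": state["high_risk"]})
    return state

PIPELINE: List[Phase] = [
    ("data collection", collect), ("data cleaning", clean),
    ("feature interpolation", interpolate), ("machine learning", train),
    ("prediction", predict), ("provision", provide),
]

def run_pipeline(phases: List[Phase]) -> State:
    state: State = {"log": []}
    for name, phase in phases:
        state = phase(state)
        state["log"].append(name)  # record which phases ran, in order
    return state

result = run_pipeline(PIPELINE)
print(result["output"])  # {"high_risk_segments": [3, 4]}
```

One benefit of giving every phase the same signature is that a single phase, such as the model, can be swapped without touching the others.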
Before anything meaningful can happen, data is needed. In the methodology described hereafter, data comes in different formats. To classify the collected data at Qucit, two main attributes are considered:
- Origin: does the data come from our clients, from open sources, or from third-party providers?
- Update frequency: does it vary hourly, weekly, monthly or yearly?
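The two attributes above can be encoded as a small catalog, for example to decide which datasets need refreshing. This is a hypothetical sketch: the `Dataset` entries and their origin and frequency assignments are illustrative only, and a month and a year are approximated as 30 and 365 days.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List

class Source(Enum):
    CLIENT = "client"
    OPEN = "open source"
    PROVIDER = "third-party provider"

class Frequency(Enum):
    # Value = refresh period in hours (a month ~ 30 days, a year ~ 365 days).
    HOURLY = 1
    WEEKLY = 24 * 7
    MONTHLY = 24 * 30
    YEARLY = 24 * 365

@dataclass(frozen=True)
class Dataset:
    name: str
    source: Source
    frequency: Frequency

# Hypothetical catalog: the assignments below are illustrative only.
CATALOG = [
    Dataset("incident_history", Source.CLIENT, Frequency.MONTHLY),
    Dataset("highway_route", Source.CLIENT, Frequency.YEARLY),
    Dataset("osm_points_of_interest", Source.OPEN, Frequency.MONTHLY),
    Dataset("weather", Source.PROVIDER, Frequency.HOURLY),
]

def due_for_refresh(catalog: List[Dataset], hours_since_update: Dict[str, int]) -> List[str]:
    """Names of datasets whose refresh period has elapsed since their last update."""
    return [d.name for d in catalog if hours_since_update[d.name] >= d.frequency.value]

ages = {"incident_history": 800, "highway_route": 800, "osm_points_of_interest": 100, "weather": 2}
print(due_for_refresh(CATALOG, ages))  # ['incident_history', 'weather']
```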
Based on this observation, here is a description of the steps performed in each case:
- The client provides historical data (which evolves over time) about the phenomenon it wants to understand. For example, in the case of RoadPredict, the client provides data about all the incidents that happened over the last 3 years. The more data provided, the better. Highway incidents are diverse: car crashes, truck crashes, breakdowns, contraflows, stray animals, traffic jams…
- The client also provides system data. In the case of RoadPredict, the system is the highway. The client therefore provides all the information needed to digitize the highway, such as the route of the highway (in GeoJSON format), surfacing, location of road signs, rest areas…
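A common way to digitize a route supplied as GeoJSON is linear referencing: compute the distance from the start of the road to each vertex, then cut the road into fixed-length segments. The sketch assumes a single `LineString` feature (GeoJSON stores coordinates as longitude, latitude), a spherical Earth (haversine distance) and 1 km segments; the coordinates are hypothetical.

```python
import json
import math
from typing import List, Sequence

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance between two (lon, lat) points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def cumulative_km(coords: Sequence[Sequence[float]]) -> List[float]:
    """Distance from the start of the route to each vertex (GeoJSON order: lon, lat)."""
    dists = [0.0]
    for (lon1, lat1), (lon2, lat2) in zip(coords, coords[1:]):
        dists.append(dists[-1] + haversine_km(lon1, lat1, lon2, lat2))
    return dists

def vertex_segments(coords, segment_km: float = 1.0) -> List[int]:
    """Index of the fixed-length segment each vertex falls into."""
    return [int(d // segment_km) for d in cumulative_km(coords)]

# Hypothetical route: three vertices 0.01 degrees of latitude apart.
route = json.loads("""{
  "type": "Feature",
  "properties": {"name": "example route"},
  "geometry": {"type": "LineString",
               "coordinates": [[-0.5, 44.0], [-0.5, 44.01], [-0.5, 44.02]]}
}""")
coords = route["geometry"]["coordinates"]
print(round(cumulative_km(coords)[-1], 3), vertex_segments(coords))  # 2.224 [0, 1, 2]
```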
In addition to proprietary data, Qucit collects a wide range of contextual data that gives a better understanding of the studied phenomenon. Contextual data provides additional information that is very valuable when predicting a spatial phenomenon. In the case of RoadPredict, contextual data adds information about the static surroundings of the highway (nearby forests, for example), the dynamic driving conditions (via the weather features) and traffic volume (for instance, a major holiday generates more traffic).
- OSM (Open Street Map, 2017): includes points of interest (shops, cafés, cinemas, banks, parks, and so on) that are useful for describing the surrounding environment. This data is periodically collected using the OSM API and stored for later use.
- Open data: this type of data is similar to OSM in its nature but provides richer and more precise information.
- Weather: about a dozen variables (temperature, precipitation, wind speed, wind direction, fog…) are collected every hour and stored.
- Sensors: sensor data (from cars for example) are also collected whenever available.
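Hourly data such as the weather has to be matched to each incident or prediction time. One simple approach, sketched below, takes the most recent observation at or before the time of interest, so that no future information leaks into the features. The 2-hour staleness limit, the function name and the precipitation values are assumptions for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Optional

def latest_observation(obs_times: List[datetime], obs_values: List[float],
                       t: datetime, max_age: timedelta = timedelta(hours=2)) -> Optional[float]:
    """Last observation at or before t, or None if there is none recent enough.

    obs_times must be sorted in ascending order.
    """
    i = bisect_right(obs_times, t)   # number of observations with time <= t
    if i == 0:
        return None                  # t precedes the first observation
    if t - obs_times[i - 1] > max_age:
        return None                  # too stale to describe conditions at t
    return obs_values[i - 1]

# Hypothetical hourly precipitation (mm) at one weather station.
times = [datetime(2017, 1, 3, h) for h in range(3)]   # 00:00, 01:00, 02:00
precip = [0.0, 1.2, 3.5]

print(latest_observation(times, precip, datetime(2017, 1, 3, 1, 40)))  # 1.2
print(latest_observation(times, precip, datetime(2017, 1, 3, 5, 0)))   # None (3 h old)
```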
In general, the collected data is not clean and contains random noise. In addition, it is most often high-dimensional (many features). Dimensionality reduction techniques are therefore applied to increase the signal-to-noise ratio and make the data more useful. For example, we use PCA (Smith, 2002) to reduce the number of features. We also use models that automatically select the best features and discard spurious ones: if the model sees two features with very similar contributions (say, the number of coffee shops and the number of cinemas), it will select only one of them. Feature selection and dimensionality reduction are very important here because we often deal with many features and few samples, a setting prone to the curse of dimensionality (Abbeel, 2017).
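The sketch below illustrates both ideas with NumPy on synthetic data: PCA computed through a singular value decomposition of the standardized feature matrix, and a simple greedy filter that keeps only one of two highly correlated features (such as coffee shops and cinemas). The correlation filter is a simpler stand-in for the model-based feature selection described above, and the 0.95 threshold is an assumption.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """Project standardized X (samples x features) onto its top principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # centre and scale each feature
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)  # rows of Vt are principal axes
    explained = S**2 / np.sum(S**2)                   # share of variance per component
    return Z @ Vt[:n_components].T, explained[:n_components]

def drop_correlated(X: np.ndarray, names, threshold: float = 0.95):
    """Greedily keep a feature only if it is not highly correlated with one already kept."""
    corr = np.corrcoef(X, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]

rng = np.random.default_rng(0)
coffee_shops = rng.poisson(5, 200).astype(float)
cinemas = coffee_shops + rng.normal(0, 0.1, 200)   # almost a copy of coffee_shops
forest_share = rng.normal(0, 1, 200)               # unrelated feature
X = np.column_stack([coffee_shops, cinemas, forest_share])

X_sel, kept_names = drop_correlated(X, ["coffee_shops", "cinemas", "forest_share"])
scores, explained = pca(X, n_components=2)
print(kept_names)  # ['coffee_shops', 'forest_share']
```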
Once the data is collected, we need to make sure that values are available everywhere in the studied system. A sophisticated proprietary spatial interpolation is performed to fill in missing values and create coherent input features for our models.
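The interpolation used in RoadPredict is proprietary and is not described here. As a generic stand-in, inverse distance weighting (IDW) estimates a missing value as a weighted average of known values, with weights that decrease with distance. The sketch assumes planar coordinates in kilometres and a power parameter of 2; the data points are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, Optional[float]]  # (x_km, y_km, value or None)

def idw(known: List[Tuple[float, float, float]], x: float, y: float, power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at (x, y) from known (x, y, value) points."""
    num = den = 0.0
    for kx, ky, v in known:
        d = math.hypot(kx - x, ky - y)
        if d == 0.0:
            return v  # the location itself is observed
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def fill_missing(points: List[Point], power: float = 2.0) -> List[Tuple[float, float, float]]:
    """Replace each missing value with its IDW estimate from the observed points."""
    known = [(x, y, v) for x, y, v in points if v is not None]
    return [(x, y, v if v is not None else idw(known, x, y, power)) for x, y, v in points]

# Hypothetical hourly traffic counts at three points along the road (km coordinates).
points = [(0.0, 0.0, 100.0), (1.0, 0.0, None), (3.0, 0.0, 300.0)]
print(fill_missing(points))  # [(0.0, 0.0, 100.0), (1.0, 0.0, 140.0), (3.0, 0.0, 300.0)]
```

The closer point (1 km away) gets four times the weight of the farther one (2 km away), which pulls the estimate toward 100.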
Together with the client, one or more targets to predict are chosen, along with the metric to optimize. This choice depends on the business needs and the pain points that the predictive tool will address. In the case of RoadPredict, the objective can be how quickly interventions take place or how precise the predicted incidents are. Some mathematical metrics are more appropriate than others. Once this preliminary work is done, a machine learning model is trained. This phase can be computationally intensive and is therefore run only occasionally (e.g., every time new data is received). During this phase, hyperparameters (i.e., parameters that control the structure of the model) are also optimized. At the end of the training phase, the models (along with their hyperparameters and metadata) are saved for later use. Note that the models are evaluated on a test set, i.e., data that is never used during training. For example, in the case of RoadPredict, the first 3 months of 2017 were used to evaluate the model's performance.
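The training loop described above (chronological test set, hyperparameter optimization, saving the model with its metadata) can be sketched as follows. This is a minimal illustration under assumptions: ridge regression on synthetic data stands in for the actual models, its penalty `lam` plays the role of the hyperparameter, and a validation period placed between the training and test periods is used to choose it.

```python
import json
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; the intercept (first weight) is not penalized."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def ridge_predict(w, X):
    return np.hstack([np.ones((X.shape[0], 1)), X]) @ w

def mse(pred, truth):
    return float(np.mean((pred - truth) ** 2))

def time_split(t, train_end, valid_end):
    """Chronological masks: train before train_end, validation until valid_end, test after."""
    return t < train_end, (t >= train_end) & (t < valid_end), t >= valid_end

# Synthetic data ordered in time (e.g., one row per day); not real incident data.
rng = np.random.default_rng(42)
days = np.arange(600)
X = rng.normal(size=(600, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=600)

train, valid, test = time_split(days, 400, 500)

# Hyperparameter search: choose the penalty on the validation period only.
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
val_err = {lam: mse(ridge_predict(ridge_fit(X[train], y[train], lam), X[valid]), y[valid])
           for lam in grid}
best_lam = min(val_err, key=val_err.get)

# Refit on all pre-test data, then evaluate on the test period exactly once.
fit = train | valid
w = ridge_fit(X[fit], y[fit], best_lam)
test_mse = mse(ridge_predict(w, X[test]), y[test])

# Persist the model together with its hyperparameter and some metadata.
saved = json.dumps({"lambda": best_lam, "weights": w.tolist(), "trained_until_day": 500})
print(best_lam, round(test_mse, 3))
```

Splitting by time rather than at random mimics the real situation, where the model is always applied to a period it has never seen.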
Once the models have been trained and optimized, predictions can be made. Beforehand, the following steps are needed:
- Load the models: depending on the target to predict, the corresponding model is loaded into memory.
- Prepare the data:
  - For static data, historical data can be used.
  - For dynamic data (data that varies over time), depending on the time horizon of the predictions:
    - Use real-time data for short-term predictions (less than 1 h ahead, for example).
    - Otherwise, predict the data or use the values observed one year before.
Take the weather as an example. Imagine we want to predict a phenomenon one hour ahead using weather data. Two options are possible: either we predict the weather in 1 h, or we use the weather conditions observed at the same time the previous year.
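The choice between real-time data, a forecast and last year's value can be written as a small decision function. This sketch follows the bullet list above; the 1-hour threshold comes from the text, while the function name `dynamic_value`, the use of 52 weeks (364 days, which keeps the same weekday) for "one year before", and the temperature values are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

def dynamic_value(target: datetime, now: datetime, realtime_value: float,
                  forecasts: Dict[datetime, float], history: Dict[datetime, float],
                  short_term: timedelta = timedelta(hours=1)) -> Tuple[Optional[float], str]:
    """Pick the input for a time-varying feature when predicting at `target`."""
    if target - now <= short_term:
        return realtime_value, "real-time"            # short horizon: latest observation
    if target in forecasts:
        return forecasts[target], "forecast"          # a forecast exists for that time
    last_year = target - timedelta(weeks=52)          # 364 days back keeps the same weekday
    return history.get(last_year), "one year before"  # None if that value was not recorded

# Hypothetical temperatures (degrees C).
now = datetime(2017, 7, 1, 8)
forecasts = {datetime(2017, 7, 2, 8): 24.0}
target_far = datetime(2017, 7, 31, 8)
history = {target_far - timedelta(weeks=52): 27.5}

print(dynamic_value(now + timedelta(minutes=30), now, 21.0, forecasts, history))  # (21.0, 'real-time')
print(dynamic_value(datetime(2017, 7, 2, 8), now, 21.0, forecasts, history))      # (24.0, 'forecast')
print(dynamic_value(target_far, now, 21.0, forecasts, history))                   # (27.5, 'one year before')
```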
Provision of predictions
Predictions are useful when they improve the daily life of operators and, more broadly, of citizens. To make them exploitable, the last part of the protocol is to deliver these insights through the appropriate channel, either:
- An API: to integrate the predictions directly into existing software.
- A web application (in the case of Qucit, the Urban Predictive Platform): to provide a tool dedicated to business needs. For RoadPredict, the predictions are available through a map, graphs and a table indicating where (geolocation) and when (time) an incident is most likely to occur.
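For the API channel, predictions might be returned as a JSON document listing the riskiest locations and times. The schema below (`segment_km`, `hour`, `probability`) and the function `top_k_payload` are hypothetical, not the actual API of the Urban Predictive Platform.

```python
import json
from datetime import datetime
from typing import Dict, Tuple

def top_k_payload(predictions: Dict[Tuple[int, datetime], float], k: int = 3) -> str:
    """Serialize the k (segment, hour) cells with the highest incident probability as JSON."""
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)[:k]
    return json.dumps({
        "predictions": [
            {"segment_km": seg, "hour": hour.isoformat(), "probability": round(p, 3)}
            for (seg, hour), p in ranked
        ]
    })

# Hypothetical model output for one day.
predictions = {
    (12, datetime(2017, 7, 1, 8)): 0.31,
    (48, datetime(2017, 7, 1, 17)): 0.22,
    (5, datetime(2017, 7, 1, 3)): 0.05,
    (100, datetime(2017, 7, 1, 18)): 0.12,
}
body = top_k_payload(predictions, k=2)
print(body)
```

The same ranked list could feed the map and table views of a web application, so both channels would stay consistent.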
Results and discussion
Use case: A63, French highway, Egis Exploitation Aquitaine
Pain points
EEA aims for an accident rate near 0. To start, they wish “to optimize their responsiveness to improve the safety of customers and patrollers. Their goal is to secure even better the highway by trying to anticipate the incidents and limit the consequences for our customers in a minimum amount of time” (Lengrand, 2017). They have a lot of data available from their systems, and even more will come in the future thanks to M2M and IoT. Incidents are rare and very difficult for a human to prevent or predict.
Perimeter and protocol applied
The RoadPredict protocol presented above has been applied since the beginning of 2017. The aim is to predict where and when the probability of an incident is highest for the next day, the current week, the current month and even the year. The predicted incidents are accidents, contraflows, stray animals, breakdowns, traffic jams and obstacles. Incident forecasts are provided to the patrolling team so that patrollers know about potential upcoming events. These predictions are accessible via a web dashboard. RoadPredict is currently used in the field by EEA on the A63. The first results were displayed in July 2017.
Comparing the ground-truth incident data with RoadPredict's predictions from January 2017 to April 2017, we see that the model successfully predicted 65.64% of incidents on the A63. It outperforms the benchmark model, based on the historical average, by 14 percentage points. In the case of the A63, other data flows will be integrated in the second half of 2017, which should significantly improve the model's performance.
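The text does not specify how a prediction counts as successful. One common way to compare a model with a historical-average benchmark, assumed here for illustration, is a top-k hit rate: for each day, flag the k segments with the highest scores and count the share of incidents that fall in a flagged segment. The toy data below is not A63 data and does not reproduce the figures above.

```python
from typing import Dict, Hashable, List, Tuple

def hit_rate(scores: Dict[Hashable, Dict[int, float]],
             incidents: List[Tuple[Hashable, int]], k: int) -> float:
    """Share of incidents located in one of the k highest-scored segments of their day."""
    flagged = {day: set(sorted(seg_scores, key=seg_scores.get, reverse=True)[:k])
               for day, seg_scores in scores.items()}
    hits = sum(1 for day, seg in incidents if seg in flagged.get(day, set()))
    return hits / len(incidents)

# Toy data: 4 segments, 2 days. Benchmark = historical incident counts, identical every day.
historical_counts = {0: 5.0, 1: 3.0, 2: 1.0, 3: 0.5}
benchmark = {"day1": historical_counts, "day2": historical_counts}
model = {"day1": {0: 0.4, 1: 0.1, 2: 0.3, 3: 0.05},
         "day2": {0: 0.2, 1: 0.1, 2: 0.1, 3: 0.5}}
incidents = [("day1", 0), ("day1", 2), ("day2", 3), ("day2", 1)]

print(hit_rate(model, incidents, k=2), hit_rate(benchmark, incidents, k=2))  # 0.75 0.5
```

Fixing k keeps the comparison fair: both rankings flag the same number of segments, so the gap reflects where they point patrols, not how many places they flag.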
To go further: RoadPredict Safety Patrol, a tool that makes RoadPredict actionable
In the absence of incident predictions, the logistical planning of highway safety patrols is static: if a strategy is assumed a priori to be optimal, it will still be optimal the day after. However, a human is able to make rudimentary predictions. Drawing on experience, a patrol manager can roughly take the main parameters, such as calendar data, into account, and knows, for instance, that more patrol trucks are needed on a busy day. We can do much better by using automatic machine learning tools.
With a model that predicts incidents, the patrol manager's knowledge increases: with finer granularity and greater accuracy, better decisions can be made. For example, it is intuitive that more incidents occur on high-traffic days, but knowing their likely locations helps position the patrol trucks better. But how can we assess whether one strategy is better than another, or even find the best strategy?
A simple empirical assessment is not enough. Incidents occur too rarely for meaningful observations over a short period, and implementing an experimental strategy over a long period could be costly or even dangerous. Hence RoadPredict Safety Patrol: a simulation tool to compare strategies and define optimal patrols.
Once the client has used the web application to define its strategy (e.g., the number of trucks and their behaviors), the program simulates the operation of the motorway over a given time range. A visualization of the trucks patrolling the highway is then displayed, showing their locations, the appearance of incidents, and the main statistics of the simulation run. We monitor the distribution of the time it takes a truck to reach an incident, the number of times an on-call truck is used, the kilometres travelled, and so on.
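To give an idea of what such a simulation computes, here is a deliberately simplified sketch. The motorway is a single line of kilometre points (both carriageways merged), each truck waits at a base, the nearest idle truck drives at constant speed, handles the incident for a fixed time and returns to base, and an on-call truck from km 0 is used when no truck is idle. All of these rules and parameter values are assumptions, not the behavior of RoadPredict Safety Patrol.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Truck:
    base_km: float           # where the truck waits between interventions
    busy_until: float = 0.0  # simulation time (minutes) when it becomes idle again

def simulate(bases: List[float], incidents: List[Tuple[float, float]],
             speed_kmh: float = 90.0, service_min: float = 30.0,
             on_call_base_km: float = 0.0) -> Dict[str, object]:
    """Dispatch the nearest idle truck to each (time_min, km) incident; else call the on-call truck."""
    trucks = [Truck(b) for b in bases]
    response: List[float] = []
    on_call, km_total = 0, 0.0
    for t, km in sorted(incidents):
        idle = [tr for tr in trucks if tr.busy_until <= t]
        if idle:
            truck = min(idle, key=lambda tr: abs(tr.base_km - km))
            dist = abs(truck.base_km - km)
            travel = dist * 60.0 / speed_kmh
            truck.busy_until = t + 2 * travel + service_min  # drive there, handle it, drive back
        else:
            on_call += 1
            dist = abs(on_call_base_km - km)
            travel = dist * 60.0 / speed_kmh
        response.append(travel)   # minutes until a truck reaches the incident
        km_total += 2 * dist      # round trip
    return {"response_min": response, "on_call": on_call, "km": km_total}

# Hypothetical strategy: 3 trucks based at km 20, 60 and 100; incidents as (minute, km).
stats = simulate([20.0, 60.0, 100.0], [(0, 30.0), (10, 25.0), (15, 90.0), (20, 50.0)], speed_kmh=60.0)
print(stats["response_min"], stats["on_call"], stats["km"])  # [10.0, 35.0, 10.0, 50.0] 1 210.0
```

Comparing two strategies then amounts to running `simulate` with different base positions on the same incident history and comparing the resulting distributions.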
To ensure the reliability of RoadPredict Safety Patrol, we implemented the strategy devised by EEA on the A63. If the statistics obtained by simulating past years match those observed by EEA in reality, we deem the simulation trustworthy. A run with a new strategy should then give results close to what a field deployment would produce. We can then test multiple strategies and select the best one.
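The check that simulated statistics match reality can be made explicit by comparing summary statistics within a tolerance. This sketch compares the mean and the 90th percentile of response times; the choice of statistics, the 10% tolerance and the sample values are assumptions.

```python
import statistics
from typing import Dict, List

def summary(times: List[float]) -> Dict[str, float]:
    """Mean and 90th percentile of response times (minutes)."""
    deciles = statistics.quantiles(times, n=10)  # 9 cut points; the last is the 90th percentile
    return {"mean": statistics.mean(times), "p90": deciles[-1]}

def is_calibrated(simulated: List[float], observed: List[float], rel_tol: float = 0.10) -> bool:
    """True if every summary statistic of the simulation is within rel_tol of the observed one."""
    sim, obs = summary(simulated), summary(observed)
    return all(abs(sim[k] - obs[k]) <= rel_tol * abs(obs[k]) for k in obs)

# Hypothetical response times (minutes) observed in the field for past incidents.
observed = [10.0, 12.0, 15.0, 20.0, 25.0, 30.0, 18.0, 14.0, 11.0, 22.0]
simulated_ok = [t * 1.02 for t in observed]    # 2% off: acceptable
simulated_bad = [t * 1.5 for t in observed]    # 50% off: not acceptable
print(is_calibrated(simulated_ok, observed), is_calibrated(simulated_bad, observed))  # True False
```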
Instead of devising patrolling strategies by hand and simulating them one by one, it would be much better to leverage machine learning (for instance, reinforcement learning) to generate and test more diverse strategies. This automatic methodology optimizes a score that depends primarily on the truck arrival time at the incident location, the time interval between two visits to the same location, the truck mileage, and the number of on-call truck interventions. Finally, penalties should be incorporated to rule out infeasible strategies (e.g., exceeding speed limits or driving against the allowed direction).
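Whatever search method generates the strategies (reinforcement learning or something simpler, such as random search), it needs an objective. Below is a minimal cost function, to be minimized, that combines the four criteria above with a large penalty for infeasible behavior. The `RunStats` fields, the weights and the penalty value are hypothetical and would have to be agreed with the operator.

```python
from dataclasses import dataclass

@dataclass
class RunStats:
    mean_arrival_min: float      # average time for a truck to reach an incident
    mean_revisit_min: float      # average interval between two visits to the same location
    truck_km: float              # total distance driven
    on_call_interventions: int
    speed_violations: int        # infeasible: speed limit exceeded
    wrong_way_moves: int         # infeasible: driving against the allowed direction

# Hypothetical weights; real values would be set with the operator.
WEIGHTS = {"arrival": 1.0, "revisit": 0.1, "km": 0.01, "on_call": 5.0}
PENALTY = 1e6  # chosen to dominate the operating criteria in typical runs

def strategy_cost(s: RunStats) -> float:
    """Cost to minimize: weighted operating criteria plus penalties for infeasible behavior."""
    cost = (WEIGHTS["arrival"] * s.mean_arrival_min
            + WEIGHTS["revisit"] * s.mean_revisit_min
            + WEIGHTS["km"] * s.truck_km
            + WEIGHTS["on_call"] * s.on_call_interventions)
    return cost + PENALTY * (s.speed_violations + s.wrong_way_moves)

feasible = RunStats(12.0, 90.0, 1500.0, 2, 0, 0)
infeasible = RunStats(8.0, 60.0, 1200.0, 0, 1, 0)
print(strategy_cost(feasible) < strategy_cost(infeasible))  # True
```

Even though the infeasible run looks better on every operating criterion, the penalty ensures a search never selects it.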
- Yassine Alouini, Data Scientist Qucit, M.A.S.T in Applied Mathematics, Cambridge
- Rémi Delassus, PhD Student Deep Learning for Semantic Segmentation of Aerial Imagery
- Marie Quinquis, Sales & Marketing Manager Qucit, Master 2 Business & Marketing strategy EM Normandie
- Abbeel, P. (2017). Optimal Control for Linear Dynamical System and Quadratic Cost. UC Berkeley. Berkeley: UC Berkeley EECS.
- ASFA (Autoroutes & Ouvrages Concédés). (2017). Chiffres Clés — Key Figures. Paris: ASFA.
- Global Health Observatory (GHO). (2013, 06 01). Number of road traffic deaths. Retrieved 07 31, 2017, from World Health Organization: http://www.who.int/gho/road_safety/mortality/traffic_deaths_number/en/
- Le site de la sécurité du personnel autoroutier. (2016, 12 31). Le bilan des accidents. Retrieved 08 10, 2017, from Le site de la sécurité du personnel autoroutier: http://www.personnel-autoroutes.fr/fr/les-chiffres/le-bilan-des-accidents.htm
- Lengrand, R. (2017, 06 15). Why using AI and RoadPredict on A63. (M. Quinquis, Interviewer)
- NationMaster. (2015, 06 01). Expressway length : Countries Compared. Retrieved 08 09, 2017, from NationMaster: http://www.nationmaster.com/country-info/stats/Transport/Road/Expressway-length
- Observatoire national Interministériel de la sécurité routière. (2016). Accidentalité routière 2015 — estimations au 26 janvier 2016. France: APAM.
- Open Street Map. (2017, 08 10). Open Street Map. Retrieved 08 10, 2017, from OSM: https://www.openstreetmap.org/#map=13/45.5823/5.9063
- Smith, L. I. (2002). A tutorial on Principal Component Analysis. Otago: University of Otago.