How to detect and grade driving style from observational data

A deep dive into Machine Learning-Based Driving Style Analysis and Grading

tb.lx
tb.lx insider
10 min read · Feb 26, 2021

The Mercedes-Benz GenH2 Truck. Image courtesy of Daimler

In this article, we will guide you through an approach developed within our company, tb.lx by Daimler Trucks & Buses, to analyze and grade driving styles. Driving style grading is an essential aspect of quality delivery for transportation companies, and safer driving styles correlate with lower operational costs and fewer traffic incidents. This is the main objective of the products we develop for our customers.

The traditional approaches to driving analysis and grading have relied mostly on large sets of hardwired rules that are hard to generate and maintain. Still, this approach has the distinct advantage of being explainable to humans, despite its maintenance burden. For example, adding a new vehicle type to the fleet may require an update to the ruleset, along with the mandatory testing. Here we describe an alternative, machine learning-based approach that attempts to solve this issue by avoiding manual maintenance, retaining explainability, and allowing context sensitivity.

It is also possible to apply this approach to other automotive industry areas, such as the battery health management for electric vehicles. Battery health is dependent on several factors well beyond mere charge cycles. Environmental and behavioral factors can influence the overall battery health and expected life cycle. Much like driving behavior, we can teach a machine learning model to recognize both the typical and uncommon battery uses, building up to a context-sensitive grading system.

The Big Picture

Truck driver behavior varies across countries, cultures, activities, and law enforcement practices. Scientific studies on the impact of rigorous traffic citation policies [1] report reduced road accidents and fatalities, and traffic signage quality, road conditions, and road design also influence the incident count. All these factors, and probably more, shape both road safety and driving behavior, and they are impossible to bake into a generic rule-based rating system.

Our approach uses the telematics data generated by these heavy-duty vehicles to infer what the expected “normal” driving conditions for specific road segments are. We can collect data that might help characterize different driving issues such as aggressive or risky driving, inattentive driving, or even drunk driving. We chose to use the first issue type, aggressive or dangerous driving, to build the models due to data availability. By detecting these events through machine learning models, we can then aggregate the information into a scoring system for either the driver or for trip segments.

Context and Motivation

Many strong motivators led us to develop this driving grade model. First and foremost are the operational costs and safety. Any traffic incident is prone to induce disruption and added costs to the transport operator. Among the possible costs associated with a road accident are damage to the vehicle or cargo, personal injuries, delayed or missed deliveries, and potential civil responsibilities. Moreover, stopped vehicles do not generate revenue, only costs.

Developing a driving grade model helps us not only to avoid accidents but also to keep drivers and third parties safe. There are other potential benefits to reap from such a model, namely as a source for driver coaching, by helping managers better evaluate and compensate drivers, improving overall fuel consumption, and even identifying safer routes. To better assist the delivery of these goals, the driving grade model must be interpretable, meaning that a human counterpart should understand why the model produces a given output.

Model Overview

We start by feeding the model with data that reflects unsafe or aggressive driving behavior, such as large absolute accelerations, sharp turns, or violations of the safety distance. This data allows the model to discriminate between the notable event types, enabling a subsequent scoring of the trip or driver and a final ranking.

Instead of a rule-based model, we developed a machine learning-based model for several reasons. While rule-based systems are inherently interpretable and do not require large volumes of data to devise, they have some significant shortcomings. Rule-based models rely on hard thresholds for their decision-making, making them brittle: any change in the environment, be it a new vehicle type or a change in traffic conditions or rules, might render the model useless. The number of rules may become too large for software developers and engineers to manage and maintain comfortably. Rule design relies on individual variables (univariate) because interactions between multiple inputs are tough to model. These models also generalize poorly across populations, fleet activities, and driving contexts; they do not evolve naturally; and their outputs are essentially binary, not probabilistic.

On the other hand, machine learning-based models learn from the population behavior, are easier to extend with more input variables, and capture interactions between multiple input features (multivariate). These models are adaptive: they can evolve with changes in the target environment, learning new patterns and keeping up the drivers’ challenge. Finally, the predictions of many of these models are inherently continuous, not binary. Their most significant shortcomings are the need for potentially large amounts of training data and the fact that they tend to be more challenging to interpret. Fortunately, recent research has tackled the latter issue quite successfully through interpretability methods.

Model Assumptions

We feed the machine learning model with high-frequency, unlabeled telematics data generated by the vehicle’s on-board sensors. The signals we collected have different scales and behaviors depending on their semantics. The picture below depicts a braking event, where the first signal, the speed, decreases as the brake pedal signal surges (last chart).

Telematics data sample (Image source: Author)

In normal driving conditions, risky or aggressive driving events should be rare. This assumption is a cornerstone of our model as it allows us to use anomaly detection as the first step on our model pipeline. We used multivariate Isolation Forest models [2] to detect such anomalies, as further explained below.

Isolation Forest

An Isolation Forest is an anomaly detection algorithm built on the assumption that anomalies are a small minority of the observations and have very distinct attribute values.

In other words, anomalies are ‘few and different’, which makes them more susceptible to isolation than normal points. [2]

The algorithm uses an ensemble of Isolation Trees to detect anomalies in the sample. Each tree partitions the feature space with axis-parallel splits, randomly selecting features and split values. This recursive partitioning creates a tree that references all the sample points, and the anomalies end up closer to the root. An Isolation Forest has the advantage of being scalable and able to handle large feature vectors. The algorithm also assigns each data point an anomaly score derived from its average path length to the root across all trees in the forest: the shorter the path, the more anomalous the point.
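To make this concrete, here is a minimal sketch of anomaly scoring with a single Isolation Forest, assuming scikit-learn and a hypothetical feature matrix of telematics windows; the article does not prescribe a specific library or configuration.

```python
# A minimal, illustrative use of a single Isolation Forest on hypothetical
# telematics feature windows (one row per time window).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=42)
X = rng.normal(size=(10_000, 8))  # placeholder features: acceleration stats, speed range, ...

model = IsolationForest(
    n_estimators=200,      # number of isolation trees in the ensemble
    max_samples=256,       # sub-sample size used to grow each tree
    contamination="auto",  # let the model derive the anomaly threshold
    random_state=42,
).fit(X)

# score_samples is higher for "normal" points; negate it so that larger
# values mean "more anomalous", matching the convention used above.
anomaly_score = -model.score_samples(X)
is_anomaly = model.predict(X) == -1  # -1 flags points deemed anomalous
```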

Outlier Ensemble

The primary model organizes Isolation Forest models into an Outlier Ensemble [3] to infer a “silver standard” model that will, in turn, train one interpretable anomaly scorer. The idea is that in the absence of labeled data, the model uses the Outlier Ensemble as a surrogate to generate labels. As we do not have labels, it is impossible to set the Isolation Forest model hyperparameters using random or grid search. Therefore, by training several Isolation Forest models with different hyperparameters and aggregating them into a single ensemble, we potentially obtain a more robust model to estimate the “silver standard.” The model then feeds the labeled dataset into an interpretable machine learning model to learn how to score anomalies.

The above image shows how the outlier ensemble creates a silver standard model. (Image source: Author)

In a nutshell, the model learns from the data using two steps. In the first step, it trains the “silver standard” model through the outlier ensemble. The model then uses the newly-labeled data to train an interpretable machine learning model that will score the production data.
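As an illustration of the first step, the sketch below trains several Isolation Forests with different hyperparameters and averages their normalized scores into a surrogate label per window. The hyperparameter grid and the min-max normalization are assumptions for the sake of the example, not the production configuration.

```python
# A sketch of the "silver standard" step: aggregate several Isolation Forests
# with different hyperparameters into one surrogate anomaly score per window.
import numpy as np
from sklearn.ensemble import IsolationForest

def silver_standard_scores(X, param_grid, seed=42):
    """Average min-max-normalized anomaly scores over an ensemble of forests."""
    all_scores = []
    for i, params in enumerate(param_grid):
        forest = IsolationForest(random_state=seed + i, **params).fit(X)
        scores = -forest.score_samples(X)  # higher = more anomalous
        scores = (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)
        all_scores.append(scores)
    return np.mean(all_scores, axis=0)     # surrogate label for each window

param_grid = [
    {"n_estimators": 100, "max_samples": 128},
    {"n_estimators": 200, "max_samples": 256},
    {"n_estimators": 300, "max_samples": 512},
]
# y_silver = silver_standard_scores(X_train, param_grid)
```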

Model Pipeline

The model pipeline begins with the ingestion of the raw telematics data into a preprocessing step, where we extract trip information, enhance it and add external data. The trip extraction process aggregates the raw telematics data into individual trips, with defined start and stop locations. The data enhancement step enriches the trip telematics data with map-matched information such as the road type.

The pipeline now splits the data into a training set and a scoring set. Both go through the same data encoding process that prepares the enhanced data before being fed to the outlier ensemble and the interpretable scorer. The encoding process consists of an initial sampling of the data, window extraction, and feature computation over these windows. Here, the model calculates functions like the range, the standard deviation, or even the point value of features such as the lateral and longitudinal acceleration, the angular speed, the cruise control setting, the vehicle speed, time to collision, brake pedal position, and more. We also classify trip aggregate data according to the road type, as we assume that highway behaviors should be different from that on urban or suburban roads.
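To illustrate the encoding step, the sketch below resamples a trip into fixed time windows and computes simple per-window statistics. The signal names, column names, and window length are hypothetical; the real pipeline uses the vehicle's own telematics schema.

```python
# An illustrative encoding step, assuming the enhanced trip data is a pandas
# DataFrame indexed by timestamp. Signal and column names are hypothetical.
import pandas as pd

SIGNALS = ["speed", "long_accel", "lat_accel", "yaw_rate", "brake_pedal"]

def encode_windows(trip_df: pd.DataFrame, window: str = "10s") -> pd.DataFrame:
    """Resample a trip into fixed windows and compute per-window features."""
    grouped = trip_df[SIGNALS].resample(window)
    features = pd.concat(
        {
            "mean": grouped.mean(),
            "std": grouped.std(),
            "range": grouped.max() - grouped.min(),
        },
        axis=1,
    )
    # Flatten the (statistic, signal) column hierarchy into plain names.
    features.columns = [f"{sig}_{stat}" for stat, sig in features.columns]
    # Keep each window's road type so highway behavior can be modeled
    # separately from urban or suburban driving, as described above.
    features["road_type"] = trip_df["road_type"].resample(window).first()
    return features.dropna()
```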

Now that both datasets are correctly encoded, we feed the first to the Outlier Ensemble model and infer the silver standard. This model is the best available approximation of the nonexistent gold standard: a human-labeled dataset. The interpretable anomaly scorer, a regressor model, trains on the second dataset using the silver standard model’s labels.

The image above shows how the pipeline trains the interpretable regressor model using the silver standard. (Image source: Author)

The Interpretable Regressor

As previously stated, we train the regressor model using the “silver standard” and the training dataset. For this particular case, we chose the LightGBM machine learning model explained through SHAP. The Python SHAP package directly supports LightGBM models, so implementing an explainable model was straightforward.
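A minimal sketch of this step is shown below, assuming the lightgbm and shap Python packages and placeholder encoded datasets; the hyperparameters are illustrative, not the production settings.

```python
# Sketch: train a LightGBM regressor on the ensemble's silver-standard scores
# and explain individual predictions with SHAP. Data and hyperparameters are
# placeholders, not the production configuration.
import numpy as np
import pandas as pd
import lightgbm as lgb
import shap

rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(8)]
X_train = pd.DataFrame(rng.normal(size=(5_000, 8)), columns=feature_names)
y_silver = rng.uniform(size=5_000)  # surrogate anomaly scores from the ensemble
X_scoring = pd.DataFrame(rng.normal(size=(1_000, 8)), columns=feature_names)

# Train the interpretable scorer on the silver standard.
regressor = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
regressor.fit(X_train, y_silver)

# Score production data and explain each prediction locally.
predicted_scores = regressor.predict(X_scoring)
explainer = shap.TreeExplainer(regressor)
shap_values = explainer.shap_values(X_scoring)

# Feature contributions behind a single event (local interpretability).
shap.force_plot(explainer.expected_value, shap_values[0], X_scoring.iloc[0])
```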

Machine learning model explainability through SHAP has the advantage of allowing for local interpretability. You can “zoom into” a specific event and understand how much each feature contributed to the prediction. Let’s see an example:

The SHAP chart above depicts the most relevant features for the extreme event classification: an abrupt turn. (Image source: Author)

To better visualize these extreme events, we plotted them directly on the map, using the location telematics data. The results were quite impressive. The image below depicts a harsh braking event at an intersection.

The map above visually shows a harsh braking event at an intersection. Blue dots are regular events, while red dots display abnormal behavior. The dot size directly relates to the corresponding event’s aggressiveness. (Image source: Author)

Next, we display below the corresponding feature importances as calculated by SHAP. Features computed over the brake pedal or the longitudinal acceleration contribute the most to the increase of the anomaly score (orange bars), which aligns with the extreme braking event. In contrast, the last bar (value 255) indicates no vehicle in front, which slightly decreases the score, as it is a safer driving condition.

The above image shows the feature importances for a harsh braking event. Note how relevant the brake pedal and longitudinal acceleration features are. (Image source: Author)

From Events to Grades

We now look at the final step of the model, where the classified events turn into grades. Instead of the local focus we have had so far, we now look at the big picture and aggregate these events into a global grade. Since our data contains no driver information, we focused instead on grading trips, so we can more accurately call this model “driving style grading.”

For each trip, we calculate a grade that is a function of three variables: the number of aggressive events, the severity of said events, and the non-aggressive behavior scores.

The image above shows the distributions for the three components of the driving grade: number of aggressive events, the severity of abnormal events, and the score for regular events. (Image source: Author)
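As a rough illustration, the three components could be derived from the per-event anomaly scores of a trip along the following lines; the threshold and aggregation choices here are assumptions, not the exact production definitions.

```python
# A rough sketch of deriving the three grade components from per-event
# anomaly scores of one trip. The threshold is an illustrative assumption.
import numpy as np

def trip_grade_components(event_scores, threshold=0.7):
    """Return (number of aggressive events, their total severity, normal score)."""
    scores = np.asarray(event_scores, dtype=float)
    aggressive = scores >= threshold                 # events flagged as aggressive
    n_aggressive = int(aggressive.sum())             # 1) number of aggressive events
    severity = float(scores[aggressive].sum())       # 2) severity of abnormal events
    normal = float(scores[~aggressive].mean()) if (~aggressive).any() else 0.0
    return n_aggressive, severity, normal            # 3) score for regular events
```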

This type of grading is not static as it changes with the trip’s evolving nature and general driving behavior. The grading’s dynamic nature allows it to follow new driving style trends, road geometry changes, and even regulatory updates. The challenge for drivers is on!

Let us see how to build a final grade from the local events’ scores. The following map shows all the scored events for a single trip. Red dots represent abnormal behavior, and their sizes relate to the event’s aggressiveness. Blue dots are regular events, with smaller sizes displaying better behavior.

The above map shows the scored events for a single trip segment. (Image source: Author)

By using the approach described above, we can derive a single grade by placing all three indicators on a radar chart, as displayed below.

The radar chart above shows how we can derive a unique grade from the three independent scores. (Image source: Author)

We derive a single grade for the whole trip by calculating the ratio between the area of the triangle in the radar chart above and the area of the largest possible triangle (an equilateral triangle inscribed in the chart). This approach requires us to reverse the scale of all axes, as the best possible scenario is one of no aggressive events, small regular event scores, and zero total anomaly score.
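A sketch of this final computation is shown below. It assumes each component is normalized against an illustrative fleet-wide maximum before being reversed; the exact normalization used in production is not described here.

```python
# Illustrative computation of the final grade as the ratio between the
# radar-chart triangle area and the largest possible triangle. Each component
# is normalized against an assumed maximum and reversed so that 1 means "best".
import numpy as np

def radar_grade(n_aggressive, severity, normal_score, maxima):
    """Grade in [0, 1]; 1.0 corresponds to no events and the lowest scores."""
    raw = np.array([n_aggressive, severity, normal_score], dtype=float)
    reversed_axes = 1.0 - np.clip(raw / np.asarray(maxima, dtype=float), 0.0, 1.0)

    # Area of the triangle spanned by three radar axes 120 degrees apart:
    # A = 0.5 * sin(120 deg) * (r1*r2 + r2*r3 + r3*r1)
    r1, r2, r3 = reversed_axes
    area = 0.5 * np.sin(2 * np.pi / 3) * (r1 * r2 + r2 * r3 + r3 * r1)
    max_area = 0.5 * np.sin(2 * np.pi / 3) * 3.0  # all three axes at 1.0
    return area / max_area

# Example with assumed fleet-wide maxima for each axis.
grade = radar_grade(n_aggressive=2, severity=1.4, normal_score=0.12,
                    maxima=[20, 15.0, 0.5])
```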

Conclusion

Driving behavior analysis and grading are essential assets for fleet monitoring, as they improve safety and reduce operational costs. Traditional driver grading solutions rely on large sets of hardwired, brittle rules that are costly to maintain. Nevertheless, these legacy systems can explain their outputs, making it easy for their human counterparts to understand their reasoning.

As an alternative to these systems, we propose a machine learning-based model that is adaptive, multidimensional, and outputs continuous scores. This process models normal behavior through anomaly detection and scores the detected events using an interpretable regressor. The final trip grade is a function of the counts of normal and abnormal events and their respective scores.

Future Work

As a startup developing sustainable transportation solutions for Daimler Trucks & Buses, we reckon there is a lot of room to grow from here. One future research direction lies in recognizing different driver profiles: long-haul driving on highways is very different from driving a garbage collection truck in a suburban neighborhood, for example. We also want to broaden the variety of input signals with weather measurements, road slope, and vehicle weight, to name a few.

Acknowledgments

This article draws on the applied research work of our former colleague Sérgio Pereira.

References

[1] Rezapour Mashhadi, M. M., Saha, P., & Ksaibati, K. (2017). Impact of Traffic Enforcement on Traffic Safety. International Journal of Police Science & Management, 19. doi:10.1177/1461355717730836

[2] Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2008). Isolation Forest. In Proceedings of the Eighth IEEE International Conference on Data Mining (ICDM 2008).

[3] Aggarwal, C. C. (2017). Outlier Analysis, Second Edition. Springer International Publishing.
