How BlaBlaCar uses Machine Learning to maximise the success of new drivers

Dany Srage
BlaBlaCar
6 min read · Mar 24, 2021


Article co-written with Antoine Barczewski, Data Scientist at BlaBlaCar.

When a member publishes their first trip, our goal is to help them make the most out of the platform by maximizing their chances of getting passengers. We realized that we could achieve this goal by using data to create personalized recommendations and tips. These recommendations are then pushed by CRM, whose role is to increase member engagement through education and conversion throughout the member lifecycle.

The landscape of explainable models …

To provide these recommendations to new drivers, we decided to build an explainable model which predicts success. That way, we can understand which features drive a higher success rate and base our recommendations on them. Our assumption, currently being A/B tested, is that spotting one or two important features for each driver lets us send personalized recommendations, and that these drive more engagement than generic guidelines.

Example of recommendations our algorithm would produce for the end user.

Looking at the landscape of explainable AI, we identified various methods which can explain and quantify how each feature relates to the predicted outcome. Here are examples:

  • Building a model which is interpretable out-of-the-box, such as a linear model or a decision tree. That way, we can read directly from the model how each feature impacts success. We did not use this method because we wanted to use more complex models.
  • For more complex models, the LIME [1] (Local Interpretable Model-Agnostic Explanations) method approximates any black-box model with a linear model locally. Hence, for each training sample, we could use this linear approximation to interpret how moving different parameters impacts success.
  • Also for more complex models such as gradient boosted trees, a method called SHAP, developed by Lundberg and Lee [2], relies on Shapley values from game theory. The goal of SHAP values is to find how much each feature contributed to our success probability (more info here [3]). For instance, if we predict that a trip’s success is 80% while our average success rate is 50%, each feature has a SHAP value which explains part of this 30-point gap. We can then use these numerical values to analyze our model results and see if one or two features are significantly driving the success score up or down.
  • Another method that we thought would be interesting is the partial dependence plot of each feature [4]. The idea of these plots is to see the marginal effect of a single variable on the success variable, assuming that every other feature is independent of the one we look at. This helps us see how, on average, changing this feature moves the success rate. Then, for a single point, we would see where we stand on the partial dependence plot to make a prediction. However, these plots are based on aggregated data, so they can only give general recommendations.
  • The method above is therefore not helpful for individualised recommendations. However, in the same spirit, for every data point we can try to move all features within reasonable bounds, predict success again, and see how moving them impacts the score. The feature with the highest impact becomes the recommendation. This also assumes that our features are independent of each other. This method is similar to the What-If Tool [5].

We decided to use this last method to produce individual recommendations, as we thought it replicated well what users would do in real life and was easy to interpret. However, we also produced SHAP values regularly to monitor our model behaviour, create insights, and spot inconsistencies in our data.
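To make this more concrete, here is a minimal sketch of the kind of SHAP monitoring described above, using the open-source shap library with a scikit-learn gradient boosting classifier. The dataset is a synthetic placeholder, not our production data, and the choice of libraries is only an assumption for illustration.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic placeholder standing in for publication features and the success label.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
model = GradientBoostingClassifier().fit(X, y)

# TreeExplainer computes exact SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# A summary (beeswarm) plot shows which features push predictions up or down
# across the dataset, which is handy for spotting inconsistencies in the data.
shap.summary_plot(shap_values, X)
```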

… And how we chose the right one

So let’s recap what we did from the start. First, we built a model which predicts, for each publication, the driver’s chances of success, based on our knowledge, experience, and data. To build it, we trained a gradient boosting algorithm on 2019 data. This dataset gathers features from the drivers’ profiles and the trips’ descriptions, and the target tells whether or not the publication was successful. Gradient boosting is a well-known machine learning algorithm that showed good performance in our case: the area under the curve was around 0.8, and looking at the precision-recall curves, we validated that the model was a good proxy of success.
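As a rough, hedged illustration of this setup (synthetic data and scikit-learn stand-ins, not our actual pipeline), training and evaluating such a classifier could look like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder for 2019 publications: driver-profile and trip-description
# features, with a binary target telling whether the publication was successful.
X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# The area under the curve and the precision-recall trade-off are the checks mentioned above.
print("ROC AUC:", roc_auc_score(y_test, proba))
print("Average precision:", average_precision_score(y_test, proba))
```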

To build the model with the “what if” methodology in mind, we had to identify three groups of features:

  1. features that help prediction performance,
  2. actionable features at publication time,
  3. actionable features at departure time.

Indeed, if we gave the model only the features that are actionable by the user, we did not get the best performance. We therefore needed the first group of features to help the model perform better. We also needed actionable features that could become potential recommendations, and there are two types of those: the ones actionable right after publication, and the ones that will only help a future publication. For instance, sending an email saying that you should have published this trip two days earlier would not help this time; however, this information can be shared after departure to help with the next publication.
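Here is a minimal sketch of the “what if” loop itself, assuming a trained model exposing predict_proba and a publication encoded as a pandas Series with the columns the model was trained on. The feature names and candidate values below are hypothetical, not our real actionable features.

```python
import pandas as pd

# Hypothetical actionable features and the alternative values we allow ourselves
# to simulate for them at publication time ("reasonable bounds").
ACTIONABLE_CANDIDATES = {
    "price_per_seat": [10, 12, 15, 18],
    "automatic_approval": [0, 1],
    "max_two_passengers_in_back": [0, 1],
}

def recommend(model, publication: pd.Series):
    """Return the actionable feature (and lift) whose best value raises success the most."""
    row = publication.to_frame().T
    base = model.predict_proba(row)[:, 1][0]
    best_feature, best_lift = None, 0.0
    for feature, candidates in ACTIONABLE_CANDIDATES.items():
        for value in candidates:
            simulated = row.copy()
            simulated[feature] = value
            lift = model.predict_proba(simulated)[:, 1][0] - base
            if lift > best_lift:
                best_feature, best_lift = feature, lift
    return best_feature, best_lift
```

The recommended feature is then the one whose simulated change yields the largest lift in the predicted success probability.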

We noticed two issues. First, when there is collinearity between features, one feature can hide the true effect of another because the model has learned it this way. For instance, we saw that the number of days between signup and publication had a huge impact on success according to our model, while the feature telling whether or not a member had already experienced success as a passenger before publishing turned out to have very little importance. Looking at the various SHAP values (main effects and interaction values) of these features, we understood how they interact: the “number of days between signup and publication” feature is not actually useful in itself; its importance mostly hides the fact that the new driver was already a passenger, and therefore has a few ratings boosting their chances of success as a driver. We addressed the issue by putting features like these two into groups and summing their SHAP values to understand the impact of each group on success. Thanks to the additivity of SHAP values, we knew the impact of these features as a group and did not have to worry about the interactions between the grouped features.
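Because SHAP values are additive, the contribution of such a group is simply the sum of its members’ per-feature SHAP values. A small sketch of that aggregation, reusing the shap_values array from the monitoring sketch above and a hypothetical feature_names list:

```python
import numpy as np
import pandas as pd

# Hypothetical pair of correlated features that we want to read as a single signal.
SIGNUP_GROUP = ["days_between_signup_and_publication", "was_passenger_before_publication"]

# shap_values has shape (n_samples, n_features); feature_names lists the columns.
shap_df = pd.DataFrame(shap_values, columns=feature_names)

# Additivity of SHAP values: the group's contribution to each prediction is the
# sum of its members' contributions, so interactions inside the group no longer
# need to be disentangled.
group_contribution = shap_df[SIGNUP_GROUP].sum(axis=1)
print("Mean |group contribution|:", np.abs(group_contribution).mean())
```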

The second problem coming from collinearity is that when we change a feature A in the simulation, any feature correlated with A remains unchanged. In real life, however, a member changing feature A would also be changing these collinear features, and would therefore change the success probability differently from our simulation. This was not a problem for us because our actionable features were independent of every other feature, actionable or not.

Conclusion

At the end of the day, we’ve built two separate models: a sort of “what-if tool” that pushes action recommendations to newbie drivers at publication and departure time, and a SHAP-based model that produces aggregated SHAP values to support CRM analysis.

These two models are built upon one simple classification model. We believe that the true value doesn’t lie in the prediction model here, but in its explanation. Understanding a model’s behaviour is something we always work on at BlaBlaCar. Here, it enabled us to help members take the most advantage of the platform and maximize their chances of success.

Thanks a lot to Akshita, Célia, Emil and Guendalina for their very important contribution to the project as well as Emmanuel, Nicolas and Emilie for their help in writing the article and Alizée for the illustration.

References:

[1] Marco Tulio Ribeiro, Sameer Singh and Carlos Guestrin. 2016. ““Why Should I Trust You?”: Explaining the Predictions of Any Classifier”. arXiv:1602.04938 [cs.LG] https://arxiv.org/abs/1602.04938

[2] Scott Lundberg and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions”. arXiv:1705.07874 [cs.AI] https://arxiv.org/abs/1705.07874

[3] Molnar, Christoph. “Interpretable machine learning. A Guide for Making Black Box Models Explainable”, 2019. https://christophm.github.io/interpretable-ml-book/.

[4] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825–2830, 2011. https://scikit-learn.org/stable/modules/partial_dependence.html

[5] What-If Tool. https://pair-code.github.io/what-if-tool/

