How BlaBlaCar leverages machine learning to match passengers and drivers

The story of how we smartly select search results to improve user experience at BlaBlaCar

Published in BlaBlaCar on Mar 1, 2023

What is BlaBlaCar?

BlaBlaCar provides a platform facilitating carpooling in 22 countries worldwide. It allows drivers to publish their trips and fill the empty seats in their cars, sharing the cost of the trip while reducing the emissions of CO2.

Our AI plays a game of hide-and-seek with our passengers and drivers. It hides rides that drivers prefer not to show, while seeking the best ones for passengers. Smartly playing that hide-and-seek game, it increases the number of matches between passengers and drivers.

At BlaBlaCar, our users can play two roles:

The drivers publish their trips and aim to share the road with potential passengers.
The passengers search for rides and are offered trips matching their search criteria.

Illustration 1: Our goal is to match drivers and passengers, thus reducing the vehicles on the road and the number of empty seats

When passengers are interested in a trip, they send a request. On average, 70% of drivers manually review passengers’ booking requests and decide whether to refuse or (hopefully) accept them.

One of our goals is to improve the quality of the matching suggested to the passengers. We need passengers to find the most relevant rides and drivers to accept the requests they receive. Better matching alleviates frustrations, as drivers get annoyed when receiving requests they do not find worthy and passengers do not appreciate the rejection.

Damaging the user experience and undermining their retention is dangerous for our business. BlaBlaCar generates a complex marketplace that thrives when both drivers and passengers are successful and regularly return to the platform.

Introducing Boost rides and their impact on the marketplace

Drivers coming to our platform publish trips that are as convenient as possible for them, often selecting only their origin and arrival location. To increase their chance of finding passengers, we developed what we call “Boost rides”: we create potential rides that would match the passengers’ requirements at the cost of adding short detours to what the driver published.

As a concrete example, a driver going from Paris to Lyon could receive a “Boost request” from a passenger also going to Lyon but needing to be picked up in Auxerre.
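To make the notion of a "short detour" concrete, here is a minimal sketch that measures the extra distance in the Paris to Lyon example. It is not how BlaBlaCar computes detours: it assumes straight-line (great-circle) distances instead of road routing, and the coordinates are approximate city centres chosen for illustration. The function names `haversine_km` and `detour_km` are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Approximate city-centre coordinates (latitude, longitude).
# These values are assumptions for illustration only.
PARIS = (48.8566, 2.3522)
AUXERRE = (47.7982, 3.5738)
LYON = (45.7640, 4.8357)


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def detour_km(origin, destination, pickup):
    """Extra distance covered when the driver passes through the pick-up point."""
    direct = haversine_km(origin, destination)
    via_pickup = haversine_km(origin, pickup) + haversine_km(pickup, destination)
    return via_pickup - direct


extra = detour_km(PARIS, LYON, AUXERRE)
print(f"Straight-line detour via Auxerre: {extra:.1f} km")
```

A real system would replace the straight-line distance with road distance or travel time, since a pick-up point close to the route on a map can still be far from the nearest motorway exit.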

Boost rides allow us to find more potential matches for both drivers and passengers, thus increasing their chances of successfully carpooling.

Boost rides play a fundamental role in our marketplace:

In France they amount to roughly 45% of the results displayed and they generate 30% of BlaBlaCar bookings.
They allow BlaBlaCar to reach users who would otherwise be unreachable: passengers looking for rides that are not popular in themselves, but that intersect more popular ones, can now find a match.
They increase the likelihood that drivers publishing rides that are not popular, but that intersect more popular ones, find passengers.

This feature enables passengers to find rides from any countryside town to Paris for instance!

Illustration 2: Boost rides are fundamental for passengers looking for non-popular rides.
How would this BlaBlaPassenger find a ride otherwise?

However, all that glitters is not gold: drivers accept Boost requests only 50% of the time. This is low compared to the 80% acceptance rate for other requests. The drop is expected and perfectly understandable, as drivers are sometimes on a tight schedule and detours might not be convenient for them.

Not all the Boost requests are the same and not all the drivers react to them in the same way. There are very appealing Boost requests and drivers inclined to take detours as well as inconvenient requests and drivers never diverting from their route.

Now, imagine what we could do if we could predict whether a driver would accept a Boost request or not: we could then remove the superfluous Boost trips from the search results, thus vastly improving the user experience (both for drivers and passengers). Of course, we do not know what a driver would do upon receiving a specific Boost request. We can however use machine learning and train a model to predict drivers’ behaviour, using that forecast as a proxy for the information we do not have!

Machine learning to predict driver behaviour

Forecasting a driver’s response is a perfect task for machine learning, as it can be modeled as a binary classification problem on a dataset built from the Boost requests sent in the past. The target of our model is simply the driver’s response: whether the request was accepted or refused. We can leverage years of historical data to estimate how likely a Boost request is to be accepted.

We trained a model on past Boost requests to predict whether a request would be accepted or not. It is an XGBoost model that takes as input temporal and geographical features as well as the drivers’ past behaviour. Note that this model has to run live in a production environment and provide scores instantaneously whenever a potential passenger makes a search.

Although our model cannot read drivers’ minds, it provided interesting insights in agreement with common sense:

Drivers are less willing to accept Boost requests as the requested ride gets shorter
Drivers are less willing to accept Boost requests as the departure time approaches
Drivers who have already accepted Boost requests are more likely to do it again
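The binary classification setup described above can be sketched in a few lines. This is a toy illustration, not the production model: a plain logistic regression trained by gradient descent stands in for XGBoost to keep the example dependency-free, the three features (ride length, hours before departure, driver's past acceptance rate) are hypothetical names inspired by the insights listed above, and the data is synthetic, generated from a made-up rule.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted probability that the driver accepts the request."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def make_toy_request(rng):
    """One synthetic Boost request: (features, accepted). Entirely made up."""
    ride_km = rng.uniform(20, 500)
    hours_to_departure = rng.uniform(1, 72)
    past_accept_rate = rng.random()
    # Rescale every feature to roughly [0, 1] so gradient descent behaves well.
    x = [ride_km / 500, hours_to_departure / 72, past_accept_rate]
    # Made-up ground truth: each feature raises the chance of acceptance.
    p = sigmoid(-2.0 + 1.5 * x[0] + 1.0 * x[1] + 2.0 * x[2])
    return x, 1 if rng.random() < p else 0


def train(rows, epochs=200, lr=0.5):
    """Fit a logistic regression with batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in rows:
            error = predict(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


rng = random.Random(0)
history = [make_toy_request(rng) for _ in range(1000)]
weights, bias = train(history)

# Same ride, two drivers who differ only in their past acceptance rate.
reluctant = predict(weights, bias, [0.5, 0.5, 0.1])
willing = predict(weights, bias, [0.5, 0.5, 0.9])
print(f"past acceptance 10%: {reluctant:.2f}, past acceptance 90%: {willing:.2f}")
```

The output of `predict` is a score between 0 and 1 per Boost request, which is the shape of signal the rest of the article relies on, whatever model produces it.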

Don’t show the Boost rides that a driver wouldn’t accept

Now that we have the drivers’ acceptance scores, how can we decide which results should be hidden and which should be displayed?

Product and business set the threshold to show either more or fewer results. More results mean more chances for drivers to find passengers, but also a higher risk of rejection for passengers; fewer results mean the opposite.
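Applying the threshold itself is simple. The sketch below assumes a hypothetical `SearchResult` record and a hypothetical `visible_results` helper: regular rides are always shown, while Boost rides are shown only when their predicted acceptance score reaches the threshold. The ride identifiers and scores are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchResult:
    ride_id: str
    is_boost: bool
    acceptance_score: float  # predicted probability that the driver accepts


def visible_results(results: List[SearchResult], threshold: float) -> List[SearchResult]:
    """Keep regular rides; keep Boost rides only if their score reaches the threshold."""
    return [r for r in results if not r.is_boost or r.acceptance_score >= threshold]


# Invented search results for one passenger query.
results = [
    SearchResult("regular-1", is_boost=False, acceptance_score=0.85),
    SearchResult("boost-1", is_boost=True, acceptance_score=0.70),
    SearchResult("boost-2", is_boost=True, acceptance_score=0.35),
    SearchResult("boost-3", is_boost=True, acceptance_score=0.10),
]

for threshold in (0.0, 0.3, 0.5):
    shown = [r.ride_id for r in visible_results(results, threshold)]
    print(threshold, shown)
```

Raising the threshold can only shrink the list of Boost rides shown, which is exactly the lever described above: a lower value favours volume, a higher value favours the quality of the requests drivers receive.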

Illustration 3: It is challenging to find the right balance between showing more or fewer results. In the first case we favor the passengers who will often find a match for their needs, while in the latter we favor the drivers making sure they receive only good requests.

An important observation is that hiding results does not always translate to missing opportunities. In fact, passengers whose ideal offer is hidden might still find rides matching their criteria and end up requesting a ride with a higher probability of being accepted. Basically, we might slightly reduce the number of requests sent out, but the better quality of the requests received by the drivers compensates for that.

Finding the sweet spot between not hiding enough and hiding too much is key to success here. The threshold needs to adapt to how “redundant” Boost rides are and how many of them can thus be safely hidden without impacting the volume of the marketplace. A challenging task, but extremely valuable!
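One way to explore this trade-off offline is to sweep candidate thresholds over past requests that have been scored by the model. The sketch below uses a hypothetical `sweep` function and ten invented (score, accepted) pairs. For each threshold it reports the share of requests that would still be shown and the acceptance rate among them. It deliberately ignores the substitution effect described above, where a passenger whose preferred ride is hidden books another one, so it gives a pessimistic view of the lost volume.

```python
def sweep(scored_requests, thresholds):
    """For each threshold: (threshold, share of requests kept, acceptance rate among kept)."""
    rows = []
    for t in thresholds:
        kept = [accepted for score, accepted in scored_requests if score >= t]
        share_kept = len(kept) / len(scored_requests)
        acceptance = sum(kept) / len(kept) if kept else float("nan")
        rows.append((t, share_kept, acceptance))
    return rows


# Invented history: (predicted acceptance score, 1 if the driver accepted else 0).
HISTORY = [
    (0.90, 1), (0.80, 1), (0.75, 1), (0.70, 0), (0.60, 1),
    (0.50, 0), (0.40, 1), (0.30, 0), (0.20, 0), (0.10, 0),
]

rows = sweep(HISTORY, thresholds=[0.0, 0.5, 0.75])
for threshold, share_kept, acceptance in rows:
    print(f"threshold={threshold:.2f} kept={share_kept:.0%} acceptance={acceptance:.0%}")
```

On this toy data, a stricter threshold keeps fewer requests but with a higher acceptance rate. An offline sweep like this can only narrow down candidate values; measuring the actual effect on bookings requires a live experiment.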

Results and takeaways

The methodology described here was rolled out in France in September 2022, and in 14 more countries since then. We observed positive results everywhere. The acceptance rate of Boost requests has increased by roughly 30% and the overall number of passengers increased!

Here are a few final takeaways:

Quality over quantity matters for our inventory. Hiding results benefits both the user experience and the number of bookings we generate. If there is already a great ride available, why should we create a new one less likely to be accepted?
Boost rides are an extremely valuable product but it is fundamental to offer them to our users at the right time and in the right way. In this article we discussed how to decide whether to display them or not, but many other challenges have to be faced: How to price them? How to select the meeting points?
BlaBlaCar is a self-regulating marketplace, but we can still play a role in understanding and influencing it. Every action taken, however, might generate a cascade of effects on both drivers’ and passengers’ experiences, in the short and the long term. This makes dealing with these topics stimulating and always interesting.

I have not touched upon many of the challenges related to deploying and maintaining a machine learning model like this in production: from data collection to retraining, from monitoring data drift to running A/B tests… You will have to wait for the next episode, and until then “Happy carpooling!”

Many thanks to Emmanuel Martin-Chave, Thibault Ambard, Raphael Berly, Riwan Perron, Sandi Endo, Thomas Pocreau, Benoit Rajalu, and Victor Rubin for the help in writing this article.