How BlaBlaCar leverages machine learning to match passengers and drivers - Part 2

The story of how Smart Selection evolved from a mere concept into an ML product that adds value to BlaBlaCar.

Fabio Cecchi
BlaBlaCar
8 min read · Oct 26, 2023


Quick recap of episode 1

In our previous article, “How BlaBlaCar matches passengers and drivers with machine learning”, we introduced Smart Selection, the ML-powered algorithm that we use at BlaBlaCar to assess whether a Boost ride should be displayed or not.

As a quick summary, Smart Selection aims to predict whether a driver would accept a booking request, and to hide the ride if the predicted score is lower than a threshold. As a consequence, we improved the user experience of both passengers, whose requests are rejected less often, and drivers, who receive more appealing requests. Smart Selection is thus a great tool to control our inventory and ensure that only high-quality rides are presented to passengers.
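To make the mechanism concrete, here is a minimal sketch of the decision rule in Python. The model interface, feature representation, and threshold value are illustrative assumptions, not our production setup:

```python
# Minimal sketch of the Smart Selection decision rule.
# The feature representation and threshold below are hypothetical.

ACCEPTANCE_THRESHOLD = 0.35  # assumed cut-off, tuned per market in practice

def should_display(model, ride_features: list[float]) -> bool:
    """Display a Boost ride only if the predicted probability that the
    driver accepts a booking request is at least the threshold."""
    # scikit-learn style: predict_proba returns [[p_reject, p_accept]]
    score = model.predict_proba([ride_features])[0][1]
    return score >= ACCEPTANCE_THRESHOLD
```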

Illustration 1: Smart Selection aims to find the right balance between showing more or fewer results. In the former case we favour the passengers, who will more often find a match for their needs, while in the latter we favour the drivers, making sure they receive only the more appealing requests.

In the previous article we discussed how Smart Selection works, focusing on the Data Science details.

In this article, we deep dive into the journey of Smart Selection: it started as a fuzzy idea, was then refined into a serious proof of concept, and finally graduated into a tool deployed in the app that has been generating value for BlaBlaCar for over a year.

Some of the questions we tackle in this follow-up article are:

  • How did we convince our board that hiding rides could bring benefits to BlaBlaCar’s business?
  • How do we make sure that the production model is up to date and thus making the right decisions? And if it is not, how do we react?
  • How do we keep improving Smart Selection by reporting results to our board and bringing their inputs to life?

As a bonus, we will also tease part 3 (the last one?) of this series of articles: “How we are making Smart Selection more reliable and automated through MLOps.”

Assessing the impact through A/B testing

In practice, Smart Selection reduces passengers’ options for finding a ride. That is a counterintuitive idea for a business that relies on matching as many drivers and passengers as possible.

Our theory was that the bookings missed because of hidden rides would be compensated by redirecting passengers to other, better rides. In this way we would not actually lose their bookings; we would make them more likely to happen thanks to the increased chance of their requests being accepted.

That is an intricate line of reasoning, based on a lot of assumptions. To convince our board to allocate time to this project, we clearly needed something more robust than that…

For this reason, we designed an A/B test protocol featuring an ML model; the idea was to validate our theory without spending too much time making the model perfect. The model was fairly simple to train given the data at our disposal, and we validated that it performed better than deterministic business rules. It was far from perfect or robust, but it allowed us to quickly assess whether we could generate value by investing in Smart Selection.
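To give an idea of how simple such a first model can be, here is a hedged sketch using scikit-learn; the dataset, column names, and the business rule are hypothetical stand-ins for the real ones:

```python
# Sketch: train a deliberately simple acceptance model and compare it
# against a deterministic business rule. All names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

requests = pd.read_csv("historical_booking_requests.csv")  # hypothetical export
features = ["detour_minutes", "seats_requested", "driver_past_acceptance_rate"]
X_train, X_test, y_train, y_test = train_test_split(
    requests[features], requests["accepted"], test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hypothetical business rule: a request is likely accepted if the detour is short.
rule_scores = (X_test["detour_minutes"] < 10).astype(float)

print("model AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
print("rule AUC:", roc_auc_score(y_test, rule_scores))
```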

The A/B test setup we designed was fairly straightforward. We split the potential passengers into two balanced groups and applied Smart Selection to the treatment group while keeping the control group untouched. In practice, we hid some results from the passengers in the treatment group while allowing those results to be displayed to passengers in the control group.
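The article does not describe the assignment mechanism itself; a common way to implement such a split is to hash a stable user identifier, as in this sketch:

```python
import hashlib

def assign_group(user_id: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign a passenger to 'treatment' or 'control'.
    Hashing a stable identifier keeps the assignment consistent across
    sessions, which is essential for a clean A/B test."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000  # uniform bucket in [0, 10000)
    return "treatment" if bucket < treatment_share * 10_000 else "control"
```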

Side note: For the sake of this story, it is sufficient to know that Boost rides are artificial rides BlaBlaCar creates to increase drivers’ chances of finding passengers along their route. These are the rides that Smart Selection targets and hides when their driver acceptance score is low. For more details, please take a look at Part 1.

Illustration 2: Simple setup of the first A/B test we ran in France

We thoroughly estimated the amount of gross margin we would put at risk with an A/B test. Once we validated that we would not endanger more than 2% of the gross margin over a month, we had a free hand. We launched a test in France for a week, and surprise surprise! Hiding the 20% of Boost rides with the lowest scores increased the drivers’ global acceptance rate (the ratio of requests accepted by drivers to requests they received) in the treatment group. On top of that, we also observed an increased number of seats booked in the treatment group.
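In code, hiding “the 20% of Boost rides with the lowest scores” amounts to a quantile cut-off, and the global acceptance rate is a simple ratio; a minimal sketch, assuming scores come as a NumPy array:

```python
import numpy as np

def hide_mask(scores: np.ndarray, hide_share: float = 0.20) -> np.ndarray:
    """Boolean mask of Boost rides to hide: the `hide_share` fraction
    with the lowest predicted driver acceptance scores."""
    cutoff = np.quantile(scores, hide_share)
    return scores < cutoff

def acceptance_rate(accepted: int, received: int) -> float:
    """Global acceptance rate: requests accepted by drivers
    over requests they received."""
    return accepted / received
```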

After this initial successful test, the KPI of the project changed from improving the acceptance rate to increasing the number of booked seats. Selling a conservative goal and using it to test more ambitious ones was the approach we chose. The risk paid off: the board was ready to allocate more resources to enhance Smart Selection. Mission accomplished!

From A/B testing to A/B/C continuous probing

The first A/B test was fundamental, but now we had to face many new problems related to operating Smart Selection in production at scale. Let me mention a couple:

  1. With Smart Selection deployed, we would no longer observe drivers’ reactions to less appealing ride requests. How could we be sure that the model in production remained accurate? How could we retrain a model without recent data on bad rides?
  2. An A/B test only provides us with a data point where two alternatives are compared at a specific moment in time (e.g., in France it was better to hide 20% of Boost rides than none in early October 2022). In other words, we got a static data point, but how can we know that 20% is the optimal percentage of results to hide? And how can we be sure that the right percentage now will still be optimal next month?

To answer both problems, we designed a deployment process consisting of continuous A/B/C probing: the idea was to constantly split passengers into groups that are treated in different ways. The performance of the different groups is constantly evaluated and, if needed, we update the passengers’ allocation to the groups.

Let’s be more precise:

  • At every moment in time, 10% of the passengers are not subject to Smart Selection. This group of visitors exists for two reasons: to benchmark the performance of Smart Selection and to retrain the Smart Selection model when deemed necessary. This continuous-training group thus helps us solve the first problem mentioned above.
  • The other 90% of passengers are equally split into 3 groups, for which we hide different percentages of Boost rides (e.g., the 20% of Boost rides with the lowest scores, the 30%, and the 40%). At evaluation time we observe the performance of these groups and, if needed, shift the allocation of passengers (see Illustration 3 and the sketch below). This helps us mitigate the second problem above and react to shifts in our marketplace.

Illustration 3: Visual representation of the continuous probing mechanism. The allocation of visitors at time T+1 depends only on the performance of the groups at time T
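The article does not spell out the reallocation rule; one simple way to implement “the allocation at time T+1 depends only on the performance at time T” is a greedy shift of traffic toward the best-performing bucket, as in this hypothetical sketch (the fixed 10% holdout group is omitted):

```python
# Hypothetical greedy reallocation between hiding buckets.
# Keys: share of Boost rides hidden; values: share of visitors allocated.

def reallocate(allocation: dict[float, float],
               seats_per_visitor: dict[float, float],
               step: float = 0.05) -> dict[float, float]:
    """Move `step` of traffic from the worst-performing bucket to the best,
    based on last period's seats booked per visitor."""
    best = max(seats_per_visitor, key=seats_per_visitor.get)
    worst = min(seats_per_visitor, key=seats_per_visitor.get)
    updated = dict(allocation)
    moved = min(step, updated[worst])
    updated[worst] -= moved
    updated[best] += moved
    return updated

# Example with three buckets hiding 20%, 30% and 40% of Boost rides
# (performance numbers are made up for illustration).
allocation = {0.20: 0.30, 0.30: 0.30, 0.40: 0.30}
performance = {0.20: 0.041, 0.30: 0.045, 0.40: 0.038}
# Moves 5% of the traffic from the 40% bucket to the 30% bucket.
print(reallocate(allocation, performance))
```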

The takeaway is that we traded off accuracy (deploying the optimal version of Smart Selection right now) for speed in reacting to the shifts of our ever-changing marketplace (to give an idea, over 2023 in France we changed the allocation of visitors to hiding buckets 6 times). And we are fine with that! Note that the volumes we observe in this A/B/C probing setup are not sufficient to provide statistically significant results at every assessment, but we are fine with taking actions that are occasionally wrong as long as the general trend pushes us in the right direction.

Monitoring and reporting

Our management could not appreciate the quality of what Smart Selection brings to the table without the excellent reporting dashboards generated by Jennifer Baleon, which are updated daily in Tableau.

Through those dashboards, the other teams at BlaBlaCar can monitor what we are doing, ask questions about what is going on, or suggest improvements to make the machine learning model better.

Illustration 4: Positive loop between the dashboards reporting Smart Selection’s performance, the analysts and stakeholders looking at the dashboards and providing inputs to the data team, and finally the improvements to the model that we generate through data science and business inputs.

As an example, our business stakeholders realised that Smart Selection had an undesired impact on drivers publishing their main ride on non-popular axes. Such drivers rely heavily on Boost requests to pick up and drop off passengers along the way, instead of finding them at their exact origin or destination. This reasonable intuition could be backed up with data. We then engineered the right features to feed the model, so as to calibrate the driver acceptance scores more adequately. That validated the processes we put in place to expose the right data to our stakeholders and take their feedback into account, and it allowed us to steer the behaviour of Smart Selection in a direction more aligned with the company’s vision.
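As an illustration of the kind of feature involved (all names here are hypothetical, not our actual schema), one can encode how popular a driver’s axis is, so that the model can score requests on low-traffic axes differently:

```python
import pandas as pd

def add_axis_popularity(rides: pd.DataFrame, requests: pd.DataFrame) -> pd.DataFrame:
    """Attach to each booking request the number of rides published on the
    same axis, a proxy for how much the driver depends on Boost requests.
    Assumes both frames carry a hypothetical 'axis' column."""
    popularity = (
        rides.groupby("axis").size().rename("axis_popularity").reset_index()
    )
    return requests.merge(popularity, on="axis", how="left").fillna(
        {"axis_popularity": 0}
    )
```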

Takeaways and next steps

The main takeaway from this story is that the path to transforming an idea into an ML product running in production is long and full of challenges. Many of the obstacles are not foreseeable at an early stage of the project; others can be anticipated. However, starting small and fostering the confidence of your stakeholders step by step is a healthy way to move forward together, and it has proved successful for us so far.

Is this the last chapter for Smart Selection? Fear not, for its evolution is not done. Nowadays, Smart Selection is becoming more robust and automated, and, ideally, it will self-adjust over time as the developing team moves on to new challenging projects. To achieve such automation, we embraced the principles of MLOps and transformed the complex architecture that made Smart Selection a success into a shiny new one that should make its future at least as bright.

So, stay tuned as Riwan Perron is going to tell you more about this and the challenges that such a transformation entails in the next episode. Till then I wish you all “Happy carpooling!”

Many thanks to Thibault Ambard, Raphael Berly, Jeremy Gonzva, Riwan Perron, Jennifer Baleon, Maria Clara Bezzecchi, Justin Pyron, Thomas Pocreau, and Victor Rubin for the help in writing this article.
