For BlaBlaCar, being a data-driven company means there is a continuous two-way exchange between data and strategy. In the following article, I will share with you how our ability to perform analysis based on geography grew alongside our business needs.
Building the right tool at the right time
For a long time, our strategy focused on fueling our marketplace with enough members to create momentum. We were all about member acquisition, and that translated into our Data Analytics & Data Science team being a pure Marketing Intelligence team.
Only when we reached a critical mass and then geared our strategy towards member retention did we start investing in Marketplace analytics.
The link between the two was that members came to BlaBlaCar to travel together and would only come back if they were successful. Our new motto from then on was to refine our matching of passengers and drivers.
One key element for matching members is that they should want to travel the same road. Mind you, at that time, our data tools only allowed us to report on activity with a geographical segmentation at the level of countries. Our reports did not go any further and could not describe how activity was structured inside a particular country.
The only way we could dive into the level of cities was via a back-office tool that reproduced search: we entered two cities, one date and it returned a number of metrics based on the search results.
From this first tool, we learnt that some axes (defined by a set of two cities) were already very liquid, with most cars being filled with passengers, while other axes only saw lonely drivers. Of course, we had to scale this insight into a report that everyone could regularly refer to.
It was at this time that we implemented the concept of cities and axes in the data warehouse. Hosting a database of cities proved to be a difficult task. As we were at the beginning of the data warehouse construction, we could not use complex geographical objects for city representation. Not only did we lack the space to store them, but mapping trips to these objects would also have taken too long.
Fortunately, geohash* was a good solution. We modelled each city as the set of geohashes covering a round area defined by a center and a fixed radius. To map trips to cities, we geohashed departure and arrival points, and matched them to our “round cities”, using string joins.
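To make the idea concrete, here is a minimal sketch in Python of how such "round cities" could be built and matched. It is an illustration only, not our production code: the function names (geohash_encode, round_city, build_index, trip_axis), the geohash precision of 5, the 1 km sampling step and the 10 km radius are all assumptions chosen for the example. The covering is approximated by sampling a grid of points inside the circle, and a dictionary lookup stands in for the string join performed in the data warehouse.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def round_city(center_lat, center_lon, radius_km, precision=5, step_km=1.0):
    """Approximate the set of geohashes covering a circle by grid sampling."""
    km_per_deg_lat = 111.32  # rough conversion, good enough for a sketch
    km_per_deg_lon = 111.32 * math.cos(math.radians(center_lat))
    steps = int(radius_km / step_km)
    cells = set()
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            dx, dy = i * step_km, j * step_km
            if dx * dx + dy * dy <= radius_km * radius_km:
                lat = center_lat + dy / km_per_deg_lat
                lon = center_lon + dx / km_per_deg_lon
                cells.add(geohash_encode(lat, lon, precision))
    return cells


def build_index(cities, precision=5):
    """Map each geohash cell to a city name (first city wins on overlap)."""
    index = {}
    for name, (lat, lon, radius_km) in cities.items():
        for cell in round_city(lat, lon, radius_km, precision):
            index.setdefault(cell, name)
    return index


def trip_axis(index, departure, arrival, precision=5):
    """Return the (departure city, arrival city) axis; None if unmatched."""
    dep_cell = geohash_encode(departure[0], departure[1], precision)
    arr_cell = geohash_encode(arrival[0], arrival[1], precision)
    return index.get(dep_cell), index.get(arr_cell)


# Hypothetical cities: (latitude, longitude, radius in km)
cities = {"Paris": (48.8566, 2.3522, 10.0), "Lyon": (45.7640, 4.8357, 10.0)}
index = build_index(cities)
print(trip_axis(index, (48.8584, 2.2945), (45.7600, 4.8400)))
```

Because the match is an exact comparison of short strings, it needs no geometry at query time, which is what makes the approach workable on a platform without spatial types. The trade-off is accuracy: the city boundary is only as fine as the geohash cells, and the grid sampling can miss cells that the circle barely touches.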
Elaborating new strategies
This very handcrafted solution already generated a lot of excitement within the company and new interactions with the data.
We saw that drivers new to BlaBlaCar were more likely to publish on unpopular axes, which led us to design new publication flows that highlighted the importance of meeting points and stopovers.
Better geographical analysis also strengthened our Competition Intelligence and confirmed that competition from alternative long-distance transport was becoming fiercer on some of our top axes. We reacted by modifying our communication on these axes in order to rebalance the marketplace.
From circles to polygons
This new field of operations led us in turn to upgrade our data tool for geography. As we only wanted to operate on trips which faced intense competition, we had to move away from modelling cities with circles.
As we could still not handle complex objects in our database, we built a back-office tool which allowed us to draw polygons on a map. This way, geographies were accurate enough to be used in production, but still simple enough to be handled by our analytics platform.
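For readers curious about what matching a point to a hand-drawn polygon can look like without any spatial database support, here is a minimal sketch in Python using the classic ray-casting test. This is an assumption made for illustration: the article does not describe how our back-office polygons were matched to trips, and the function name point_in_polygon and the sample coordinates are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is the point inside the polygon?

    `point` is an (x, y) pair, e.g. (longitude, latitude).
    `polygon` is a list of (x, y) vertices in drawing order; the last
    vertex is implicitly connected back to the first.
    Points lying exactly on an edge are not handled specially.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical L-shaped "city", something a circle could not represent
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(point_in_polygon((1, 3), l_shape))  # inside the vertical arm
print(point_in_polygon((3, 3), l_shape))  # in the notch, outside the city
```

A polygon stored as a plain list of vertices stays simple enough to keep in ordinary columns, while still following an irregular boundary that a circle cannot.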
Opening the door to statistical experiments and beyond
Modifying communication on some of our top axes was our very first experiment. It paved the way for all the others and led us to develop our skills in statistics and A/B testing. We even became very creative about it, designing our own way of setting up experiments in a marketplace, customised to limit interference. For more information, see A/B testing in a long-distance carpooling marketplace by my colleague Julien!
Digging deeper into marketplace dynamics, we also found dependencies between axes that we had not thought of yet. For example, let’s take 4 cities along a main road that we will call A, B, C and D. Can you imagine that adding cars on the B-C portion of the road could actually increase the income of drivers who drive between A and D?**
All in all, introducing geography in our analysis helped us mature our understanding and vision… so much so that we were pushed to upgrade our tools to the next level.
Reaching doorstep accuracy
Updating our product vision and strategy meant digging into our product-market fit and gaining a better understanding of the use cases where carpooling is the best solution.
As we believe travel needs are very much linked to where you are and where you want to go, and that carpooling is always a choice you make relative to other modes of transportation, we worked on segmenting our activity in a way that reflected these three parameters.
We thus came up with a tool that classified cities depending on the role they play in the national transportation network. The result for France is illustrated on this map:
This analytical tool allowed us to compute the market split by type of travel needs.
Finally, we would like to be more accurate, not only in locating departure and arrival points, but also all along the road. We heard many user stories in which people realised only once they were in the car that they could have met somewhere more convenient, had they known where their fellow travellers were actually departing from.
Bringing transport connections closer to your doorstep.
The same goes for stopover recommendation****. Why suggest that Béatrice stop near Rennes to pick up David when she is actually taking the road that passes through Caen?
These are only two of the many use cases we have identified that could be solved by integrating roads into our geographical reference data. That is why achieving higher accuracy is our next obsession, and we will be on the lookout for the best tools to do it.
Don’t hesitate to contact us to share your experience or to discuss this topic!
(*) Geohash is a public domain encoding system which encodes a geographic location into a short string of letters and digits. One interesting property is that the longer the string, the higher the precision of the location. Precision is reduced by truncating characters from the end of the string, so two geohashes that share the same prefix belong to the same region.
(**) If so, leave a comment with your guess about what could explain this phenomenon.
(***) Vertica Place is an add-on to Vertica that follows the Open Geospatial Consortium standards (just like PostGIS). It provides functions to manipulate 2D and 3D spatial objects and to compute intersections, distances and other very useful operations.
(****) Adding a stopover on a trip increases the chances of picking up passengers, but drivers new to BlaBlaCar often do not know which stopovers to select, and select none. One product feature therefore aims at matching passengers and drivers regardless of declared stopovers.