BlaBlaCar: a journey to the cloud

Nicolas Salvy
Published in BlaBlaCar
May 10, 2021

A bit of history

Like any tech company, BlaBlaCar has an IT production team to ensure that infrastructure, network and data stores work at the expected level of performance, reliability and cost. At BlaBlaCar, this team used to be called “Architect”.

Composed of experts and senior engineers, the Architect team built in-house tooling to support the needs of the engineering team and to run the BlaBlaCar apps on an on-premise infrastructure. Meanwhile, the platform was enjoying an incredible success story: 3 million members in 2013, 20 million in 2015… 87 million by the beginning of 2020. Under the responsibility of the Head of Architecture, who reported to the CTO, the Architect team was in charge of all technical choices: hardware, middleware and technology patterns were all decided by this team.

Among other things, this organisation led to the early adoption of container technology (even for datastores). However, it is important to note that this way of working also had some negative impacts on the software development teams. Tooling was chosen and built or integrated by the Architect team, which worked internally on integration and setup to develop its skills before opening support to developer teams. As a result, the outputs did not always match developers’ expectations, creating recurrent frustration and friction between teams.

This article might give you a flavour of the technical dynamic at BlaBlaCar in 2015.

Inside the team, topics were assigned based on personal affinity: everyone was in charge of everything, which made ownership at the same time very simple and very hard to achieve. Engineering on-call activities were mainly handled by the Architect team, which sometimes ended in overnight PHP code upgrades performed by people who were not software engineers. Overall, this organisation remained efficient for many years, but expertise-based decision-making became harder as the org grew. Hiring and onboarding people on a custom stack was slow. Supporting business growth became a daily challenge. In parallel, the growth of BlaBlaCar’s community made it obvious that an architectural evolution (from running a PHP monolith to a service-oriented architecture) was necessary.

During this period, the industry widely adopted container technology (Docker, Kubernetes, …), and new cloud providers brought a lot of value to support such an architecture. All conditions for change were met.

Moving to a Cloud infrastructure

In early 2019, we decided to move from our on-premise infrastructure to Google Cloud Platform (GCP). Why switch to a cloud provider, and why GCP? This might require a dedicated article, but the main objectives were to:

  • Benefit from state-of-the-art infrastructure services supporting the container architecture we had chosen a few years earlier: Kubernetes having originated at Google, GCP was an obvious candidate
  • Get a best-in-class network to support BlaBlaCar’s worldwide expansion, specifically in South America
  • Ease infrastructure management: having a seasonal business and a fast-growing community at the same time isn’t so easy
  • Support BlaBlaCar’s scale in terms of both technology and people / skills: using industry standards would ease hiring, competency management and the empowerment of software development teams

Once the decision was made, the “how?” question had to be answered. By this time the Architect team had reached a size that was inefficient: around 20 people with multidisciplinary backgrounds, running one-person projects and suffering from a lot of friction in the decision-making process…

By the end of Q1 2019, it was clear enough that the team had to be split to support the transition. Initially, three sub-teams were identified, with specific missions:

  • Core infrastructure: “Build and run core infrastructure services”
  • Database Reliability: “Create and maintain resilient data store services. Support teams in choosing the right tools, designing good data models and operating their services”
  • Production Service Reliability (since renamed SRE): “Help teams define and maintain their SLOs by providing Observability, Performance and Reliability support”

This initial set was quickly supplemented by an Engineering Productivity team, with a mission to “Provide appropriate tooling to engineering teams to achieve productivity in development, staging and production environments”.

To support this change, the Architect team was renamed Foundations. A few months later, an ad hoc BlaBlaCar Architecture Team (composed of senior engineers from all engineering teams) started to meet on a regular basis to discuss architecture, design and technology, allowing full transparency on architecture decisions.

With this split, we wanted to define clear ownership of technologies and activities: not only individuals able to answer technical questions (based on previous experience or project allocation), but teams managing a clearly defined technological stack. This would allow teams to define competency expectations and runbooks, and to be the well-identified “go-to” contacts. At the same time, the on-call process changed: on-call was now organised at the Front, Back and Foundations levels, covering separate areas of expertise and cooperating to support the platform 24/7. This change was made on the on-prem infrastructure and was not an easy one. The infrastructure suffered from network weaknesses and relied on a lot of custom development. The change generated much frustration across engineering, but it was the very first step towards our new organisational setup.

Apart from the on-call, the evolution was not a simple rebranding: it also meant revisiting the added value of Foundations. Instead of developing and supporting a whole stack, the team now had to integrate externally provided tooling and support Engineering decisions. Some teammates expressed the fear of becoming useless and questioned their personal added value in this new organisation. Without resorting to caricature, some of the Service teams were eager to use the Kubernetes API directly and pick Managed Services to ship product features without any constraint, whereas the Foundations team wanted to restrict access and put strong governance in place.

The decision was eventually to co-construct the infrastructure service offering, the Software Factory, based on the first migrated component. The path to Google Cloud Platform was finalised by mid-2019. The Engineering team decided to move to the cloud by implementing a GitOps approach, to empower development teams and make it easier for them to ship features to production. This move came along with the “You build it, you run it” mantra. When weighing the ability to ship against reliability, BlaBlaCar decided that, at this stage of growth, being reliable had become a little more important than being able to deliver fast.

The first production services were started on GCP in Q4 2019, and throughout the year Foundations will publish articles describing the key challenges faced during the migration.

The migration is now over: our physical infrastructure was completely shut down in early 2021, and overall we can say that this journey was a success. The Foundations team found its place, demonstrating a clear added value in this new ecosystem.

This migration was fully transparent to our members, and the Product team, while accepting reduced engineering bandwidth during the transition, did not have to freeze its roadmap for the migration to succeed.

From an engineering perspective, even if the Kubernetes learning curve can be steep, Service teams can now manage their own stack, reducing friction with the Foundations team. Last but not least, hiring people with a proven track record on the new infrastructure stack became a no-brainer.

We hope that sharing our experience will be beneficial to organisations following the same strategy. Stay tuned for the upcoming articles!
