Building data-driven engineering organization and culture - Lessons from Booking.com ex-CTO

Published in Preply Engineering Blog, Jan 3, 2020

Intro

At Preply we have always considered Booking.com an example of a super successful marketplace, and we always wondered what their secret sauce was. Luckily, in 2015 Arthur Kosten, co-founder and former CMO of Booking.com, invested in Preply. For us it is an example of smart money, as Arthur understands the nitty-gritty details and specifics of the marketplace business. Arthur introduced us to the Booking team when they were at a hyper-growth stage, and we often use their knowledge to build a similar culture.

We are now at a stage of hyper-growth of our engineering team at Preply. Last week we had the pleasure of hosting the former CTO of Booking.com, Brendan Bank, who helps us as an advisor. Brendan joined the Booking team in 2008, initially as an Engineering Director and then as CTO, and built a truly unique IT culture there until 2017. We always try to share knowledge with our community, so as a part of his visit we organized an external meetup at Unit City for developers interested in a data-driven development culture. We also had a strategic session within Preply. This post is a compilation of the lessons we adopted from both events.

Mindset and values

It’s a part of the Dutch culture to be very direct, yet humble. All of the Booking leaders we’ve met were extremely humble (yeah, we’ve built a ~$90 billion company, so what 😊). A great developer is one who understands the limits of their ability and possesses a certain level of self-awareness. The best output of a developer is value created for the end customer, not perfect code or playing with cool technology. Developers often try to be perfect and always right in their decisions, so they over-engineer (remember the canonical “premature optimization is the root of all evil”? :)). Product-minded engineers should ask themselves:

Do I want to be right, or do I want to be successful?

And when we say successful, we mean developers doing things that help the business grow, even if it is copy-pasting code or doing copy experiments.

Another good example of the mindset is the “milking the cows” attitude. Imagine you are working on a farm and have a cow. If you don’t milk it every morning, it gets sick. So every morning, no matter how comfortable your warm bed is, you wake up, go to the stalls and milk your cows. That’s how your “business” grows. You have to do the everyday tasks that need to be done. Once everything is under control, that’s when you can innovate. And great leaders lead by example here.

We at Preply always look for developers who have an entrepreneurial mindset. To encourage them to think about the business side, we offer everyone in the company ESOP as a part of the compensation package. Our craft can be taught: a person with a bright mind will eventually learn how to code well. But business sense and communication skills are more intrinsic attributes of a personality. They can also be acquired, but it will take longer.

Speed of innovation: code reviews and tests

One of the books Brendan recommended was “The Goal”. We live in a world of constraints. For example, what do we think about the quality/speed/cost dilemma?

Image: http://www.providentmediagroup.com/blog/pick-any-2/

As a startup, we attracted venture funding from our investors some time ago, so cost should be a lesser priority for us than speed of growth. Quality should be good enough to keep the users happy. Speed can rarely be improved by pouring in more resources (see Brooks’s Law). The key to optimizing speed lies in removing blockers and giving greater autonomy to teams and individual contributors.

After a discussion with Brendan we concluded that we don’t need code reviews when changes are made by code owners. It is a very counterintuitive move. People consider code reviews to be safety nets for architectural flaws and bugs, but in real life 90% of them are just formal checks: “Yeah, approved”. In our daily work we use Pull Panda as a tool to speed up code reviews. 85% of our pull requests are reviewed in less than 8 hours.

Can we do better? If we remove the required code reviews, the success rate will trivially be 100%, and the quality will be lost. But will it? We think that responsible developers should ask for a code review of code they are unsure of, but this should not be a blocker for the other 90% of the code they are confident in. We also identified parts of the code where we do want mandatory code reviews (infra configs, payments, etc.). For those we use a CODEOWNERS file with some automation to make code reviews required.
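The ownership rule described above could be automated roughly as follows. This is an illustrative sketch rather than the actual automation: the paths, usernames, the `OWNERS` map, the `ALWAYS_REVIEW` list and the `review_required` function are all hypothetical, and `fnmatch` globbing only approximates real CODEOWNERS pattern syntax.

```python
from fnmatch import fnmatch

# Hypothetical ownership map: glob pattern -> set of owner usernames.
# Real CODEOWNERS files use gitignore-style patterns; fnmatch is a simplification.
OWNERS = {
    "infra/*": {"alice"},
    "payments/*": {"bob", "carol"},
    "search/*": {"dave"},
}

# Assumed list of sensitive areas where a review is mandatory even for owners.
ALWAYS_REVIEW = ("infra/*", "payments/*")


def owners_of(path):
    """Return the owners from the last matching pattern (later entries win)."""
    result = set()
    for pattern, owners in OWNERS.items():
        if fnmatch(path, pattern):
            result = owners
    return result


def review_required(author, changed_files):
    """Decide whether a pull request needs a blocking review."""
    for path in changed_files:
        if any(fnmatch(path, pattern) for pattern in ALWAYS_REVIEW):
            return True  # sensitive area: always reviewed
        if author not in owners_of(path):
            return True  # author is changing code they do not own
    return False  # author owns everything they touched
```

With this policy, `review_required("dave", ["search/ranking.py"])` is `False`, while the same author touching `payments/stripe.py` would still need an approval.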

Another controversial topic is covering code with tests. We at Preply have at least 85% coverage of the backend logic with unit tests and a strong culture of Python unit tests (pytest, mock, model-mommy, rich fixtures library).

The Booking.com way is to minimize writing tests (perhaps next to no coverage), but to give developers error budgets to innovate and solid monitoring for quick rollbacks. The error budget concept was popularized by the Google SRE book. For us, the way to think about tests is to understand the opportunity cost:

The cost borne by an organization when it allocates engineering resources to build systems or features that diminish risk instead of features that are directly visible to or usable by end users. These engineers no longer work on new features and products for end users.

Balancing error budget and opportunity cost requires some practice and strong processes, such as strong monitoring systems. After our talk with Brendan, we decided that e2e tests are the most useful revenue protectors for us. So there is no need to cover every A/B test with vast e2e coverage. Instead, we made a conscious decision that our e2e framework should cover only positive A/B tests, to protect the revenue. According to industry benchmarks, only around 10% of A/B tests are successful, so this reduces the pressure on developers to write tests for code that will not be kept.
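To make the error budget idea concrete, here is a minimal sketch of the arithmetic, assuming a request-based availability SLO. The function names, the 99.9% target and the traffic numbers are illustrative assumptions, not figures from Booking.com or Preply.

```python
def error_budget(slo, total_requests):
    """Number of failed requests the SLO tolerates over the period."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return (1.0 - slo) * total_requests


def budget_remaining(slo, total_requests, failed_requests):
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo, total_requests)
    if budget <= 0:
        raise ValueError("no traffic in the period, budget is undefined")
    return 1.0 - failed_requests / budget


def can_ship_risky_change(slo, total_requests, failed_requests):
    """Assumed policy: risky releases are allowed while budget remains."""
    return budget_remaining(slo, total_requests, failed_requests) > 0.0
```

For example, with a 99.9% SLO and 1,000,000 requests in the period, the budget is about 1,000 failed requests. After 250 failures roughly three quarters of it is left, so the team can keep experimenting. After 1,500 failures the budget is overspent and the focus should shift to stability.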

This approach has some limitations. For example, we still plan to have almost 100% code coverage in our language learning mobile app, as rollbacks for the mobile app are extremely complicated.

In the future we plan to embed rollbacks into our A/B testing system to quickly detect the code negatively impacting the business, and make rollbacks automatic.

Organizational design

To make sure that the engineering team is capable of achieving the ambitious goals of the business, it has to have the right organizational structure. Brendan shared that he spent a significant portion of his time working on organizational design. To do it properly, it’s important to understand Conway’s Law and its practical implications:

“If the parts of an organization (e.g., teams, departments, or subdivisions) do not closely reflect the essential parts of the product, or if the relationship between organizations do not reflect the relationships between product parts, then the project will be in trouble … Therefore: Make sure the organization is compatible with the product architecture.” (James O. Coplien and Neil B. Harrison)

We are now at a point where we need to scale our engineering team beyond 50 people, so we will need to introduce new functions, such as engineering managers. We also made a few decisions on ownership of our product. For example, we have cross-functional teams built around the customer journey. Some pages of the website are shared between different teams. We decided that different teams can own different components on the same page, and any team can contribute to a part owned by another team after consulting with the owners.

A/B testing culture

At Booking.com, experimentation is an important part of the product development cycle. They implement, deploy to production, execute and analyse hundreds of A/B tests on a daily basis to quickly validate ideas. These controlled experiments run across all products, from mobile apps and tools used by accommodation providers to customer service phone lines and internal systems. Such experiments allow for faster and safer deployment of new code, make it possible to turn off individual features quickly (and in some cases even automatically) when needed, and help validate whether product changes have the expected impact on the user experience.

Every developer at Booking.com has a direct impact on the business by being able to implement and launch their ideas for the entire multi-million audience of the website. The main rule is that everything starts as an A/B test. Thus, more than a thousand tests are conducted at the same time, often on the same pages. Booking.com is like a big lab where a new user experiment is launched every hour.

We do not know what customers want, and often they don’t exactly know it themselves, or they know but cannot articulate it. The only valid way to find out is the scientific method (A/B testing) and a data-driven culture.

“If I had asked people what they wanted, they would have said faster horses.” (Henry Ford)

Numbers and behavior are more reliable feedback indicators than words alone. If your customers say they love a product in silver, but they keep buying blue, stick to the blue. Be relentless in studying your data and customer behavior to make sure that their words match their actions.

We are relatively good at A/B testing: our internal A/B testing system is powerful enough to detect anomalies, run t-tests and chi-squared tests, plot confidence intervals, etc.
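As an illustration of the kind of check such a system runs, here is a minimal sketch of a Pearson chi-squared test on conversion counts for two variants, using only the standard library. It is not the internal system itself: the function name and the sample numbers are hypothetical, and no continuity correction is applied.

```python
import math


def chi_squared_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-squared test (1 degree of freedom) for two conversion rates.

    conv_* is the number of converted users, n_* the number of exposed users.
    Returns (statistic, p_value).
    """
    observed = [
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    if min(row_totals + col_totals) <= 0:
        raise ValueError("every row and column of the table needs observations")

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Calling `chi_squared_2x2(100, 1000, 150, 1000)` compares a 10% conversion rate in A with 15% in B. A small p-value suggests the difference is unlikely to be noise, although a production system would also account for multiple comparisons and peeking.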

The thing we were missing was on the meta level: we saw the successful results of our individual A/B experiments but were not able to attribute them to company growth. After meeting with Brendan we decided to set aside a global control group of 10% of users who will not be exposed to any B versions of experiments. Basically, these people will see the website frozen in time. No new features, no new bugs, but also no incremental improvements from A/B experiments.
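One common way to implement such a global control group is deterministic hash bucketing, sketched below. This is an assumption about how it could be done, not a description of the real system: the salt strings, the 50/50 split and all function names are hypothetical.

```python
import hashlib

HOLDOUT_PERCENT = 10  # share of users frozen on the baseline experience


def bucket(user_id, salt):
    """Map a user to a stable bucket in [0, 100) using a salted hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def in_global_holdout(user_id):
    """The holdout salt never changes, so membership is stable over time."""
    return bucket(user_id, "global-holdout") < HOLDOUT_PERCENT


def variant(user_id, experiment):
    """Return 'A' for holdout users; otherwise split 50/50 per experiment."""
    if in_global_holdout(user_id):
        return "A"  # never exposed to any B version
    return "B" if bucket(user_id, experiment) < 50 else "A"
```

Because the holdout uses its own salt, the same users stay in it across all experiments, so comparing their metrics with everyone else's gives an estimate of the cumulative effect of all shipped experiments.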

Technical debt

One of the most “heated” discussions was the one on technical debt. At some point Booking.com banned rewriting the code to “make it better”. Valid reasons for refactoring should be:

a) an increase of the load on the system that the previous code would be unable to cope with

b) a change in the business priorities/model.

Thanks to this, the company can spend 99% of developers’ resources on writing new functionality, which allows the business to grow faster than its competitors.

For developers it is natural to love the beauty of “good code”. But does it help the business grow faster? Imperfect code used by millions is much better than perfect code used by no one.

Another idea is that it is almost impossible to measure the impact of refactoring in terms of money. Statements like “now we can iterate faster” are, unfortunately, very subjective. On the other hand, the impact of new features that were A/B tested and bring money to the business is obvious. As an engineering organization, we are constantly looking for the balance between delivering measurable business value and maintaining “subjective” code quality. It is hard, but rewarding.

Thank you, Brendan Bank, for all these lessons!

“The best way forward for us has been organized chaos. Data driven decision making and data driven product development are the keys to our success” (Brendan Bank)