Turbolift — a tool for refactoring at scale
Skyscanner’s systems are anything but small-scale. With millions of travellers using our site and app every month, we handle dizzying volumes of requests across a microservice architecture that, itself, is pretty huge. All-in, there are several hundred microservices and microsites (webapps that support a specific portion of the site), supported by hundreds more lambdas and internal libraries. Each is in its own GitHub repository, which has some upsides in terms of separation of concerns, but clearly has some costs: when the same change needs to be made to all of these repositories, how can it be done?
Most of our microservices use common shared libraries, so updating to receive a new security patch, resilience improvement or observability feature (for example) is often just a Dependabot bump away.
However, not every change we want to make is in a library — despite our best efforts, we still have boilerplate config/code that needs to be improved from time to time. And while we’re taming our repository count where we can (including combining repositories where it makes sense), we still have a lot of repositories.
We needed to be able to make non-trivial changes in tens or hundreds of repositories at a time.
For a long time we nurtured an in-house system named Codelift: essentially a batch system which would apply a Python change script against hundreds of repositories overnight, raising Pull Requests (PRs) with any changes. However, it’s pretty difficult to write a script that will reliably work across diverse repositories; just the expertise needed to review the change scripts was a bottleneck, and scripts frequently needed multiple rounds of tweaking to deal with the inevitable failures. Codelift gradually fell out of use, but the need for it remained.
Turbolift is a re-imagining of the mass change process:
- Previously, to write a reliable Codelift change script, engineers would frequently have to clone many or all of the affected repos locally, just to test that the change would work. If engineers are going to locally clone repos anyway, why not just make this part of the process?
- Writing change scripts in Python was constraining: sometimes the easiest way to express a change is just a simple shell command or invocation of a more specialised refactoring tool like codemod or comby. Sometimes, firing up an editor or IDE is a heavyweight but easy option. And sometimes the easiest thing to do is an automated change that works for 95% of repositories, hand-tweaked in the few repos which need it.
- Having change scripts at all is only really useful if you plan to do the same mass-refactoring activity again. But in many cases we can be confident that a change will only need to be made once. Keeping a record of what we did is important, but it doesn’t have to be in the form of a reusable script.
- One subtle issue with Codelift was that all of its PRs came from a bot user: this created a social expectation upon the owners of the Codelift system to thoroughly vet every change, and became a major bottleneck. We realised that having PRs be raised by the engineer who is actually responsible for them would be better: clearer ownership, easier feedback, and no need for a gatekeeper team.
Turbolift ‘automates the boring parts’ — mass fork, clone, commit and creation of PRs, while being completely unopinionated about how the actual changes are carried out. Engineers can directly inspect, edit and test their changes using whichever tools they want, which is far more ‘tactile’ than throwing a script into a batch system and waiting for the results.
Cloning repositories to developers’ machines has some clear trade-offs — it takes some wall-clock time and disk space. But the reduced cognitive load on engineers makes this trade-off well worth it, in our opinion.
Turbolift started life as a hastily written set of bash scripts, but quickly proved its value for us. We’ve now rewritten it in Go, tidied up and open sourced the tool, and we’d love to share it with you: https://github.com/Skyscanner/turbolift. Compared to the original bash version, Go helps us make the tool more user friendly and maintainable over the long term. We have plenty of ideas for how the tool can evolve to be even better, and we’d welcome external contributions to help it improve.
If you try Turbolift, we’d advise being sensitive to PR reviewers’ needs, particularly if you’re creating a lot of PRs. The README file for the project includes a few guidelines that we’ve developed internally to help change authors stay good citizens.
How Turbolift has helped us
- In the run up to an internal SSL certificate expiry, our production platform team used Turbolift to raise PRs against hundreds of repositories that referenced the expiring certificate.
- Our web enablement team has been using Turbolift to standardise versions and testing of libraries throughout our microsites.
- Our production platform team used Turbolift to fix a bug which once appeared in a code template, and which had since made its way into many repositories.
- Squads have been able to clean up and update repository metadata files that track ownership and other information. Updating these files had previously been a chore, but was necessary as squads were renamed or ownership of repositories changed.
In total over the last 3 months, we’ve raised over 1200 internal PRs using Turbolift. Each of these represents a problem averted or technical debt cleaned up, and would have otherwise been a hand-created PR. As a result, bit by bit, we hope that engineers at Skyscanner and elsewhere will benefit from an easier workflow when making changes at scale.
About the author
Richard North is a principal software engineer in Skyscanner’s Production Platform Tribe. The tribe builds and operates a range of the systems that support Skyscanner’s product engineering, including large scale Kubernetes clusters, core AWS/web infrastructure, operational monitoring, core libraries, CI/CD and developer tools. We aim to provide a powerful and resilient base that enables other engineers to focus on delivering awesome features to our customers: travellers.