Platform Engineering

A Tale of Moving 4000 GitHub repositories to GitHub Actions

The behind-the-scenes challenges (and solutions) of implementing GitHub Actions in a large engineering department

Lou Bichard
DAZN Engineering

--

GitHub Actions is a new(ish) player in the Continuous Integration (CI) market, and many companies will be looking to analyse its capabilities.

At DAZN, we’ve just finished the first few phases of our large-scale migration of 4000 code repositories to GitHub Actions.

Today we want to take you through that process, and share with you some of the challenges that we faced.

To start, let’s begin with some context…

DAZN has around 400 engineers and 4000 private repositories on GitHub.

Our Continuous Integration at DAZN is managed centrally by our Platform team, which consists of three teams: Site Reliability, Cloud Engineering & Developer Experience (who I work for!).

For our Continuous Integration in DAZN engineering, we previously self-hosted the open-source Continuous Integration tool, Drone (version 0.8).

On April 17th of 2019, Drone announced the release of a 1.0 version update.

But along with that version update came some challenges for us.

What challenges am I talking about?

With the upgrade to Drone 1.0, Drone decided to move to a new pricing model, where an enterprise license was required to run more than one instance.

On a normal workday, logging into our AWS account, I can see we’re running 31 instances of t3.2xlarge EC2 for Drone.

A single EC2 instance wouldn’t come close to meeting our workload needs, so we’d definitely need an enterprise license if we were to continue using Drone on the 1.0 release. With self-hosting Drone no longer a viable option for us, this situation gave us the opportunity to reassess our Continuous Integration needs and potential options.

Choosing The Future Of Our Continuous Integration

Where we can, within Developer Experience, we like to seek input from our engineers, especially on decisions as impactful as our Continuous Integration tooling. So we took the opportunity to ask our engineers their take on what was important to them in a future Continuous Integration tool.

We surveyed 143 engineers, who told us that the following were amongst the features they felt were missing from Drone 0.8:

  • Conditional build steps
  • Parameterised builds
  • Pipeline inheritance templating
  • Fan-in build pipelines

The questionnaire also gave us insights into sentiment on topics like CI performance.

And of course, in the move to a new Continuous Integration tool, we in the Platform team (and within Developer Experience) had some preferences of our own, too.

For instance, we wanted any future CI tool to work in a technically similar way to Drone, to ease developer pains in upskilling and moving plugins and functionality to the new tool.

Implementing short-term pain-relief for Drone 0.8

Given that we knew performing such a large-scale migration would take time, we decided to also spend some energy on implementing some quick “pain-relief” for our engineers whilst they waited for a decision.

In the meantime, on top of our existing Drone 0.8 architecture, the Developer Experience team built a customized user interface that we named “UFO.”

UFO is based on an existing library called “woodpecker” maintained by Laszlo Fogas, which allowed us to mitigate some performance issues with Drone 0.8, and allowed us to extend our existing CI with some DAZN-specific features, such as audible build notifications, absolute timestamps on build output, custom error help messages, and (naturally!) dark mode.

An example error suggestion feature within UFO: giving quick answers to common errors.

Enter: GitHub Actions

After several months of investigating different Continuous Integration tools, the launch of GitHub Actions caught our attention.

GitHub Actions would give us the ability to use containers for build steps, just as we did with Drone, and a fairly simple migration path, whilst also meeting many of the needs that our engineers identified in our original survey.

Since there are already many great articles covering how to use GitHub Actions effectively, we thought it would be more interesting to talk through some of the challenges we faced as a larger organization moving to GitHub Actions.

We’ll talk about how we’re loading secrets into our pipelines to give repositories necessary access to AWS, how we’re managing billing for so many repositories with custom alarming, and finally how we’re helping engineers to migrate their pipelines, through the use of a custom migration CLI tool.

Let’s start with the main challenge, managing secrets.

Challenge 1: Managing Secrets & Access To AWS Via GitHub Actions

Any software that does anything useful will need access to resources.

Typically that means managing some form of access credential, or secret.

Whilst GitHub Actions does have some features for organization secrets, we needed a way to grant each of our repositories access to the different software systems that teams need to interact with.

As with most things in security, we prefer automation for its repeatability and traceability. So we needed to build some custom automated tools to grant our engineers sufficient access for their GitHub Actions.

How did we do it?

Our main hosting solution is AWS, so granting AWS access was very high on our priority list for the GitHub Actions migration. To allow a repository to get access to AWS via GitHub Actions, we create a single AWS IAM user for each of our GitHub repositories. Each user has credentials attached, which are then pushed to GitHub and stored as secrets, accessible to pipelines.

Creating IAM users and keys is done via an AWS Lambda that is triggered for each new repository registered for GitHub Actions. We keep more than one key per user at any given time so that keys can be rotated: periodically, the key that is no longer in use is swapped out for a fresh one.

The AWS architecture responsible for granting AWS access to repos within GitHub Actions.

Using one IAM user per repo gives us the ability to apply granular permissions for each of our repos (principle of least privilege), and gives us traceability over what each of the different Continuous Integration secrets is accessing.
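To make the key handling described in this section more concrete, here is a minimal sketch of a rotation step for one repository's IAM user. It is not our actual Lambda code. It assumes a two-key scheme (IAM allows at most two access keys per user) in which the oldest key is replaced. The function name `rotate_access_key` and the secret names are hypothetical, and `iam` is assumed to behave like a boto3 IAM client.

```python
MAX_KEYS_PER_USER = 2  # IAM allows at most two access keys per user


def rotate_access_key(iam, user_name):
    """Replace the oldest access key of `user_name` with a fresh one.

    `iam` is assumed to behave like a boto3 IAM client. Returns the new
    credentials under hypothetical secret names, ready to be stored as
    GitHub secrets for the repository.
    """
    keys = iam.list_access_keys(UserName=user_name)["AccessKeyMetadata"]
    keys = sorted(keys, key=lambda key: key["CreateDate"])

    # Free a slot by deleting the oldest key once the limit is reached.
    # The newer key stays valid, so pipelines already running keep working.
    if len(keys) >= MAX_KEYS_PER_USER:
        iam.delete_access_key(
            UserName=user_name, AccessKeyId=keys[0]["AccessKeyId"]
        )

    new_key = iam.create_access_key(UserName=user_name)["AccessKey"]
    return {
        "AWS_ACCESS_KEY_ID": new_key["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": new_key["SecretAccessKey"],
    }
```

Pushing the returned values to GitHub is left out of the sketch. The GitHub secrets API expects each value to be encrypted with the repository's public key before upload, which needs a libsodium binding.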

Challenge 2: Gaining Observability Over Our Billing Data

At DAZN we follow the principle of “you build it, you run it” (as originally coined by Werner Vogels of AWS) by giving teams lots of freedom to build services in ways that suit them. But freedom must also come with guardrails, especially when it comes to managing costs.

Ideally, we want to give our teams a view of their individual spending on Continuous Integration. As GitHub Actions doesn’t yet provide ring-fencing of costs on a per-team or per-repo basis, visibility of spending isn’t just useful for keeping costs down, but also for preventing over-use by one team from impacting other teams.

Given that we cannot add cost isolation for our teams, we needed some way to get better usage reporting from GitHub Actions, so that increasing costs are noticed and acknowledged. Currently, though, the only way to get billing information out of GitHub Actions is via a CSV that is emailed to you, which is not ideal for visibility.

GitHub Actions usage report download

But luckily, since we had a seasoned Site Reliability Engineer (SRE) on our GitHub Actions implementation team, we decided to approach our billing challenges with GitHub Actions as if we were monitoring a production service.

One of the metrics we now track is “remaining usage per day”. As each day passes towards the monthly billing reset, we monitor the trend. This metric tells us whether our remaining daily usage is growing (meaning that we’re under budget) or declining (meaning that we’re over budget).
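As a worked illustration, here is one plausible way to compute such a metric. This is an assumption about the calculation, not our exact formula: the usage left in the monthly allowance divided by the days left until the billing reset. The function name and all the numbers are hypothetical.

```python
def remaining_minutes_per_day(included_minutes, used_minutes, days_left):
    """Minutes still available for each remaining day of the billing cycle."""
    if days_left <= 0:
        raise ValueError("days_left must be positive")
    return (included_minutes - used_minutes) / days_left


# Hypothetical plan: 50,000 included minutes, 30-day cycle, 20 days left.
# Spending slowly leaves more minutes per remaining day than at the start.
under_budget = remaining_minutes_per_day(50_000, 10_000, days_left=20)

# Spending quickly leaves fewer minutes per remaining day.
over_budget = remaining_minutes_per_day(50_000, 25_000, days_left=20)
```

With these numbers the cycle starts at roughly 1,667 minutes per day. The first case rises to 2,000 (under budget), while the second falls to 1,250 (over budget).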

We implemented a service that pushes GitHub Actions metrics to CloudWatch, which enables us to set up alarms that give us an early warning of any over-usage. When we receive an alarm, we have a runbook and some custom scripts that we can run to give us finer-grained data on our costs.
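A minimal sketch of what such a push and alarm could look like follows. It assumes `cloudwatch` behaves like a boto3 CloudWatch client. The namespace, metric name, alarm name, and function names are hypothetical, not the ones used by our service.

```python
NAMESPACE = "GitHubActions/Billing"     # hypothetical namespace
METRIC_NAME = "RemainingMinutesPerDay"  # hypothetical metric name


def publish_metric(cloudwatch, value):
    """Push one data point for the usage metric to CloudWatch."""
    cloudwatch.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[
            {"MetricName": METRIC_NAME, "Value": float(value), "Unit": "Count"}
        ],
    )


def create_low_budget_alarm(cloudwatch, threshold):
    """Alarm when the metric drops below `threshold` within a day."""
    cloudwatch.put_metric_alarm(
        AlarmName="github-actions-low-remaining-usage",  # hypothetical
        Namespace=NAMESPACE,
        MetricName=METRIC_NAME,
        Statistic="Minimum",
        Period=86400,  # evaluate over one day, in seconds
        EvaluationPeriods=1,
        Threshold=float(threshold),
        ComparisonOperator="LessThanThreshold",
    )
```

Passing the client in as a parameter keeps the sketch easy to exercise with a stub before pointing it at a real AWS account.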

Challenge 3: A Very Developer Experience Problem, Incentivising Teams To Migrate

The final challenge to overcome in our migration to GitHub Actions was a very pure Developer Experience challenge: how do we incentivise teams to make the switch? How do we make the process as pain-free as possible?

And whilst we do have early adopters who are keen to migrate, we know that we’ll have a “long-tail” of engineers who already have their Continuous Integration set up using Drone, and who are going to be reluctant to move.

What’s the fix? To help engineers make the switch, via automation.

Our custom Drone to GitHub Actions migration CLI

To make migration simpler for teams, we created a custom command-line utility that converts a Drone configuration file into a GitHub Actions workflow file. How does it work? By reading the Drone YAML file and converting each step into the corresponding GitHub Actions format.

And whilst the Drone to GitHub Actions conversion CLI tool isn’t a silver bullet, because it can’t convert 100% of every pipeline, the tool is at least helping to give teams a much-needed head-start with each migration.
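To give a feel for the step-by-step conversion, here is a heavily simplified sketch. It is not our actual CLI. It works on already-parsed configuration (plain dictionaries, so YAML reading and writing are left out) and assumes the Drone 0.8 layout of a top-level `pipeline` mapping whose steps have an `image` and a list of `commands`. The mapping of one Drone step to one container job is just one possible choice, and the function name is hypothetical.

```python
def drone_to_workflow(drone_config, workflow_name="CI"):
    """Convert a parsed Drone 0.8 config into a GitHub Actions workflow dict.

    Returns (workflow, unconverted), where `unconverted` lists the names of
    steps that have no `commands` (for example plugin steps) and therefore
    need manual migration.
    """
    jobs = {}
    unconverted = []
    previous_job = None

    for step_name, step in drone_config.get("pipeline", {}).items():
        commands = step.get("commands")
        if not commands or "image" not in step:
            unconverted.append(step_name)
            continue

        job = {
            "runs-on": "ubuntu-latest",
            "container": step["image"],  # run the job inside the step's image
            "steps": [
                {"uses": "actions/checkout@v4"},
                {"name": step_name, "run": "\n".join(commands)},
            ],
        }
        # Drone runs steps in order, so chain the jobs with `needs`.
        if previous_job is not None:
            job["needs"] = previous_job
        jobs[step_name] = job
        previous_job = step_name

    workflow = {"name": workflow_name, "on": ["push"], "jobs": jobs}
    return workflow, unconverted
```

One limitation of this particular mapping is that separate jobs do not share a workspace the way Drone steps do, so files produced by one step would have to be passed along as artifacts. The `unconverted` list mirrors the point above: some parts of a pipeline still need a human.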

The Future of GitHub Actions In DAZN

And on that note, that concludes our run-through of the main challenges that we’ve faced at the beginning of our journey to migrate all our repositories to GitHub Actions.

Our current target completion for the migration is March 2022. As part of the celebration of the move, we could put up another blog post with our learnings and battle-scars across all of our different engineering teams. Let’s see!

As with all work like this, it’s always a big team effort. So I wanted to give a dedicated thank you, firstly to those who joined and contributed to the GitHub Actions working group, especially Adrian Thomas for driving conversations, and also to the Platform working group for their input. Thank you to Jesse Hitch, Georgi Barzev, Rick Burgess and Yogesh Lonkar, who worked hard in creating the AWS infrastructure behind GitHub Actions, and to the Developer Experience team, including David Rubio Vidal and Naomi Gaynor, for pushing and supporting it. And thank you to everyone else who helped us to test out the new tooling and create documentation to support the teams in migrating!

--

Lou Bichard
DAZN Engineering

Teaching the next generation of Cloud Native Software Engineers @ thedevcoach.co.uk.