Gamifying Continuous Integration
For the last couple of years we’ve been moving applications from our data centers into the cloud. One advantage of the move was that we could host our applications in different geographical regions, bringing them closer to the user. Also, each team working on an application can easily host their app in multiple environments (test/stress/integration/production). Having multiple versions of an application running at the same time across multiple regions and environments brought its own challenge of visibility: it was not straightforward to know what was running where, and when.
To solve this problem we started collecting event-based data about everything that happens to our applications as they are built, deployed, and deleted. This data powered dashboards that improved visibility into what’s hosted where. Over months of collecting it, we started realising more uses for this data. Another characteristic of our team is that we are a very diverse group of engineers all over the world, using a range of technologies, and we’re always interested in ways to help spread and share great ways of working. With that in mind, one of the uses we’re trying is a little game we created called Primer Go — “Primer” for the internal name of our CI/CD system, and “Go” inspired by Pokemon Go (a conversation over coffee about Pokemon Go led a couple of us to think that we should create a game of some sort around our CI/CD).
Primer Go is a monthly league. The scores are reset at the beginning of the month, and at the end of the month the application at the top of the league is declared the winner. The participants are the huge number (over two thousand at last count) of different applications we have running. An application has to meet some minimum criteria to be eligible for the league in a given month; for example, it must have had at least 3 new versions released to production. Each eligible application is scored (on a scale from 1 to 100) in multiple categories, and the average across all categories is taken. As of this writing, there are 4 scoring categories:
Within the company, we encourage an open-source model of working. This means that any team can contribute to any project by simply building the new feature they want and opening a pull request. For this to be effective, it is important that everyone makes their repository accessible. This means having a high-quality, up-to-date README file, contribution guidelines, and notes about testing. We have a category in Primer Go that scores applications based on this information being present, and on how recently it was updated.
We encourage a higher number of smaller pull requests, rather than one (or very few) huge ones. This helps build things incrementally, makes it easier to isolate and roll back specific issues, and is also easier to review (increasing the chances of a high-quality code review). This category awards points based on the number of pull requests merged into the application during the month. The number of active developers on the application that month is also taken into consideration, so that team sizes don’t heavily influence the scoring in this category.
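One way such a normalisation could look (purely a sketch — the post doesn’t give the real formula, and the per-developer scaling and monthly target are assumptions):

```python
# Hypothetical sketch of normalising merged-PR counts by team size,
# so larger teams don't dominate this category. Not Primer Go's
# actual formula.
def pr_activity_score(merged_prs: int, active_devs: int,
                      target_prs_per_dev: float = 10.0) -> float:
    if active_devs == 0:
        return 0.0
    prs_per_dev = merged_prs / active_devs
    # Scale to a 0-100 range, capping at the (assumed) monthly target.
    return min(100.0, 100.0 * prs_per_dev / target_prs_per_dev)
```

With this shape, a two-person team merging 20 small pull requests scores the same as a ten-person team merging 100, which is the point of normalising by active developers.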
How frequently we are able to get new changes out to our users is a main factor in how well we are doing Continuous Delivery, and our game has a category to measure just that. We check how many new versions of the application were released to production in the month, and divide that by the total number of versions (pull requests) merged into the code base. Teams closer to 100% in this category are the ones releasing each change in isolation into production before further changes are merged on top of it.
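This category boils down to a ratio of production releases to merges. A minimal sketch (the function name and the cap at 100 are assumptions, not the exact scoring code):

```python
# Hypothetical sketch of the release-frequency category. Assumes a
# 0-100 scale where 100 means every merged change reached production
# as its own release.
def release_frequency_score(releases_to_production: int, merged_prs: int) -> float:
    if merged_prs == 0:
        return 0.0  # nothing merged this month
    return min(100.0, 100.0 * releases_to_production / merged_prs)
```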
This category measures how long it takes from the moment a change is merged into the main branch to the moment that change is released into production. The faster this is, the greater the score. This category makes people think more about their build pipelines (parallelising effectively), testing architecture, and the CI/CD infrastructure itself.
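To make the mechanics concrete, here is a minimal sketch of the last category and of how the overall league score comes together — the helper names are assumptions, and the plain mean reflects the averaging described above rather than the actual implementation:

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical helpers illustrating the league mechanics; not the
# actual Primer Go implementation.

def lead_time(merged_at: datetime, released_at: datetime) -> timedelta:
    # Time from a change landing on the main branch to it reaching
    # production; a shorter lead time earns a higher category score.
    return released_at - merged_at

def overall_score(category_scores: list[float]) -> float:
    # Each category is scored on a 1-100 scale; the application's
    # overall score is the average across all categories.
    return mean(category_scores)
```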
Every month, the league helps us:
- Recognise applications that are doing well. This also gives those teams a platform to share their best practices across the company.
- Recognise areas of improvement for each application.
- Have some fun :)
We’re constantly tuning and updating the criteria based on feedback from the community and the data we’re seeing in the wild. Development metrics are great informers, but if you rely on them exclusively or too heavily it is easy to accidentally encourage the wrong behaviors. So far we’ve found this sort of friendly competition is a great way to promote how we want to work across a large audience.