Data Driven Inner Source
By Jakub Kadlubiec and Giovanni Perna
About a year ago at Skyscanner we started promoting the inner source model. At the time, several teams within the company were already working in an internal open source way, contributing to codebases owned by other teams. The general feeling was positive, but we were aware that adopting this model was adding workload to some teams or areas of the business. Our initial goal was to identify those areas.
We started by asking our colleagues how this worked for them, but we soon realised that it was still difficult to get the whole picture. Skyscanner is a data-driven company, and we thought that, once again, data could help us, not just in getting that early feedback but also in monitoring the adoption of the inner source model over time.
The obvious question at this point was: Where to get this data from? There was an obvious answer too: GitLab!
In the first brainstorming session we identified a number of views on the data that could give us a better understanding of how teams were coping with the adoption of the inner source model:
- Count of open, external (to the team) Merge Requests (MRs) per project.
- Average duration that an external MR remains open per project.
- Average number of comments per external MR per project.
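As a rough illustration of the three views above, here is a minimal Python sketch that computes them from a list of merge request records. The `MergeRequest` record and its field names are hypothetical, chosen only for this example; they are not Surveyor's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MergeRequest:
    # Hypothetical record shape; field names are illustrative only.
    project: str
    external: bool                 # opened by someone outside the owning team
    opened_at: datetime
    closed_at: Optional[datetime]  # None while the MR is still open
    comments: int


def external_mr_metrics(mrs, now):
    """Return per-project metrics for external MRs only."""
    grouped = defaultdict(list)
    for mr in mrs:
        if mr.external:
            grouped[mr.project].append(mr)

    metrics = {}
    for project, items in grouped.items():
        # Open MRs are measured up to `now`, closed ones up to their close time.
        days_open = [
            ((mr.closed_at or now) - mr.opened_at).total_seconds() / 86400
            for mr in items
        ]
        metrics[project] = {
            "open": sum(1 for mr in items if mr.closed_at is None),
            "avg_days_open": sum(days_open) / len(days_open),
            "avg_comments": sum(mr.comments for mr in items) / len(items),
        }
    return metrics
```

Projects with no external MRs simply do not appear in the result, which avoids dividing by zero when averaging.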
We also realised that Skyscanner already had a service that was gathering and analysing GitLab data for a different purpose, and we thought that it could be extended to cover this new use case. The next section gives an overview of the service and explains how it was later adapted to monitor inner source activity.
Gathering the data
Due to the amount of development activity happening in the company we needed to gather the inner source metrics automatically. Before we explain how we implemented the data collection solution, it is necessary to understand how Skyscanner teams structure their codebases.
Background: repository-per-service model
We are promoting a repository-per-service model at Skyscanner. We chose this over mono-repo for two main reasons:
- it’s flexible — teams are not constrained by others
- no need for special infrastructure to handle partial clones, partial deployment, etc.
At the moment we have over 10k repositories in the company and it is difficult to know what they all contain. We don’t know what those projects are, whether they are even used (hint: most of them aren’t), what technologies they are built with, whether they follow the company engineering standards, etc.
We built Surveyor to solve this problem for us. The service scans all GitLab repositories every night and collects data about projects. We are tracking almost 90 attributes for each project.
The architecture of Surveyor is simple: it is a Python service deployed to ECS on AWS. Each night it initiates a scan and uses the GitLab API to get information about projects. It then retrieves some of the important files in each project (e.g. build.gradle, requirements.txt, package.json), again using the API, so it doesn’t need to clone the repository locally. The files are then parsed and the important information is stored in a PostgreSQL database running in RDS. Historical data is preserved, allowing us to see how things have changed over time. For example, the following graph shows our progress in moving projects from Node 6 to Node 8:
Extending Surveyor to scan Merge Requests
We chose to extend Surveyor to get metrics on inner source activity in the company. Surveyor already contained all the infrastructure needed to gather, store and analyse data from GitLab. Now every scan gathers data not only about the projects’ source code, but also about merge requests that were created or changed since the last scan. Again, this is all retrieved using the GitLab API.
We store the basic information about each merge request: who created it, its state, who merged it, the time it took to merge, etc. More importantly, Surveyor also tries to recognise automatically whether the MR was created by someone who owns the project (and thus doesn’t represent inner source activity) or by someone from outside.
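An incremental scan of this kind could be sketched as follows. This is not Surveyor's code: the function name and the base URL are assumptions, and the sketch only builds the request URL for GitLab's "list project merge requests" endpoint, relying on its `state`, `updated_after`, `per_page` and `page` query parameters.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def merge_requests_url(base_url, project_id, last_scan, page=1):
    """Build the URL listing a project's MRs updated since `last_scan`.

    `last_scan` must be a timezone-aware datetime.
    """
    query = urlencode({
        "state": "all",  # include open, merged and closed MRs
        "updated_after": last_scan.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
        "per_page": 100,
        "page": page,
    })
    return f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/merge_requests?{query}"
```

A scanner would request successive pages until an empty list comes back, then record the scan time so the next run only picks up newer changes.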
This is where the repository-per-service model comes in handy as we can use GitLab’s permission system to check who has ‘owner’ or ‘master’ access to individual projects and thus can be considered an owner.
GitLab’s permission system sometimes doesn’t accurately reflect the true ownership of a project, but it’s close enough.
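The ownership check described above might look roughly like this. It is a sketch under assumptions: the dictionaries mimic the `id`, `access_level` and `author` fields returned by GitLab's members and merge request APIs, the function names are hypothetical, and the numeric levels are GitLab's documented values for 'master' (40) and 'owner' (50).

```python
MASTER_ACCESS = 40  # GitLab access level for 'master'
OWNER_ACCESS = 50   # GitLab access level for 'owner'


def owner_ids(members):
    """IDs of project members with 'master' or 'owner' access."""
    return {
        member["id"]
        for member in members
        if member["access_level"] in (MASTER_ACCESS, OWNER_ACCESS)
    }


def is_inner_source(merge_request, members):
    """True when the MR author is not one of the project's owners."""
    return merge_request["author"]["id"] not in owner_ids(members)
```

For example, with members `[{"id": 1, "access_level": 50}, {"id": 2, "access_level": 30}]`, a merge request authored by user 2 (a developer, not an owner) counts as inner source activity, while one authored by user 1 does not.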
Again, Tableau is used to visualise the data:
Let’s end by presenting some interesting statistics about Skyscanner engineering activity:
- in January 2018, our engineers opened 6033 merge requests in total
- of these, 1609 merge requests represent inner source activity, a bit more than one in four
- our most active project received 60 inner source MRs in January. That is roughly 3 MRs per working day created by someone outside of the owning team, which is a lot of additional workload for the owners!
- our most active engineer opened 88 merge requests in January 2018 and our most active reviewer approved 140 merge requests in the same time period
Are you interested in any other statistic? Ask us in the comments section!
Internal Open Source
This is part 3 in a series of 3 posts
About the authors
My name is Giovanni Perna. I’m a full-stack software engineer working in the Quantum team in sunny Barcelona. We look after the Skyscanner configuration service. I love (free)diving, travelling and sport in general.
I am Jakub, an engineering manager in the London office. I lead a team that helps engineers automate their daily duties and ramps up the productivity of every developer in the company. I have 6 years of professional experience at global companies as well as startups. I’m enthusiastic about Agile and Lean software development, Domain Driven Design, Continuous Delivery and Cloud.