Enforcing code ownership at Udemy

Published in Udemy Tech Blog

Oct 2, 2020

Code ownership is a tricky thing to get right. Things become even trickier when you throw a mono-repo into the mix, with no single team having ownership over the whole thing.

This topic wasn’t something that interested me at first, I must admit. The task of coming up with a solution that enforces ownership of every single file in a repo isn’t something I’d normally drop everything for. The more I thought about it, however, the more I wondered — how can it be done well?

GitHub Code Owners is half the solution

GitHub offers a really neat feature called code owners, which we use heavily in all of our repos at Udemy. Alongside it, you can configure your main branch to require that at least one code owner approves a pull request before it’s eligible to be merged. This works great for simple setups, but in a mono-repo, where teams overlap on ownership and no single team owns all of the code, GitHub’s code owners feature isn’t enough on its own: a file that matches no CODEOWNERS entry has no code owner, so changing it doesn’t require any code owner’s approval.

We already enforced code owners on our repos, but this doesn’t cover files that aren’t owned by anyone

See, our mono-repo wasn’t just a mono-repo: it was a mono-repo with thousands of files without an owning team (around 1 in 8 files had no matching entry in the CODEOWNERS file). This meant it was very easy to edit something that you or your team did not own without the actual owners being any the wiser. This is a huge no-no for security and compliance.
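To make “unowned” concrete: a CODEOWNERS file is a list of pattern/owner rules. The last matching pattern takes precedence, and a file that matches no rule (or only a rule that lists no owners) has no code owner at all. Below is a minimal Python sketch of that lookup. It is not our production code. It assumes a simplified subset of the gitignore-style syntax GitHub uses: a leading "/" anchors to the repo root, a trailing "/" means a whole directory, "*" stays within one path segment, "**" crosses segments, and comments are stripped naively. The team handles are made up.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    line_no: int              # 1-based line number in the CODEOWNERS file
    pattern: str
    owners: tuple[str, ...]   # empty tuple = path explicitly left without an owner
    regex: re.Pattern[str]


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a CODEOWNERS pattern into a regex (simplified gitignore rules)."""
    body = pattern.strip("/")
    # A leading "/" or a "/" inside the pattern anchors it to the repo root;
    # otherwise the pattern may match at any depth.
    anchored = pattern.startswith("/") or "/" in body
    out, i = [], 0
    while i < len(body):
        if body.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more directories
            i += 3
        elif body.startswith("**", i):
            out.append(".*")         # anything, across directories
            i += 2
        elif body[i] == "*":
            out.append("[^/]*")      # anything within a single path segment
            i += 1
        elif body[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(body[i]))
            i += 1
    last_segment = body.rsplit("/", 1)[-1]
    if pattern.endswith("/"):
        suffix = "/.*"               # "dir/": everything under that directory
    elif "*" in last_segment or "?" in last_segment:
        suffix = ""                  # "docs/*": direct matches only, no nesting
    else:
        suffix = "(?:/.*)?"          # a plain name: the file, or a whole directory
    prefix = "" if anchored else "(?:.*/)?"
    return re.compile(prefix + "".join(out) + suffix)


def parse_codeowners(text: str) -> list[Rule]:
    rules = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # naive: ignores escaped "\#"
        if not line:
            continue                          # blank or comment-only line
        pattern, *owners = line.split()
        rules.append(Rule(line_no, pattern, tuple(owners), glob_to_regex(pattern)))
    return rules


def resolve(path: str, rules: list[Rule]) -> Optional[Rule]:
    """Return the rule that decides who owns `path`: the last one that matches."""
    for rule in reversed(rules):
        if rule.regex.fullmatch(path):
            return rule
    return None   # no rule matches: the file is unowned


# Hypothetical CODEOWNERS file; the team handles are made up.
EXAMPLE_CODEOWNERS = """\
# Lines further down take precedence
/docs/          @example-org/docs
*.py            @example-org/backend
/web/           @example-org/frontend
/web/legacy/
"""
```

With these example rules, docs/conf.py belongs to @example-org/backend rather than the docs team, because the *.py rule comes later. Anything under web/legacy/ is deliberately unowned. A path like scripts/deploy.sh matches nothing at all, which is exactly the kind of file that slips past code owner review.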

How can we solve the other half?

I work on the Build Engineering team. If you want to read more about it, check out my previous article, where I go over what my teams and I get up to. To put it simply, part of our job is creating tooling and solutions for developer workflows, and this is where we came into the picture.

Another part of our job is providing a CI platform to the engineering org. Aha! There’s the answer to our question: how can we get every file in this repo to have the correct owner, and enforce that every new file added in the future is also owned by a team?

Well, if we’re going to be enforcing ownership at the policy level anyway, then there’s no point running CI for a pull request that changes files nobody owns. It would be a waste of resources: the pull request shouldn’t be merged as it stands, and once the fix to the CODEOWNERS file is merged into the branch, a new build would fire anyway.
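The gating rule itself is small. Here is a minimal Python sketch. It assumes the caller passes in the paths added or modified by the pull request. It also assumes a hypothetical owners_of callback that returns a file’s owning teams, which could be backed by a CODEOWNERS lookup. CI would only be allowed to start when the report passes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable, Sequence


@dataclass
class OwnershipReport:
    owned: dict[str, tuple[str, ...]] = field(default_factory=dict)  # file -> owning teams
    unowned: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # CI should only run when every added or modified file has an owner.
        return not self.unowned


def check_pull_request(
    changed_files: Iterable[str],
    owners_of: Callable[[str], Sequence[str]],
) -> OwnershipReport:
    """Split a PR's added/modified files into owned and unowned ones."""
    report = OwnershipReport()
    for path in sorted(set(changed_files)):   # dedupe, stable order for the report
        teams = tuple(owners_of(path))
        if teams:
            report.owned[path] = teams
        else:
            report.unowned.append(path)
    return report
```

Keeping both lists, rather than stopping at the first unowned file, means one report can show the developer everything at once. It covers what still needs an owner as well as who owns the rest of the change.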

Changing our development workflow

We treat our engineers as real customers of our product. Sure, our user base is an order of magnitude smaller than that of Udemy.com, but to us they matter just as much as Udemy.com’s users do. Every change we make, big or small, must be well thought out, made for a good reason, and beneficial to the engineering org as a whole. Finally, it must be communicated well. We wouldn’t be happy if someone changed our stuff overnight, and we can’t expect our developers to be either.

This is a huge change in developer workflows. Before, developers could have been blissfully unaware that a file they added or changed wasn’t covered by a code owner, even though the rest of their pull request was. This couldn’t continue, but it’s not the developer’s fault, so the “punishment” cannot lie solely with them. We can’t ask them to change the way they work, and slow them down in the process, without providing good tooling around the solution. A compromise, if you will.

A stripped-down sample output of the repository visualizing tool. The repos and files shown here do not reflect our actual setup; there are many, many more.

We calculate the code ownership of every file in our repos whenever there is a merge into the main/default branch. We also calculate a coverage state for every directory: fully covered, not covered at all, or partially covered. This gives us a really nice way of visualizing the parts of the repo that aren’t covered, and developers can quickly check whether the areas they actually own are reflected in the CODEOWNERS file.
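The directory states are a roll-up of per-file results. A directory is fully covered if every file beneath it has an owner, not covered if none do, and partially covered otherwise. Here is a minimal Python sketch of that roll-up. It assumes per-file ownership has already been computed, for example by a CODEOWNERS lookup. It also assumes that “beneath” means at any depth; the real tool may count differently.

```python
from __future__ import annotations

from collections import defaultdict
from enum import Enum
from pathlib import PurePosixPath


class Coverage(Enum):
    FULL = "fully covered"
    PARTIAL = "partially covered"
    NONE = "not covered"


def directory_coverage(owned: dict[str, bool]) -> dict[str, Coverage]:
    """Roll per-file ownership up into a coverage state for every directory.

    `owned` maps repo-relative file paths to whether any team owns them.
    The repository root is reported as ".".
    """
    counts = defaultdict(lambda: [0, 0])   # directory -> [owned files, total files]
    for path, is_owned in owned.items():
        for parent in PurePosixPath(path).parents:   # every ancestor, ending with "."
            tally = counts[str(parent)]
            tally[0] += int(is_owned)
            tally[1] += 1
    states = {}
    for directory, (n_owned, n_total) in counts.items():
        if n_owned == n_total:
            states[directory] = Coverage.FULL
        elif n_owned == 0:
            states[directory] = Coverage.NONE
        else:
            states[directory] = Coverage.PARTIAL
    return dict(sorted(states.items()))
```

Counting each file once per ancestor keeps this to a single pass over the repo. The sorted result is convenient for rendering as a tree.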

We also point them to exactly where in the CODEOWNERS file the matching rules come from, so it’s easy to understand why a file is owned by a certain team. This also lets them fix issues more quickly, without having to mentally parse a bunch of globs and apply them to the complicated folder structure of the mono-repo. Hovering over the badge on a file shows you which teams own it, and you can click through to a team to get more information about them, such as the Slack channels they use.
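Once a lookup has found the line number of the matching rule, one way to point at it is to link straight to that line on GitHub. This Python sketch assumes the report links to GitHub’s file view; the post doesn’t say how the Build Portal renders it. The organization and repository names are hypothetical. The search order of the three CODEOWNERS locations follows GitHub’s documentation.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import quote

# Where GitHub looks for a CODEOWNERS file, in the order it searches them.
CODEOWNERS_LOCATIONS = (".github/CODEOWNERS", "CODEOWNERS", "docs/CODEOWNERS")


def find_codeowners_path(repo_files: set[str]) -> Optional[str]:
    """Return the CODEOWNERS file GitHub would use, given the repo's file paths."""
    for candidate in CODEOWNERS_LOCATIONS:
        if candidate in repo_files:
            return candidate
    return None


def rule_permalink(org: str, repo: str, sha: str, codeowners_path: str, line_no: int) -> str:
    """Link to one CODEOWNERS line, pinned to a commit so the line number can't drift."""
    return (
        f"https://github.com/{quote(org)}/{quote(repo)}/blob/{sha}/"
        f"{quote(codeowners_path)}#L{line_no}"
    )
```

Pinning the link to a commit SHA rather than a branch name keeps it pointing at the right rule even after someone later inserts lines above it.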

This tool is really helpful beyond enforcing code owners everywhere, and it’ll stay in place for the foreseeable future. It was also our compromise with our engineers: yes, we need to stop your CI from running if you change or add a file that doesn’t have a code owner, but in return, we’ll make it as easy as we can for you to visualize and fix your ownership so you’re not disrupted.

The other side of the puzzle is checking that the changed files in the pull request are owned by at least one team. Fortunately, we already have custom logic running from GitHub through to Jenkins that lets us control and visualize development workflows much more easily, and we could slot this check right into it.

Sometimes I like to pretend that I can draw so it makes justifying buying the iPad a little bit easier

Whenever CI would normally run for a pull request, an API request is made to our Build Portal. The Portal pulls in all the information it needs about the pull request, as well as the contents of the CODEOWNERS file at the commit SHA being built. This means any code owner changes made outside the pull request must be pulled into the branch before it’s re-evaluated. We then produce a report that looks very similar to the build view our developers are used to. In this report, we highlight the files that were added or changed without code ownership, as well as the changed files that are owned by a team. On top of that, it includes links to each team’s page and points to exactly where in the CODEOWNERS file the ownership for each file is defined. If the check passes, CI runs as normal. The status of the check is also reflected in the pull request.

A sample output of what the developer sees when they push a commit to GitHub, validating that their changes have ownership
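The flow above implies two GitHub calls: reading CODEOWNERS as of the commit being built, and reporting the result back on the pull request. Here is a sketch that uses only Python’s standard library to build, but not send, those requests. The endpoints and fields come from GitHub’s documented REST API: repository contents with a ref parameter, and commit statuses. Several things are assumptions. CODEOWNERS is taken to live in .github/. The result is posted as a commit status rather than through the Checks API, since the post doesn’t say which was used. The context name and URLs are placeholders. This is not the Build Portal’s actual implementation.

```python
from __future__ import annotations

import base64
import json
from urllib.parse import quote
from urllib.request import Request

API = "https://api.github.com"


def _headers(token: str) -> dict[str, str]:
    return {"Accept": "application/vnd.github+json", "Authorization": f"Bearer {token}"}


def codeowners_request(org: str, repo: str, sha: str, token: str,
                       path: str = ".github/CODEOWNERS") -> Request:
    """Build a contents-API request for CODEOWNERS as it exists at `sha`."""
    # Reading at the commit being built means ownership fixes merged elsewhere
    # only count once they've been pulled into the pull request's branch.
    url = f"{API}/repos/{org}/{repo}/contents/{quote(path)}?ref={sha}"
    return Request(url, headers=_headers(token), method="GET")


def decode_contents(response_json: dict) -> str:
    """The contents API returns files base64-encoded; b64decode skips the line breaks."""
    return base64.b64decode(response_json["content"]).decode("utf-8")


def status_request(org: str, repo: str, sha: str, token: str,
                   unowned: list[str], report_url: str) -> Request:
    """Build a commit-status request so the result shows up on the pull request."""
    if unowned:
        state, description = "failure", f"{len(unowned)} changed file(s) have no code owner"
    else:
        state, description = "success", "All changed files have a code owner"
    payload = {
        "state": state,            # GitHub accepts error, failure, pending, or success
        "target_url": report_url,  # link to the ownership report
        "description": description,
        "context": "code-owners",  # hypothetical name for the check
    }
    return Request(
        f"{API}/repos/{org}/{repo}/statuses/{sha}",
        data=json.dumps(payload).encode("utf-8"),
        headers={**_headers(token), "Content-Type": "application/json"},
        method="POST",
    )
```

Sending either request would be a matter of passing it to urllib.request.urlopen with a real token. The sketch stops short of any network call.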

As I mentioned above, we take communication pretty seriously, so we include links to Wiki pages explaining the why, how, what, when, and why (x2), as well as links to the relevant GitHub documentation about code owners. Slack is noisy, and we don’t blame people for missing an announcement, but we also don’t want to leave them confused.

Extra checks on code owners

When we run the code owners computation on a merge into the main/default branch of the repo, we also check whether, for whatever historical reason, an entry names a team that doesn’t have write access to the repo. GitHub requires code owners to have write permissions, so an entry for a team without them doesn’t work as intended. If there are any such entries, we get pinged on Slack so we can fix the issue. The same goes for the checks on pull requests: if a developer tries to add a code owner entry for a team without write permissions on the repo, the check fails and the UI tells the developer why.
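The write-access audit boils down to comparing the teams named in CODEOWNERS against each team’s permission on the repository. This Python sketch assumes those permissions have already been fetched into a dict. GitHub’s REST API reports a team’s repository permission as pull, triage, push, maintain, or admin, and only the last three include write access. The team handles are hypothetical. Returning line numbers lets a Slack alert or pull request check point at the offending entries.

```python
from __future__ import annotations

# Repository permission levels in GitHub's API that include write access.
WRITE_LEVELS = {"push", "maintain", "admin"}


def teams_in_codeowners(text: str) -> dict[str, list[int]]:
    """Map each @org/team handle to the CODEOWNERS line numbers that mention it."""
    teams: dict[str, list[int]] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        _pattern, *owners = line.split()
        for owner in owners:
            if owner.startswith("@") and "/" in owner:   # team handles look like @org/team
                teams.setdefault(owner, []).append(line_no)
    return teams


def teams_without_write(text: str, permissions: dict[str, str]) -> dict[str, list[int]]:
    """Return the teams (with their line numbers) that lack write access.

    `permissions` maps "@org/team" to that team's permission on this repo;
    a team missing from the map is treated as having no access at all.
    """
    return {
        team: lines
        for team, lines in teams_in_codeowners(text).items()
        if permissions.get(team) not in WRITE_LEVELS
    }
```

The same function can serve both cases described above. On merges, a non-empty result would trigger the Slack ping. On pull requests, it would fail the check for a newly added entry.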

We think this is a really nice solution that fills in the gaps in GitHub’s code owners features. It gives developers an awesome, easy-to-navigate tool for finding a file’s owning teams super quickly, and it gives us the security and compliance governance we needed over our development repositories at Udemy.

Author

Ryan Clark is a Senior Software Engineer on the Build team at Udemy. He works on a variety of different tools such as the Build Portal, CI, and development environments.