Managing a large GitHub Organization with Terraform (Part 1)

Diego Morales
Published in stonetech
6 min read · Mar 2, 2019

Recently, here at Stone, we decided to start managing our GitHub organization as code using Terraform. The official Terraform GitHub provider is quite complete (it supports almost anything you want to do on GitHub), so it’s a good choice for enabling an infrastructure as code (IaC) approach.

But Stone’s GitHub org is quite large (~300 people, 50+ teams, 2000+ repositories), and that turns out to be both a reason and a challenge for adopting this approach. A reason because some configuration standardisation is welcome or even necessary in some cases (e.g. specific sets of repos requiring the same branch protection config), and we also wanted tighter control over access permissions to some of our repos. Doing that in an org with so many repos requires some kind of automation to achieve consistency. And a challenge because there are a number of technical and “social” difficulties we are stumbling on along the way.

This will be a four-part series focusing on the caveats, problems and challenges we are facing while managing such a big org with Terraform, and how we are tackling them. We will go through:

Part 1 (this one):

  • A very short intro to Terraform’s GitHub provider usage
  • Using and fighting with Terraform’s modules for keeping standards
  • Love and hate on Terraform land: damned list resources

Part 2:

  • Roadblock: a huge Terraform state, and the need to split it
  • Using GitHub PRs and the CODEOWNERS feature for an approval flow

Part 3:

  • Automating Terraform’s plan and apply on Azure DevOps Pipelines

Part 4:

  • Lessons learned about Terraform and Infrastructure as Code in general

Let’s start!

A very short intro to Terraform’s GitHub provider usage

We assume you already know at least a bit about Terraform, but if you don’t, check the official Intro and Getting Started guides.

Let’s look at some sample code that:

  • Adds my GitHub user to an organization
  • Creates a team
  • Adds me into the team
  • Adds a repo
  • Adds access for the created team to the repo

Each action above is done by a different Terraform resource:

Example 1: Some basic GitHub resources’ usage
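
The snippet originally embedded here is not available, so what follows is only a minimal sketch of what such a configuration could look like, written in the Terraform 0.11 syntax that was current when this post was published. The org name `my-org`, the team and repo names, and all resource labels are placeholders. The `organization` provider argument reflects the provider of that time; newer provider releases use `owner` instead.

```hcl
# Sketch only: every name below is a placeholder (Terraform 0.11 style syntax).
provider "github" {
  organization = "my-org" # assumed org name

  # No token here: the provider reads GITHUB_TOKEN from the environment.
}

# Adds my GitHub user to the organization
resource "github_membership" "dgmorales" {
  username = "dgmorales"
  role     = "member"
}

# Creates a team
resource "github_team" "sre" {
  name        = "sre"
  description = "Example team"
  privacy     = "closed"
}

# Adds me to the team
resource "github_team_membership" "sre_dgmorales" {
  team_id  = "${github_team.sre.id}"
  username = "${github_membership.dgmorales.username}"
  role     = "member"
}

# Adds a repo
resource "github_repository" "example" {
  name        = "example-repo"
  description = "Managed by Terraform"
  private     = true
}

# Gives the team access to the repo
resource "github_team_repository" "sre_example" {
  team_id    = "${github_team.sre.id}"
  repository = "${github_repository.example.name}"
  permission = "push"
}
```

Referencing `github_team.sre.id` and `github_repository.example.name` instead of repeating literal values lets Terraform infer the creation order on its own.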

To actually run that code, you will need a GitHub personal access token; generate one if you don’t already have it. Terraform uses that token to access your org on GitHub. You can specify it as a “token” param in the “provider github” section, but I strongly suggest setting a GITHUB_TOKEN environment variable instead, which the provider reads directly. That way you avoid Terraform storing the token unencrypted on disk, for example when saving a plan output file (we will talk more about that when discussing automation pipelines).

The GitHub provider documentation explains the usage of each resource. There’s also an excellent post by Nikolay Yurin that introduces the basic usage in much more detail, with examples of Terraform commands (plan/apply).

Now let’s get into some of the caveats we discovered along the way.

Note that some of the limitations discussed in this series will be solved or improved by the upcoming Terraform 0.12 update, which will bring a ton of new language features. You can follow its development progress here.

Using and fighting with Terraform’s modules for keeping standards

As said in the module docs:

Modules in Terraform are self-contained packages of Terraform configurations that are managed as a group. Modules are used to create reusable components in Terraform as well as for basic code organization.

We use modules in our GitHub code to set some standards for repository configuration and also to make some configurations easier or more compact.

Let’s see an example:

A user is referenced in the org membership code as well as in one or more team or repository configurations. Most of Stone’s collaborators use their personal GitHub account associated with the org (together with an SSO/SAML config that requires their company password to access the org repos), and sometimes the GitHub username of a colleague is not easy to find out without asking them.

To help with that, we maintain in the codebase a map of company_email to github_username, and wrap resources that need to reference a user in modules that use that map internally. So the person writing the code just needs to know the user’s email address (something very easy to find out), and the module handles the translation internally. The map itself is a module:

Example 2: An email to github username map Terraform module
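
The original snippet is not reproduced here. As a rough sketch, such a map module could be nothing more than a local map exposed as an output. The file path, the output name `users`, and the email addresses are assumptions made for illustration.

```hcl
# modules/user-map/main.tf (hypothetical path and names)

locals {
  # company email => GitHub username (example.com addresses are made up)
  users = {
    "diego.morales@example.com" = "dgmorales"
    "some.guy@example.com"      = "someguy"
    "some.other.guy@example.com" = "someotherguy"
  }
}

# Other modules read the map through this output.
output "users" {
  value = "${local.users}"
}
```

Keeping the map in one module means a new colleague is registered in exactly one place, and every module that needs a username resolves it the same way.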

An org membership module uses that map and includes some standard configs, like auto-adding every user to an all-org-members team:

Example 3: An org-member module that uses the user-map module
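
The original snippet is not reproduced here. The sketch below shows one way such a module might be put together, assuming a sibling map module at `../user-map` that exposes a `users` output, and an existing team whose slug is `all-org-members`. The variable names, paths and slug are all assumptions.

```hcl
# modules/org-member/main.tf (hypothetical layout, Terraform 0.11 style)

variable "email" {
  description = "Company email of the person to add to the org"
}

variable "role" {
  default = "member"
}

# Assumed sibling module exposing a `users` map output (email => username).
module "user_map" {
  source = "../user-map"
}

# Assumes a team with this slug already exists in the org.
data "github_team" "all_org_members" {
  slug = "all-org-members"
}

locals {
  # Translate the email into a GitHub username.
  username = "${lookup(module.user_map.users, var.email)}"
}

resource "github_membership" "this" {
  username = "${local.username}"
  role     = "${var.role}"
}

# Standard config: every org member also joins the all-org-members team.
resource "github_team_membership" "all_org_members" {
  team_id  = "${data.github_team.all_org_members.id}"
  username = "${github_membership.this.username}"
  role     = "member"
}
```

Looking the team up with a data source keeps the module's inputs short, at the cost of one extra API read per module instance; passing the team ID in as a variable would be a reasonable alternative.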

So the person making a PR just has to know the user’s company email address and add a very short resource:

Example 4: Usage of the org-member module
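
The original snippet is not reproduced here. Assuming an org-member module stored at `./modules/org-member` that takes a single `email` input (both the path and the input name are assumptions), adding a person could be as short as this:

```hcl
# One small block per person; the email address is a made-up example.
module "member_diego_morales" {
  source = "./modules/org-member"
  email  = "diego.morales@example.com"
}
```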

Another example (with no code samples, to avoid making this post too large) is having modules for specific kinds of repos that need some particular config. For example, a module for a set of Ansible roles, each in its own repo, that are accessible to anyone in the org but share the same branch protections, which force a review flow and restrict merging to a particular team. We have a module that embeds all those configs and keeps the declarations of those Ansible repos very easy and compact. This is a good example of our use of modules to standardize our GitHub repo configs.

But not everything is so sweet about Terraform modules. Terraform (very intentionally) has very few conditional constructs. Creating flexible, reusable modules that do different things based on input params is quite difficult or impossible. All you have is variable (and function) interpolations, a ternary if, and a hacky usage of the count param to create or not create a resource. Many resources (like github_branch_protection, for example) have optional internal sections that you can’t switch on or off using count or a ternary if. Here’s an interesting post with more about that.
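
The count trick mentioned above can be sketched as follows. The variable name `create_team` and the team name are hypothetical. The idea is that a ternary turns a boolean input into a count of 1 or 0, so the resource is either created once or not at all.

```hcl
# Hypothetical module input that toggles an optional resource.
variable "create_team" {
  default = true
}

resource "github_team" "optional" {
  # count = 1 creates the team, count = 0 skips it entirely.
  count = "${var.create_team ? 1 : 0}"

  name    = "optional-team"
  privacy = "closed"
}
```

This works for whole resources only. It does not help with a nested block inside a resource, which is the limitation described above for github_branch_protection.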

You also cannot pass whole objects as input to a module. So, for example, if I would like to pass a whole instance of my org-member module as input to some other module... no, I cannot. This is not such a good example, because that module is very simple and has no outputs, but imagine we made it more complex. I would have to pass the individual outputs from one module to the inputs of the other.
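
As an illustration of that wiring, suppose the org-member module were extended with `username` and `role` outputs, and a second module needed both values. Both modules' paths, inputs and outputs here are hypothetical.

```hcl
module "member_dgmorales" {
  source = "./modules/org-member" # hypothetical module with outputs added
  email  = "diego.morales@example.com"
}

# Passing the module itself is not possible, e.g.:
#   member = "${module.member_dgmorales}"
# Each output has to be forwarded one by one instead.
module "repo_admin" {
  source   = "./modules/repo-admin" # hypothetical consumer module
  username = "${module.member_dgmorales.username}"
  role     = "${module.member_dgmorales.role}"
}
```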

But fear not, the upcoming version will improve both conditionals and rich input and output types.

Love and hate on Terraform land: damned list resources

If you work with Terraform long enough, you will eventually want something like a foreach loop: create many resources from a list of items (this is one of the things being added in the upcoming 0.12 version). The way you can do it today is by using the count meta-parameter and functions like element and length. Let’s say you want to add a list of users to the org without having to write a resource for each one of them:

Terraform example showing list, count and function usage on resources
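
The original snippet is not reproduced here. A minimal sketch of the pattern, in Terraform 0.11 style, could look like this. The variable name `org_members` and the resource label are assumptions; the usernames and their order follow the discussion below.

```hcl
variable "org_members" {
  default = ["someguy", "someotherguy", "dgmorales"]
}

resource "github_membership" "members" {
  # One resource instance per list item, addressed by position.
  count = "${length(var.org_members)}"

  username = "${element(var.org_members, count.index)}"
  role     = "member"
}
```

In the state, these instances are tracked as `github_membership.members[0]`, `[1]` and `[2]`, not by username.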

Cool, right? So simple! Well … just until you try to remove someone from the middle of the list. Terraform identifies each item in that list of resources in its state using just plain index numbers. If I decide to remove “someotherguy” from the org above, Terraform will also remove and re-add dgmorales, because that entry’s index changes from 2 to 1. “someguy” remains at index 0, so it’s untouched.

So yeah, been there, done that: we created a list of almost 200 users, only to realize that removing someone from the middle would kick a whole lot of people out of the org and invite them back seconds later. We had to convert the code to one resource declaration per user (and move the existing state entries to avoid annoying our users). Beware.
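
A sketch of what that conversion might look like, assuming the list-based resource was named `github_membership.members` and that dgmorales sat at index 2 (both are assumptions for illustration). Each user gets a named resource, and the matching state entry is moved with `terraform state mv` before the next plan so that nobody is removed and re-invited.

```hcl
# One explicitly named resource per user: removing one never shifts the others.
resource "github_membership" "someguy" {
  username = "someguy"
  role     = "member"
}

resource "github_membership" "dgmorales" {
  username = "dgmorales"
  role     = "member"
}

# State entries moved beforehand, one command per user, for example:
#   terraform state mv 'github_membership.members[0]' github_membership.someguy
#   terraform state mv 'github_membership.members[2]' github_membership.dgmorales
```

Running `terraform plan` after the moves is a cheap way to check the result: ideally it reports no changes for the users that were migrated.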

Conclusion — Part 1

We presented here the reasoning for an infra as code approach to GitHub management, some code samples showing the usage of the GitHub Terraform resources, and some of the good and the bad we have encountered so far.

Another interesting article about Terraform’s caveats is this one by Henrique Barcelos.

In the next part of this series we will get into the problems of scale: what happens when you have a Terraform codebase declaring 5000+ resources? And what do you do when you hit the GitHub API rate limit? Stay tuned!

Diego Morales
stonetech

SRE Tech Lead at Stone. Crazy about automation, DevOps, Agile, barbecue and adventure racing.