Using Terraform to automate everything — from Dominos to Slack

Automating your ticket queue with self-service Terraform repos works for more use cases than you might think.

Elliot Graebert
11 min read · Sep 2, 2022
Image credit: https://www.istockphoto.com/portfolio/imaginima

This post argues that IT and Technical Operations teams should use Terraform to add self-service functionality to the applications they support.

For IT and technical operations teams, a significant source of support work involves using their administrative powers to create, update, or delete objects in tools like GitHub, Jira, Slack, Salesforce, Workday, and so on. For example, when a new team is created, they may want a new GitHub organization, security group, mailing list, Jira project, and Slack room. Internal productivity teams often feel like they are facing two bad choices:

Option A:

  • Enable non-administrators to create the above-mentioned primitives, but watch as untrained users create objects without concern for naming conventions or best practices. Pretty soon, the system is mired in thousands of unused objects, and it is a struggle to make sense of it all.

Option B:

  • Restrict permissions to administrators and then get flooded with hundreds of tickets from users asking you to change settings. Before long, the administrators are just button smashers, absolutely dreading their support rotations.

If you throw a software engineer at this problem, they will immediately propose building an application that automates these interactions. Each portal they create will take a couple of weeks to show some promise, and then six months of editing before the engineer is thoroughly bored of the problem space and demands to move on. The underlying application becomes stagnant, but its usage within the company continues to evolve as the needs of the business change. Within three years, people detest the application and it becomes the bane of the department.

Alternatively, if you throw a DevOps engineer at the problem, they will (unsurprisingly) suggest using Terraform. Out of the gate, a Terraform repository hosted on a platform like GitHub gives you a self-service GUI in which non-administrators propose changes and administrators approve them. Some creative use cases where self-service code repos can eliminate toil tickets are as follows.

  • Source Control: Automate your GitHub teams, organizations, and repository settings.
  • Authentication: Create users and groups, populate membership, and set up SAML integrations.
  • Learning Platforms: Create course definitions, content, and assignments.
  • Ticketing Platforms: Standardize project settings, templates, workflows, and assignments.
  • Vulnerability Management: Manage assets, define scans, and create vulnerability exceptions.
  • Knowledge Platforms: Create folder structures, templates, and permission structures.

If you start combining these use cases, you can have a single Terraform module that creates resources in multiple apps (useful for creating a new team’s resources).

My proposal is to automate your internal applications (like Slack and Jira) using an infrastructure-as-code (IaC) pattern like Terraform.

If you are brand new to Terraform, or unsure why infrastructure-as-code (IaC) is the methamphetamine of DevOps, I recommend taking a quick pause to read some of the many other posts on the topic.

If you like this post, please 👏 , subscribe, or follow to let me know I’m on the right track!

Everything-as-code

When most people think about infrastructure-as-code, they focus on the surface-level benefits: declarative infrastructure, cloud automation, cloud abstraction, and modules for connecting components together. However, we are going to focus on some other fundamentals.

  • A free GUI — GitHub/GitLab/Bitbucket is the world’s best WYSIWYG platform.
  • Declarative state — Tickets exist in isolation. Terraform defines the entire state.
  • Minimization of back-and-forth information gathering — Use CI to provide rapid feedback before getting a human involved.
  • Peer review — Allow multiple people to collaborate before making a decision. This is great for the longevity and security of a system.
  • Self-service — Using tools like policybot and bulldozer, you can make the entire workflow hands-off.

A free GUI

Most infrastructure teams don’t have front-end engineers.

Every infrastructure team I’ve talked with has had a wide diversity of people and skill sets for the traditional backend skills, like Linux systems, API design, database design, security, and backend coding. However, almost no teams had solid front-end engineering skills.

Without front-end engineering talent, your team’s ability to cheaply generate internal-facing apps that both solve the users’ problems and are low-maintenance is a fairy tale. It doesn’t matter how good your backend is if you are always struggling to get an adequate GUI.

In infrastructure-as-code, your “GUI” is GitHub (or equivalent).

Think of the file visualization and in-line editing features of your source control tool as your user interface. And yes, this is a real Terraform provider. While this is probably not what HashiCorp originally planned with Terraform, you can’t deny that it is amazing!

By using a tool like GitHub for your GUI, you automatically get:

  • authentication and authorization fundamentals,
  • data visualization and modification tools,
  • input validation and user feedback,
  • audit trails, and, most importantly,
  • a dark theme.

Declarative state

Have you ever met a user who first searched existing tickets for context before filing their own ticket? Not a chance.

Ticketing systems are very good for one-off requests in which all the context for a request and its response are captured in a single location. Generally, in a support workflow, users will file tickets without looking for more context, and responders will look up at most one or two reference tickets. In contrast, Terraform uses a declarative model where the code represents the state of what should exist. You can treat the files as an inventory list, which is exactly what you want in a self-service model.

With GitHub as a GUI, Terraform provides users with context through its declarative state.

Humans are very good at inference when they see existing examples. The example above could be from a case where the requestor for a new group didn’t realize that the users weren’t created yet, or that when creating those new users, they would need their full names. Terraform’s declarative nature also means they would be aware of the other “kids” group and use that context to follow a similar pattern.

Minimization of back-and-forth information gathering with CI

The worst part of support work is the back-and-forth on tickets to get the required information.

As mentioned above, users with low context usually file tickets that have incomplete information. This requires the support engineer to do back-and-forth information gathering in order to resolve the tickets. Each time a ticket is passed to another person, you have to assume a time delay for the other person to switch contexts and take the next step. This back-and-forth iteration can easily extend a quick task from minutes to hours to days.

With GitHub as your GUI and Terraform providing context, the CI process minimizes human validation.

If you augment your Terraform workflow with unit tests (via policy languages like OPA or HashiCorp Sentinel), you can enforce more complex business requirements. For example, you could ensure that a kids’ playlist only includes age-appropriate music. In this case, you could check the song object to ensure that restrictions.reason is not equal to “explicit” (see Spotify API).
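As a rough illustration of that playlist rule, here is a minimal Python sketch. In practice this kind of check would more likely be written in OPA or Sentinel; Python is used here only as a stand-in. The function name and the sample playlist are hypothetical, and the only thing taken from the text above is the restrictions.reason field on the song object.

```python
from __future__ import annotations

from typing import Any


def explicit_tracks(tracks: list[dict[str, Any]]) -> list[str]:
    """Return the names of tracks whose restrictions.reason is "explicit"."""
    violations = []
    for track in tracks:
        # `restrictions` may be missing or null, so fall back to an empty dict.
        reason = (track.get("restrictions") or {}).get("reason")
        if reason == "explicit":
            violations.append(track.get("name", "<unnamed track>"))
    return violations


# Hypothetical sample data standing in for whatever the CI job looks up.
proposed_playlist = [
    {"name": "Bluey Theme Tune"},
    {"name": "Rap God", "restrictions": {"reason": "explicit"}},
    {"name": "Regional Lullaby", "restrictions": {"reason": "market"}},
]

blocked = explicit_tracks(proposed_playlist)
if blocked:
    print("Not allowed on a kids' playlist: " + ", ".join(blocked))
```

A CI job could fail the pull request whenever the returned list is non-empty, which gives the requester a concrete list of songs to remove.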

In a continuous integration model (whether in a generic tool like CircleCI or a Terraform platform like Spacelift), the user’s proposed change will keep getting rejected until it passes all the checks. Because the business requirements are captured as policies, the user will get instant feedback on whether their proposal has a problem. The user can then iterate on that pull request until it is valid.
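One way to produce that instant feedback is to inspect the machine-readable plan that Terraform emits (for example via `terraform show -json` on a saved plan file) and turn each violation into a readable message. The sketch below assumes a made-up business rule, that new GitHub repository names must be lowercase and hyphen-separated; the rule, the function name, and the message wording are illustrative assumptions, not something the workflow above prescribes.

```python
from __future__ import annotations

import re

# Hypothetical naming convention: lowercase words separated by single hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def naming_violations(plan: dict) -> list[str]:
    """Return one message per newly created repository that breaks the convention."""
    failures = []
    for resource in plan.get("resource_changes", []):
        if resource.get("type") != "github_repository":
            continue
        change = resource.get("change") or {}
        # Only check resources this pull request would create.
        if "create" not in change.get("actions", []):
            continue
        name = (change.get("after") or {}).get("name") or ""
        if not NAME_PATTERN.fullmatch(name):
            address = resource.get("address", "<unknown>")
            failures.append(
                f"{address}: repository name {name!r} must be lowercase and hyphen-separated"
            )
    return failures
```

A CI step could load the plan with `json.load`, print each returned message, and exit with a non-zero status when the list is not empty, so the pull request stays blocked until the requester fixes the name.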

By the time an administrator goes to review the code, the change should already pass the business requirements. No need to follow up on missing inputs, or explain why you shouldn’t add Rap God to a Bluey playlist.

Peer review

Peer review improves velocity and reduces the error rate.

The majority of engineers are quick to accept that code reviews improve velocity by catching mistakes early in the pipeline, but they often overlook applying this technique to all the non-coding aspects of their jobs, such as ticketing systems, learning platforms, authentication services, and communication tools. I suspect that these engineers do believe that peer review is helpful, but it simply doesn’t occur to them that all of these tools could be codified.

While peer review is not supported natively in most of these tools, Terraform providers exist for a surprisingly large number of services. This means that code repositories can be set up for these use cases, and peer review can be enforced on every change.

With GitHub as your GUI, Terraform providing context, and CI providing automated validation, humans can then validate those parts that require it.

While the above example is proper code, it is not Bluey’s intent to add her Dad to her private kids-only Slack conversation. If Bingo (her sister) was reviewing this, she’d reject it right away!

Self-service

Self-service enables teams to manage more infrastructure with fewer people and still keep their support burdens low.

The goal of self-service is to enable end-users to unblock themselves: maximal usage of the underlying tool with the least amount of support maintenance. One also has to consider the security implications, because self-service must not violate the principle of least privilege (i.e., it does not mean granting everyone admin and hoping it works out).

Here is how I view security risk versus self-service capabilities.

  1. Low risk — Allow a user to fulfill their own needs.
  2. Medium risk — Allow a user to fulfill their needs with the help of a teammate who reviews their requests.
  3. High risk — Allow a user to submit a request, but only the DevOps or security team can approve it.

Of course, a major concern is how cases (1) and (2) can be abused. Guardrails for this could include the following:

  • unit tests to block commits that violate business standards,
  • policybot to only allow cases (1) and (2) for a limited set of files, and
  • bulldozer to merge PRs on the users’ behalf (not granting anyone write permissions).
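The routing logic behind the low, medium, and high risk tiers can be sketched in a few lines of Python. This is only an illustration of the idea, not policybot’s configuration format; the path patterns, tier names, and function names are all hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Tiers ordered from least to most risky.
TIERS = ["low", "medium", "high"]

# Hypothetical path patterns. Note that fnmatch's "*" also matches "/".
RISK_RULES = {
    "low": ["playlists/*.tf"],
    "medium": ["teams/*.tf"],
}

REQUIRED_REVIEW = {
    "low": "none, merge automatically once CI passes",
    "medium": "one teammate",
    "high": "DevOps or security team",
}


def tier_for_file(path: str) -> str:
    """Classify one changed file; anything not explicitly listed is high risk."""
    for tier, patterns in RISK_RULES.items():
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return tier
    return "high"


def tier_for_change(paths: list[str]) -> str:
    """A pull request is as risky as its riskiest changed file."""
    if not paths:
        return "high"
    return max((tier_for_file(path) for path in paths), key=TIERS.index)
```

Defaulting unknown files to the highest tier is a deliberate choice in this sketch: a new file type should need administrator approval until someone decides it is safe to delegate.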

With GitHub as your GUI, Terraform providing context, and CI providing automated validation, you can push the human validation step down to the teams themselves.

In the example above, you can see how it’s possible to encapsulate all the resources needed by a team into a single module. Inside the module, you would see providers for Slack, AzureAD, PagerDuty, Asana, and O365. By using a module, you hide most of the complexity of Terraform from the people making requests.

In my proposed self-service-as-code model, the teams responsible for these applications would create self-service repos with automated enforcement of the critical elements on which they have context. The actual request/approval process would then be pushed to the downstream teams who have more context. You’d want to leverage policy tools like Sentinel or OPA, on top of careful repository permissioning, to prevent abuse.

The culmination of all these ideas results in services that are almost entirely hands-off, enabling infrastructure teams to focus less on grunt work and more on intellectual work. There is a large upfront cost, but it’s worth it to avoid all those churn tickets.

Can less-technical users make pull requests?

It turns out, yes! Arrogance is one of the biggest flaws of our industry. I could write an entire post about how software engineering hubris has held us back significantly. We have convinced ourselves that our work is deeply intellectual and unique among disciplines and that we cannot learn from other industries. I simply do not believe this is true.

If someone can learn to use Salesforce, then they can absolutely learn GitHub pull requests.

From Eleonora and Francescoc

Terraform and GitHub are great because they are simple, not because they are complex.

I’ve met plenty of lawyers, finance teams, QA engineers, administrators, and salespeople that were all capable of following complex workflows. It’s just a matter of clearly documenting the workflow and doing basic training. Record a video, make it part of onboarding, whatever it takes. Making a pull request is not rocket science.

Where engineers usually get off track is that they create self-service repositories without investing in the usability and training needed to make them effective for their user bases. If you think about it, if GitHub were overly complex, it wouldn’t have over eighty million users.

Make writing code part of your company culture.

Be an inspirational leader who declares that at your company, everyone writes code. I’m a fan of breaking down barriers instead of building them up. Instead of telling someone that they can’t write code because they don’t have an engineering degree, I’d rather be the person sitting down and teaching them a new workflow.

The more that writing code and making pull requests are part of everyone’s roles, the easier it will be to onboard people into this mindset. The more use cases involve Terraform, the faster people will be able to pick up the workflow. And the more you automate, the more time you’ll have to automate more.

The ultimate goal here is to automate all internal churn work away, enabling everyone to focus on intellectual work.

So why Terraform instead of something like Go+YML?

The value of Terraform is not in the HCL, but in the underlying framework.

At first, most people are daunted by the idea of creating their own Terraform providers, so some default to home-grown tooling driven by YML or JSON files.

However, Terraform provides a better framework for

  • reconciling the desired versus actual state,
  • building modules that greatly simplify inputs while still being traceable,
  • creating clean(ish) output for each of the automation phases,
  • managing secrets with built-in mechanics,
  • validating the desired or end state (Sentinel), and, most importantly,
  • reusing a configuration language that many enterprise and SaaS companies already support.

If you offroad into your own home-grown system, you are likely to end up rebuilding much of the functionality listed above.
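To give a sense of what rebuilding even the first item involves, here is a deliberately naive Python sketch of reconciling desired versus actual state. It is not how Terraform works internally; the function name and the group data are hypothetical, and it ignores dependencies, partial failures, and drift, all of which a real system has to handle.

```python
from __future__ import annotations


def plan_changes(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two name -> settings maps and list what must change."""
    return {
        # Declared in code but missing from the live system.
        "create": sorted(name for name in desired if name not in actual),
        # Present in both, but the settings differ.
        "update": sorted(
            name for name in desired if name in actual and desired[name] != actual[name]
        ),
        # Live in the system but no longer declared in code.
        "delete": sorted(name for name in actual if name not in desired),
    }


# Hypothetical groups: what the repository declares versus what the app reports.
desired_groups = {
    "kids": {"members": ["bluey", "bingo"]},
    "parents": {"members": ["bandit", "chilli"]},
}
actual_groups = {
    "kids": {"members": ["bluey"]},
    "neighbours": {"members": ["lucky"]},
}

changes = plan_changes(desired_groups, actual_groups)
```

Even this toy version needs a reliable way to read the actual state from every application you automate, which is exactly the work that existing Terraform providers already do.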

You’ll start off trying to save time with the “simpler” solution of automating it yourself, but over time, your needs will grow. Trust me, and just start with Terraform.

Wrap-up

Doing ticket work is an emotional and intellectual drain when the work only entails passing messages and pushing buttons. Creating a code-based, self-service workflow for your ticket queue is a great way not just to improve your coding skills but also to eliminate grunt work. A pull-request workflow has the benefits of a GUI, enforced standards with optional inputs, peer review, and self-service.

Terraform should be your default tool for this pattern, as it already includes many of the fundamentals that you’ll want over time. Since many companies are now directly supporting Terraform as a configuration and automation framework, Terraform is much more likely to be a standard you can apply over and over again.

Do you have other great examples of Terraform providers for automating your IT workflows? Feel free to drop a comment!

Want to connect? You can find me on LinkedIn.

Elliot Graebert

Director of Engineering at Skydio, Ex-Palantir, Infrastructure and Security Nerd, Gamer, Dad