Automate Your Infrastructure with GitOps, Terraform and CircleCI
Over the last few years there’s been a massive drive towards automation in the software development arena. Teams have seen the benefits of adopting Continuous Integration, and more are taking it a step further and embracing Continuous Delivery and Deployment patterns for their software.
As part of one of my side projects I’ve been using GitHub for source control, and CircleCI to deploy my code into a Kubernetes cluster powered by GKE, a part of the GCP public cloud. This is often referred to as the GitOps pattern: driving all of your automation from a single source, in this case GitHub, and hooking into certain events to trigger activities.
All of this led me to look into using a similar pattern to manage the infrastructure I was deploying to. Infrastructure as Code (IaC) allows for infrastructure definitions to be handled just as you’d handle ‘regular’ source files, so I figured I should be able to apply GitOps principles, and many of the same tools. The one ‘missing’ component of my theory was a tool for turning my infrastructure definitions into actual infrastructure in GCP, and this is where HashiCorp’s Terraform project came to the rescue.
Terraform is an open source IaC tool that allows the creation of declarative configuration files that can be shared, edited and reviewed using your normal stack, just as you would with regular code. These files can also be stored and versioned using any Version Control System, meaning you can have excellent control over your infrastructure. Just what I needed!
But before we begin, some warnings!
- Terraform is still pre-1.0 software (this article uses version 0.12.6), and is changing all the time. All code presented here worked at the time this was written…
- Handle all of this with care. I’m not sure I’d use it for mission-critical infrastructure at this point. There’s plenty of opportunity for Terraform to do something that you’re not expecting and cause outages
My GitOps Flow for Infrastructure
So, having assembled all of the moving parts, it’s time to think about how we want to use them. In this case, that means the specific GitOps pattern we want to implement.
In my case, I went with something quite simple.
- Changes to the infrastructure will be made in a branch from master
- Once complete and locally tested, a pull request is created to merge the branch back to master. This will automatically create a Terraform plan of the changes which will be attached to the PR for manual review
- Once the manual review is complete and the PR approved, the merge into the master branch is completed, at which point the changes are deployed to the production environment
I’ll be building this out using GitHub, Terraform and CircleCI, with just a smidgen of Docker thrown in.
Managing Infrastructure with Terraform
Let’s start by defining the infrastructure we want to build and manage. In our case that’s the private Kubernetes cluster and its associated networking.
The file below is what we’ll run through Terraform in order to build our cluster and network infrastructure. We’ll walk through it step by step.
I won’t delve too far into the weeds of what we’re building in GCP. Instead I’ll focus on the specifics needed to make this work well in a CD process, namely the authentication with GCP and how we store state.
Authentication is handled using a JSON token file for a GCP Service Account, which Terraform reads from the same directory as the Terraform file, in this case terraform-deploy.json. You’ll need a Service Account in GCP with Project Owner permissions to enable it to create, configure and destroy the infrastructure you need. It will also need create and read permissions on a GCP Storage bucket that can be used for state storage. As this token has very high levels of access, you’ll want to keep it secure. Don’t put it in source control, and avoid disclosing it any further than needed. In test, developers can use tokens for your test systems. In production, we’ll see how we can set this up securely so it’s only available at the point Terraform needs it, and isn’t disclosed.
State management is critical for Terraform. Terraform writes it after it creates, updates or destroys infrastructure, and reads it before each run. We need to ensure that this state is held centrally so Terraform has access to it from CircleCI. The easiest option given we’re targeting GCP is to use a GCP Storage bucket, which is what this script does using the GCS backend, but there are plenty of other supported options such as AWS S3 if you want to go that route. Just make sure that you’ve created the storage bucket, and that the Service Account Terraform will be using has access to it.
The rest of the script sets up a private GKE cluster, a subnet to run the nodes on, a node pool with auto-scaling and n1-standard-1 nodes, and a few other network bits to tie it all together. It includes a single IP address for all traffic egressing the cluster too, which is handy if you have external services you’ll be calling from within the cluster that need IP whitelisting.
Terraform + Docker for CircleCI
Once we have our infrastructure definition in place, and somewhere central to store the current state, we can store it in version control, so it’s ready for the team to work with. We can also start to think about automating the workflow of modifying the definition, reviewing and approving the changes, and then having them automatically applied. This is where CircleCI comes in: we can store our code in GitHub, hook CircleCI up to it, and then use CircleCI’s GitHub integration to trigger workflows when changes are made.
CircleCI uses either VMs or Docker-based containers for running CI/CD operations. There are a number of standard images available covering a wide variety of technologies and tools, but there isn’t really anything we can use to run our Terraform project. Fortunately, it allows us to easily use our own custom images if we need something a bit special. So first up, let’s create an image with Terraform on it that we can use. The Dockerfile below will give us what we need.
This is a pretty simple Docker definition. We start with the latest stable Alpine image, so we get a nice streamlined final image with only what we need in it. We update the Alpine package manager, and then use it to ensure that a bunch of Terraform’s prerequisites are installed. After a quick clean of the package cache, we install version 0.12.6 of Terraform.
Use Docker to build the image (docker build . -t [project]/[name]:[tag]), and push it to the Docker Registry ready for CircleCI to start using it. You can also use a private Registry if you need to. Alternatively, I’ve built an image that can be used, and I’ve referenced it in the example CircleCI config below.
You’ll want to store your Dockerfile somewhere safe. I generally put it in an images\primary folder within the project.
CircleCI definition file
Now we’ve got all of the bits we need to get the workflow up and working in CircleCI.
The first thing we’ll need to do is build the workflow definition that we want CircleCI to execute for us. To do this we add a .circleci folder to the root of the project containing the Terraform definition, and in that add a config.yml file, based on the one below.
This uses version 2 of CircleCI’s infrastructure and definition syntax. First we set up some defaults that will be used in each of the jobs defined in the file. A job is a single unit of the process we want to run; these jobs are then built into a workflow at the end of the file, using rules to control when and how each step is run. The main default we set up is the Docker image that we want to use to run each job. In this instance it’s a custom image in the public Docker registry that I’ve built from the Dockerfile we looked at earlier.
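As a rough illustration of the defaults idea (a sketch, not the exact file from this project), shared settings can be declared once as a YAML anchor and merged into each job. The image name and the example job are placeholders I’ve made up for illustration; substitute your own image.

```yaml
version: 2

# Shared settings, declared once as a YAML anchor and merged into each job.
defaults: &defaults
  docker:
    # Placeholder image name (an assumption); use your own Terraform image here.
    - image: example/terraform:0.12.6
  working_directory: ~/project

jobs:
  # Hypothetical job, shown only to demonstrate merging the defaults in.
  example:
    <<: *defaults
    steps:
      - checkout
```

This is only a fragment; a complete config also needs the real jobs and a workflows section.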
It’s worth noting at this stage that CircleCI spins up a fresh container for each job, so in theory we could use a different image for each job. We don’t want to do that in this case, but we do need to be able to persist state between jobs. We’ll see how to do that when we look at the jobs in detail.
The first job we define, init, first checks out the target Git repo into the container, and then creates a new file in the same directory containing the GCP authentication token held in an environment variable we’ll create when we configure CircleCI itself. This file is used to allow Terraform to authenticate with our project in GCP and read state, as well as create our infrastructure. Finally we run terraform init, which will pull the latest state for the infrastructure, and ensure all of the correct Terraform plugins are installed and ready to go.
Obviously all of this will have changed the working directory from what was checked out from source control, and we want to ensure all of those updates are available in any subsequent jobs. We can do that by using CircleCI’s persist_to_workspace operation to effectively push the current directory, so that we can pop it in later jobs in the pipeline.
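A minimal sketch of an init job along these lines might look like the fragment below. It assumes the token is held in an environment variable called GCP_CREDENTIALS and written to terraform-deploy.json; the image name is a placeholder, and the -input=false flag is my own addition to stop Terraform prompting in CI.

```yaml
jobs:
  init:
    docker:
      # Placeholder image name (an assumption); use your own Terraform image.
      - image: example/terraform:0.12.6
    steps:
      - checkout
      - run:
          name: Write GCP credentials file
          # printf avoids any shell-specific handling of backslashes in the JSON.
          command: printf '%s\n' "$GCP_CREDENTIALS" > terraform-deploy.json
      - run:
          name: Initialise Terraform
          command: terraform init -input=false
      # Save the whole working directory so later jobs can attach it.
      - persist_to_workspace:
          root: .
          paths:
            - .
```

Note that persisting the whole directory also carries the credentials file into later jobs, which is what lets them authenticate without recreating it.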
The next job we’ll define is plan, which runs a terraform plan. First we pop the workspace we persisted above, which gets all of the updates from the previous job into the fresh container that’s been spun up for this job. Using that state, we can run the terraform plan command, saving the output so that, again, we can use it in later jobs as and when we need to. We also take this plan, and save it as a build artifact so we have it available to be manually reviewed. Finally, we once again save the modified state for later jobs to use.
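Sketched as a config fragment, a plan job of this shape could look something like the following. The file names terraform.plan and plan.txt, the artifact destination and the image name are all assumptions chosen for illustration.

```yaml
jobs:
  plan:
    docker:
      # Placeholder image name (an assumption); use your own Terraform image.
      - image: example/terraform:0.12.6
    steps:
      # Restore the directory persisted by the previous job.
      - attach_workspace:
          at: .
      - run:
          name: Create Terraform plan
          command: |
            # Save the plan in binary form so a later job can apply exactly this plan.
            terraform plan -input=false -out=terraform.plan
            # Render a human-readable copy for manual review.
            terraform show -no-color terraform.plan > plan.txt
      - store_artifacts:
          path: plan.txt
          destination: terraform-plan
      # Persist again so the saved plan is available to later jobs.
      - persist_to_workspace:
          root: .
          paths:
            - .
```

Saving the plan to a file and applying that file later means what gets applied is what was reviewed, rather than a fresh plan computed at apply time.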
The final job we define is apply, which we use to apply the Terraform plan we previously created, show the results, and then store them as a build artifact so that, again, we can review them later.
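An apply job in the same style might be sketched like this. It assumes an earlier job saved the plan as terraform.plan (a hypothetical file name), and the image name and output file name are placeholders too.

```yaml
jobs:
  apply:
    docker:
      # Placeholder image name (an assumption); use your own Terraform image.
      - image: example/terraform:0.12.6
    steps:
      - attach_workspace:
          at: .
      - run:
          name: Apply saved plan
          # Applying a saved plan file does not ask for interactive approval.
          command: terraform apply -input=false terraform.plan
      - run:
          name: Show resulting state
          # With no path argument, terraform show prints the current state.
          # The output may include sensitive values, so check who can read artifacts.
          command: terraform show -no-color > apply-result.txt
      - store_artifacts:
          path: apply-result.txt
          destination: terraform-apply
```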
That’s all of the building blocks for the pipeline in place. Now we use CircleCI’s workflows facility to define how they’re used to construct our pipeline.
First off, we want to run the init job; we want to do this in all situations where the pipeline is triggered. Next we want to run plan, but not until init has completed, so we set init as a required job for plan to run, thus ensuring they run sequentially (by default, CircleCI will try to run jobs in parallel where it can). Finally, we want to run apply once plan has completed, so that the changes get applied to our infrastructure. The caveat here, though, is that we only want to run apply if the change triggering CircleCI was made to the master branch of the project. You’ll recall from our GitOps workflow that only changes to master get applied; anything else is only planned for manual review prior to a merge. We can ensure this happens by filtering the branches that cause apply to run down to master.
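The ordering and branch filter described here can be sketched as a workflows fragment like the one below. The workflow name plan_and_apply is a placeholder of my own, and the fragment assumes jobs named init, plan and apply are defined elsewhere in the same file.

```yaml
workflows:
  version: 2
  # Hypothetical workflow name, chosen for illustration.
  plan_and_apply:
    jobs:
      # Runs every time the pipeline is triggered.
      - init
      # Waits for init, so the two run sequentially rather than in parallel.
      - plan:
          requires:
            - init
      # Waits for plan, and only runs for changes on master.
      - apply:
          requires:
            - plan
          filters:
            branches:
              only: master
```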
So, that’s our CircleCI workflow defined. We’ll get it checked into GitHub, and then we can head off and complete the process by attaching the GitHub project to CircleCI, so that the triggers start firing, the jobs start running, and we close the circle (so to speak!).
We’ve now got everything we need put together, and sat in GitHub. However, we don’t have anything watching the project right now, so changes won’t trigger any operations; the last step of this process is to sort that out. To do that you’ll need to sign up to CircleCI. The good news is that you can play with it for free, perfect if you just want to dip a toe in the GitOps water!
Once you’ve signed up and attached your GitHub account, you can add your project to CircleCI, which will read your configuration, and set up your workflows ready to go.
Just a couple of tweaks are needed before it’ll all work.
First, navigate to the settings for your project in the CircleCI UI, and under build settings/environment variable add a new variable to hold the JSON token for the GCP service account that Terraform will use. In this example, it should be called GCP_CREDENTIALS, to match the Terraform configuration file. For the value, paste in the entire content of the token file. As an added benefit, CircleCI will obscure the value in the UI, keeping it nice and secure.
Next, we need to tweak a couple of advanced settings on the project. First, ensure that Pass secrets to builds from forked pull requests is ‘off’. If you’re running an open source project you don’t want to be sharing the JSON token with people forking your project!
Secondly, ensure that Only build pull requests is set to ‘on’. This ensures that commits into the feature branches don’t kick off a build. In this way we enforce our GitOps flow: PRs into master will build, as will the completed merge, but developer check-ins on branches won’t.
This pattern has so far worked very well. I’m not convinced that it’s robust enough for a true production environment yet, but it’s well on the way and certainly something I’m going to keep playing with. There are clear advantages to IaC: ease of collaboration, versioning of infrastructure, and ease of visualizing what you have. Combining that with an automated GitOps pattern is the obvious next step, and this is one way of putting it together.
Whilst this article covers deploying into GCP, Terraform has great support for a whole bunch of infrastructure and platforms, so everything I’ve shown here could be applied to almost any project — go have a hack!
Update: if you found this article helpful, I’ve just published an article covering the basics of using Terraform with the Google Platform, including a bunch of helpful code snippets.