Manage OpenShift Clusters with Terraform

Maxvandermeij · Published in The Factory · Jan 4, 2022 · 5 min read

I’m currently working at a client where our team is increasingly embracing the motto “automate everything”. Our recent success managing our internal CI/CD service with automation tools has only accelerated this, so much so that when we were about to set up new OpenShift clusters, we wanted to manage the cluster resources through automation as well. We decided to use the Kubernetes Terraform provider. Here are our experiences and thoughts along the way.

Introduction

Let me first set the scene. As a team we provide a set of standardized services that help internal development teams. All our services aim to accelerate our customers’ software development or deployment process by taking over part of the chain with a standardized solution that we manage for them.

With a small team we deliver six of these standard services, of which a compute service (based on OpenShift) is one of the bigger ones. Before the automation journey we managed four OpenShift clusters with a combination of the CLI and the console. This worked fine, but after our recent experience managing one of our other services from code, we wanted to try managing cluster resources from code as well.

Our expected benefits were:

  • versioned definition of cluster resources
  • standardized team development workflow
  • consistency between clusters
  • re-usability for other clusters

The timing was great, as we were just about to start with a new set of OpenShift clusters for our development teams: a dev/test/staging cluster and a separate production cluster. This was the perfect moment to greenfield this new way of managing our clusters.

Starting out

We started this journey about half a year ago by first comparing our required resources to the documented resource definitions of the provider. Since there was, and still is, no OpenShift provider, we had to settle for the Kubernetes provider. This already created a problem: at the time, the provider did not support all the resources that we needed, and specifically there was no way to define OpenShift-specific resources. Stubborn as we were, we went with it anyway, as we could still manage quite a lot of the resources.
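For context, a minimal sketch of how the Kubernetes provider can be pointed at an OpenShift cluster’s API server (the variable names are illustrative, not our actual configuration):

```hcl
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
  }
}

provider "kubernetes" {
  # An OpenShift cluster is addressed like any other Kubernetes API server.
  host  = var.cluster_api_url # e.g. the cluster API endpoint (illustrative)
  token = var.cluster_token   # a service account token with sufficient permissions
}
```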

We decided to save the YAML definitions of the resources that we could not yet apply with the provider to a separate folder in our repository, so that we would keep a complete overview of what was applied to the cluster. We promised ourselves to automate applying those resources in the future using custom scripting in our pipeline.
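Such scripting can stay very small; a sketch of what that pipeline step could look like, assuming the folder is called manual-config (the name is our own illustration):

```sh
# Apply every YAML definition that the provider cannot manage yet.
# Assumes the pipeline has already logged in to the cluster (e.g. via `oc login`).
oc apply -f manual-config/
```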

First, we laid out the folder structure and automation foundation by recreating our preferred workflow in CI/CD pipelines. The resulting folder structure is as follows (a sketch of the layout is shown after the list):

Repository structure
  • Devcontainer for local development
  • Pipeline folder
  • Manual config folder for the YAML definitions of the resources that are not yet supported
  • Shared terraform resources for both clusters with separate variable files per cluster
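
As an illustration, such a layout could look roughly like this (folder and file names are our own rendering of the list above, not the exact repository contents):

```
.devcontainer/            # local development container
pipelines/                # CI/CD pipeline definitions
manual-config/            # YAML for resources the provider does not yet support
terraform/
  main.tf                 # generic resources: namespaces, networking, service accounts
  variables.tf
  terraform.tfvars        # values shared by both clusters
  dev-test-staging.tfvars # values specific to the dev/test/staging cluster
  production.tfvars       # values specific to the production cluster
```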

The shared resource base helps to keep our set of clusters similar. Shared variable values are defined in terraform.tfvars, whereas the values that cannot be shared across clusters are defined in separate tfvars files. The main.tf file includes most of the generic resources such as namespaces, networking, service accounts, etc. For the others, the filenames should already give some clues about the type of resources we create there. Perhaps something to dive into in a next blog.
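To give an idea of the pattern, here is a hedged sketch of a shared resource driven by per-cluster variables (all names and values are illustrative):

```hcl
# variables.tf — the same variable exists for both clusters,
# each cluster supplies its own values via its tfvars file
variable "team_namespaces" {
  type        = list(string)
  description = "Namespaces to create on this cluster"
}

# main.tf — shared between both clusters
resource "kubernetes_namespace" "team" {
  for_each = toset(var.team_namespaces)

  metadata {
    name = each.value
  }
}

# dev-test-staging.tfvars (illustrative values)
# team_namespaces = ["team-a-dev", "team-a-test", "team-a-staging"]
```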

Since our team loves to keep things simple, our workflow and pipelines are quite straightforward. Adding new resources to a cluster starts with a pull request containing the suggested changes. One of our teammates then reviews the code changes as well as the PR pipeline, which shows the planned infrastructure changes for both clusters.

Once the code is approved and merged to main, a new pipeline runs which again does a terraform plan for both clusters. The only difference from the earlier plan is that this time the plans are saved. These saved plans are then applied separately for each cluster when the time is right. Between planning and rollout, changes to the cluster might have happened that alter the situation; this is exactly why we use saved plans, as they will fail to apply if the cluster is no longer in the state it was in when the plan was created. The resulting workflow is as follows:

[Figure: general workflow for cluster changes]
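In terms of Terraform CLI calls, the plan and apply stages boil down to something like the following (file names are illustrative, not our actual pipeline):

```sh
# PR and main pipelines: plan per cluster; on main the plan is saved
terraform plan -var-file="dev-test-staging.tfvars" -out="dev.tfplan"
terraform plan -var-file="production.tfvars" -out="prod.tfplan"

# Rollout stage, triggered per cluster when the time is right;
# the apply fails if the cluster state has drifted since planning
terraform apply "dev.tfplan"
terraform apply "prod.tfplan"
```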

Although we did not yet manage all cluster resources from code at this point, our team immediately noticed the benefits:

  • We could now easily trace our versioned definition of (some) cluster resources
  • We could peer review our cluster changes in a standardized workflow
  • We could easily keep the set of clusters consistent

Thank you, updates

The only benefit that we did not yet see was the re-usability for other clusters, since we could not create all the resources that we needed. But, thankfully, while we were working on our workflow setup, a new version of the provider was released which added the “kubernetes_manifest” resource. This basically removed any constraint we had, as it added the ability to define arbitrary manifests, including our OpenShift-specific ones.
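As an example, a kubernetes_manifest definition for an OpenShift-specific object could look like this (the group name and members are made up for illustration):

```hcl
resource "kubernetes_manifest" "developers_group" {
  manifest = {
    apiVersion = "user.openshift.io/v1"
    kind       = "Group"
    metadata = {
      name = "developers"
    }
    users = ["alice", "bob"]
  }
}
```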

Initially we added manifest definitions for user groups, security context constraints and OpenShift-specific networking. Unfortunately, on subsequent runs of these resources we got quite a lot of errors and unwanted planned changes. These errors were to be expected, as the resource was not yet generally available, but we could resolve most of the issues by adding the lifecycle argument “ignore_changes”. Fortunately, this was only needed temporarily, as subsequent bug fixes for the manifest resource solved most of our issues. Since the GA release of the manifest feature we haven’t had any major issues applying custom manifests at all.
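The workaround looked roughly like this (the Route below is purely illustrative; the point is where the lifecycle block goes):

```hcl
resource "kubernetes_manifest" "example_route" {
  manifest = {
    apiVersion = "route.openshift.io/v1"
    kind       = "Route"
    metadata = {
      name      = "example"
      namespace = "example-namespace"
    }
    spec = {
      to = {
        kind = "Service"
        name = "example-service"
      }
    }
  }

  lifecycle {
    # Temporary: suppress the spurious planned changes seen before the GA release.
    ignore_changes = [manifest]
  }
}
```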

Future ideas

The amazing thing to notice in this process is that this initial greenfielding led to a complete change of mind for all team members on how we want to manage our OpenShift clusters. So much so that we are now even adding labels to our resources which prevent manual changes.
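For illustration, such a label could be the standard managed-by label (how the restriction on manual changes is then enforced is outside the scope of this post, and the namespace name is made up):

```hcl
resource "kubernetes_namespace" "team_a" {
  metadata {
    name = "team-a"
    labels = {
      # Marks the resource as owned by our Terraform configuration.
      "app.kubernetes.io/managed-by" = "terraform"
    }
  }
}
```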

At this point we manage about 80% of our cluster resources from the pipeline, as we have not yet had the time to integrate the remaining 20% of manually applied resources. Since it is now possible to add all custom resources, we plan to add those to our configuration over time as well.

Now that we can include all of the required resources in our code, the last point on our initial list of expected benefits is within reach as well: re-usability for other clusters. But in order to do that effectively, we will first take our time rewriting the current setup into reusable modules. Perhaps we will even take the time to share those experiences in a new story.

Until next time!
