Design your Landing Zone — Design Considerations Part 4— IaC, GitOps and CI/CD (Google Cloud Adoption Series)

Dazbo (Darren Lester)
Google Cloud - Community
10 min read · Apr 1, 2024

Welcome to LZ Design Considerations Part 4, where we’ll wrap up the Landing Zone Design Considerations. This is part of the Google Cloud Adoption and Migration: From Strategy to Operation series.

Previously, we covered LZ design decisions relating to monitoring, logging, billing and labelling. In this part, we’ll focus on all things related to automated infra and application deployment, using IaC, GitOps, and CI/CD.

12. IaC, GitOps and CI/CD Strategy

Darren’s quote of the day

Principles Recap

I recommended a number of principles in a previous article, called Cloud Adoption and Cloud Consumption Principles. I want to recap a couple here:

  • Automate deployments and installs
  • Immutable infrastructure

Let me review the rationale for these. In our legacy on-prem world, we had:

  • Fewer, larger machines.
  • Machines tended to be built once, and rarely rebuilt.
  • VMs needed to be looked after. They were treated as pets.
  • We had limited scale and limited elasticity.
  • We had to care about the underlying physical hardware.

But now, in Cloud:

  • We have many, smaller machines.
  • VMs, applications and services tend to be rebuilt frequently. It’s more effective to replace unhealthy services than to try to fix them. We treat them as cattle.
  • Applications are designed to be fault-tolerant.
  • We have virtually unlimited scale, and most services are extremely elastic.
  • We want services to scale down (or be turned off) when not in use.
  • For the most part, we don’t care about the underlying hardware.

So, we want our LZ to align to these principles. We need an automated, repeatable way to build immutable, replaceable infrastructure.

Infrastructure-as-Code (IaC) to the Rescue

IaC, creating infra resources in the Cloud

IaC is about automated provisioning of infrastructure resources, using code. It allows us to rapidly provision (and tear down) infrastructure environments in a repeatable, consistent way.

Some key tenets of IaC:

  • All the infrastructure provisioning and dependencies are defined in code.
  • The code should be stored in source control, such as GitHub. This way it is managed, versioned, and supports collaboration.
  • Our code can be imperative — i.e. follow a set of steps to achieve the outcome. Or our code can be declarative — i.e. “here’s the outcome I need, now you work out how to do it.” Declarative is best!
  • It is easily deployed as part of an automated CI/CD pipeline.
  • It is idempotent — meaning that we can always repeat our deployment, regardless of the current state, and end up with the state that we wanted.
  • We can use it to build multiple environments, and we can be sure they look the same. We can always pass in parameters, to apply environment-specific configuration.
  • We can use it to deploy DR environments on-demand. (If that is our chosen DR strategy.)
  • The code is self-documenting. (And supports comments.) This means that an infrastructure engineer can look at our IaC, and understand what it will do. As such, it provides the implementation of our high level design, and eliminates the need for a significant amount of low level design documentation. Why? Because the IaC is the low level design documentation, for the cloud infrastructure.
  • It eliminates configuration drift — not only between the HLD and the deployed environments, but between the environments themselves.
  • It eliminates human errors. We don’t have human operators building stuff manually, or tweaking stuff in individual environments.
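
To make the declarative and idempotent tenets concrete, here is a toy Python model. It is not Terraform and makes no real Google Cloud API calls; the resource names, attributes and the `plan`/`apply`/`desired_state` helpers are hypothetical. We declare the state we want, compute the difference from the current state, and change only what differs. That is why a second run is a no-op, whereas an imperative script that simply says "create this bucket" would fail or create a duplicate when re-run.

```python
from typing import Dict, List

# Hypothetical model: resource name -> attribute map. No real cloud API calls.
State = Dict[str, Dict[str, str]]


def plan(desired: State, current: State) -> Dict[str, List[str]]:
    """Compare desired vs current state, like a dry run: nothing is changed."""
    return {
        "create": sorted(n for n in desired if n not in current),
        "update": sorted(n for n in desired if n in current and desired[n] != current[n]),
        "delete": sorted(n for n in current if n not in desired),
    }


def apply(desired: State, current: State) -> State:
    """Converge current state on the desired state. Re-running is harmless."""
    changes = plan(desired, current)
    new_state = {name: dict(attrs) for name, attrs in current.items()}
    for name in changes["create"] + changes["update"]:
        new_state[name] = dict(desired[name])
    for name in changes["delete"]:
        del new_state[name]
    return new_state


def desired_state(env: str, machine_type: str) -> State:
    """One definition for every environment; only the parameters differ."""
    return {
        f"{env}-web-vm": {"machine_type": machine_type, "env": env},
        f"{env}-assets-bucket": {"location": "EU", "env": env},
    }


# Usage: build dev from nothing, then re-apply the same code.
dev = desired_state("dev", machine_type="e2-small")
state = apply(dev, current={})
assert apply(dev, state) == state  # idempotent: the second run changes nothing
assert plan(dev, state) == {"create": [], "update": [], "delete": []}

# A manual tweak shows up as drift, and the next apply reverts it.
state["dev-web-vm"]["machine_type"] = "e2-medium"
assert plan(dev, state)["update"] == ["dev-web-vm"]
state = apply(dev, state)
```

The same `desired_state` function builds dev and prod with different parameters, which is the "same code, every environment" idea in miniature.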

Some Tips and Best Practices for IaC

Here’s the thing… If you’re not using IaC, you’re doing Cloud wrong. So here are a few key takeaways:

  • For initial resource deployment in a Dev environment, you can provision resources using the Google Cloud Console. But when you build any cloud infra resources in any other environments (e.g. UAT, OAT, Staging, Prod, whatever), you should be doing so with IaC. This way, you’re using the same code to deploy to all environments.
  • Ensure that your LZ project factory provides service accounts to your tenants, and use those service accounts to actually deploy the resources.
  • Don’t allow manual (human) infrastructure tweaking in any environments other than Dev. You can enforce this through policy and IAM. If you allow operators to tweak configuration by hand then you will get configuration drift, and you’ll kill your automation and repeatability benefit.
  • If you have any engineers or integrators that say things like, “Let’s just build it manually, and worry about the IaC later” then educate them, or get rid of them. This sort of legacy on-prem thinking kills your cloud agility. I’ve been on projects where system integrators have refused to build the IaC upfront. And the detrimental impact on the project is staggering!
  • Create a CI/CD pipeline to automate your IaC deployments.
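
One way to catch manual tweaks is a scheduled drift check. The sketch below assumes an already-initialised Terraform directory and relies on `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error and 2 when changes are present. It can optionally set `GOOGLE_IMPERSONATE_SERVICE_ACCOUNT`, which the Google provider reads to impersonate a service account, so the check runs as the tenant's deployment service account rather than a human. The function names and the service account you pass in are hypothetical.

```python
import os
import subprocess
from typing import Callable, Optional

# Documented meanings of `terraform plan -detailed-exitcode`.
EXIT_MEANINGS = {0: "in_sync", 1: "error", 2: "changes_detected"}


def interpret_exit_code(code: int) -> str:
    """Map the plan exit code to a status; treat anything unexpected as an error."""
    return EXIT_MEANINGS.get(code, "error")


def check_drift(
    workdir: str,
    deploy_sa: Optional[str] = None,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run a read-only plan in an already-initialised Terraform directory.

    `deploy_sa` is the (hypothetical) tenant deployment service account.
    `run` is injectable so the logic can be tested without Terraform installed.
    """
    env = dict(os.environ)
    if deploy_sa:
        # The Google provider reads this variable and impersonates the account.
        env["GOOGLE_IMPERSONATE_SERVICE_ACCOUNT"] = deploy_sa
    cmd = ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"]
    result = run(cmd, cwd=workdir, env=env, capture_output=True, text=True)
    return interpret_exit_code(result.returncode)
```

Bear in mind that exit code 2 only means "live infrastructure differs from this code". If merged changes simply haven't been applied yet you'll get the same result, so point the check at the code that is actually deployed.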

The GitOps Approach

Building our infrastructure using IaC is a good start. But we need to build a CI/CD pipeline, such that IaC changes can be automatically deployed to our target environment. Google advocates for the use of GitOps, which requires that:

  • All resources are deployed using declarative code.
  • Our code is hosted in a Git repository.
  • All operational changes are made by developers who make a pull request.
  • Merging of the pull request results in execution of a build and release pipeline.

Here’s a sketch of the overall GitOps pipeline, along with some products and tools you might use at each stage:

GitOps Pipeline

In the context of deploying cloud infrastructure, our declarative code will be in the form of IaC. Google’s documentation shows this reference example:

Google’s reference architecture for GitOps

In this example:

  • Our IaC is written in Terraform. Terraform is an open source cloud-agnostic IaC tool that uses a declarative IaC language.
  • We use GitHub to store our Git repo. (But we could use other Git hosting services like GitLab or BitBucket. If we want to stay fully within the Google Cloud ecosystem, we can also use Google Cloud Source Repositories, CSR. We can use CSR to host our master Git repos; but we could also synchronise our CSR repos from an upstream repo, like GitHub.)
  • Infra developers push IaC changes into a feature branch, triggering Google Cloud Build to execute terraform plan. This produces an execution plan showing the changes Terraform would make, but does not actually apply them. (We could use an alternative tool for executing Terraform. For example, if we’re using GitHub for our Git repos, we could use GitHub Actions to run Terraform.)
  • Then the developer raises a pull request to merge the changes into the dev branch. When it is merged, Cloud Build executes terraform apply, thus applying our Terraform configuration to the dev environment.
  • Once the dev build has been validated, the changes are merged into the prod branch, causing Cloud Build to apply the Terraform configuration to the prod environment.
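
Google's reference example wires this up with Cloud Build triggers. The Python below is only a hypothetical sketch of the same decision logic: an environment branch plans and applies into its own environment, while any other branch gets a plan for review and nothing more. It assumes one folder per environment (environments/dev and environments/prod, names chosen for illustration) and uses Terraform's global `-chdir` option to run in that folder.

```python
from typing import List

# Hypothetical layout: one folder and one persistent branch per environment.
ENVIRONMENTS = ["dev", "prod"]


def terraform_steps(branch: str) -> List[List[str]]:
    """Return the Terraform commands a pipeline run should execute.

    - Environment branch (dev/prod): init, plan and apply in that environment only.
    - Any other branch (feature/bug-fix): init and plan every environment, never apply.
    """
    if branch in ENVIRONMENTS:
        envs, do_apply = [branch], True
    else:
        envs, do_apply = ENVIRONMENTS, False

    steps: List[List[str]] = []
    for env in envs:
        chdir = f"-chdir=environments/{env}"
        steps.append(["terraform", chdir, "init", "-input=false"])
        steps.append(["terraform", chdir, "plan", "-input=false", "-out=tfplan"])
        if do_apply:
            # Apply the saved plan file, i.e. exactly what was just planned.
            steps.append(["terraform", chdir, "apply", "-input=false", "tfplan"])
    return steps
```

Applying the saved `tfplan` file, rather than re-planning at apply time, means Terraform applies exactly the changes that were planned.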

Pipeline Layers

Google recommends separate pipeline layers, with different teams responsible for each. For example:

Pipeline layers

Here:

  • The foundation pipeline deploys the foundation resources that make up the LZ. This pipeline will typically be the responsibility of a single Platform Team.
  • The infrastructure pipeline deploys infrastructure that is used by individual tenants and applications. The pipeline can only be executed by a tenant service account, and this service account can only deploy to resources under this tenant’s folder. Google has example code for creating an application infrastructure pipeline in its Terraform Example Foundation repo on GitHub, here.
  • The application pipeline deploys application resources, such as images, and GKE application resources.
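
In practice the tenant boundary is enforced by IAM: the project factory grants each tenant's service account roles on its own folder only. The hypothetical Python check below merely models that rule, for example as a fail-fast step in the infrastructure pipeline before anything is deployed. The service account names and folder IDs are made up.

```python
from typing import Dict, List

# Hypothetical mapping of tenant deployment service accounts to tenant folders.
TENANT_FOLDERS: Dict[str, str] = {
    "tf-payments@payments-seed.iam.gserviceaccount.com": "folders/111",
    "tf-web@web-seed.iam.gserviceaccount.com": "folders/222",
}


def in_tenant_scope(deployer_sa: str, ancestry: List[str]) -> bool:
    """True if the target sits under the deployer's own tenant folder.

    `ancestry` lists the target's parents from nearest to the organisation,
    e.g. ["projects/web-prod", "folders/222", "organizations/999"].
    Unknown service accounts are denied by default.
    """
    folder = TENANT_FOLDERS.get(deployer_sa)
    return folder is not None and folder in ancestry
```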

Summary of IaC and GitOps Design Decisions

  • Which IaC tool? I would recommend Terraform, unless you have a compelling reason not to. Terraform is declarative, open source, is cloud agnostic, and works in the enterprise. It is also Google’s recommended tool for IaC with Google Cloud.
  • Assuming you’re using Terraform, where will you persist your Terraform state? In the enterprise, you should be storing Terraform state in a remote backend that supports collaborative working, automatic state locking, and granular access control. In the Google Cloud ecosystem, a Google Cloud Storage (GCS) bucket is a great choice. But other options include Terraform Cloud (Terraform SaaS) and Terraform Enterprise (self-hosted).
  • Which source code platform? E.g. GitHub, GitLab, Google Cloud Source Repos, etc.
  • What will be your git branching strategy? Google recommends having a protected main branch, feature and bug-fix branches, plus a separate persistent branch for each environment. This way, changes can be promoted through the environments by merging the changes between the environment branches.
  • How will you separate these environments in your IaC repo? Google recommends using a separate folder in the repo for each environment. Each folder will map to a branch, and each branch will deploy to a specific environment.
Separating environments in your repo and in branches
  • How many environments will you manage? E.g. dev, uat, staging and prod?
  • What Google Cloud resource naming conventions will you adopt? You need to document your naming standards. But before you invent your own, Google has a set of recommended naming conventions here.
Google’s recommended resource naming conventions
  • Will you use IaC to deploy your landing zone? If so, will you use an existing IaC LZ blueprint, or create your own? Google provides a couple of open source organisation LZ blueprints and implementations which can be used to significantly accelerate your LZ deployment. These are Google Cloud Foundation Fabric FAST and Cloud Foundation Toolkit (CFT) Terraform Example Foundation. Google Cloud Foundation Fabric FAST is intended to be a pre-composed end-to-end example, which is forked, cloned and modified as required. By contrast, CFT is intended to be used as a library of opinionated Terraform modules which should be composed as required. Google describes the differences between these two approaches here.
Google’s Enterprise LZ IaC Accelerator — Google Cloud Foundation Fabric FAST
  • How will you organise and manage access to IaC repos? Google recommends using the design principle that configurations with different approval and management requirements are separated into different source control repositories. For example, a central platform team may be responsible for the LZ IaC, shared resources, and the tenant factory. Whereas application teams may be responsible for all infrastructure resources deployed within their own folders. This approach — making use of pipeline layers — is recommended in enterprises, as it delegates control to application teams.
  • What CI/CD tools will you use in your GitOps pipeline? For example, you might choose Google Cloud Build for seamless integration with Google Cloud, if Google Cloud is the only infrastructure target of your IaC. Alternatively, if you’re already using GitHub and want a more cloud-agnostic option, then GitHub Actions might be a good choice.
  • How will tenants execute their IaC? Best practice is to only allow tenants to execute IaC using provided tenant service accounts.
  • IaC standards and best practices? Establish and document your organisation’s IaC and Terraform standards and best practices. And don’t reinvent the wheel. Google has great guidance on this already.
  • How will you enforce IaC policies and standards? Since all your cloud infrastructure will be deployed using IaC, it is important to ensure that the IaC you execute adheres to your organisation’s policies. For example, you might want to prevent deployment of certain resources, enforce use of labels with a limited set of values, or enforce customer-managed encryption keys on GCS buckets. Consider using an automated policy validation tool, such as HashiCorp Sentinel (but only if you’re using a Terraform Cloud or Terraform Enterprise backend), Terratest (which is open source), or Google’s free gcloud terraform vet.
Policy validation with gcloud terraform vet
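
Purpose-built tools like those above are the sturdier choice. Purely to illustrate what such a check does, here is a hand-rolled Python sketch (not how gcloud terraform vet or Sentinel work internally) that inspects the JSON form of a Terraform plan, produced with `terraform plan -out=tfplan` followed by `terraform show -json tfplan`. The denied resource type, label keys and allowed values are hypothetical examples of the three kinds of policy described above.

```python
import json
from typing import Dict, List, Set

# Hypothetical organisation rules, mirroring the example policies above.
DENIED_TYPES: Set[str] = {"google_compute_address"}
ALLOWED_LABELS: Dict[str, Set[str]] = {
    "env": {"dev", "uat", "prod"},
    "cost-centre": {"cc-100", "cc-200"},
}


def check_plan(plan: dict) -> List[str]:
    """Return policy violations for a `terraform show -json` plan (empty = pass)."""
    violations: List[str] = []
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        if not {"create", "update"} & set(change["actions"]):
            continue  # skip no-op, read and pure delete actions
        addr, rtype = rc["address"], rc["type"]
        after = change.get("after") or {}

        if rtype in DENIED_TYPES:
            violations.append(f"{addr}: resource type {rtype} is not allowed")

        if "labels" in after:  # only check resources that expose labels
            labels = after["labels"] or {}
            for key, allowed in ALLOWED_LABELS.items():
                if labels.get(key) not in allowed:
                    violations.append(f"{addr}: label {key} must be one of {sorted(allowed)}")

        if rtype == "google_storage_bucket":
            encryption = after.get("encryption") or []
            if not any(block.get("default_kms_key_name") for block in encryption):
                violations.append(f"{addr}: bucket must use a customer-managed key")
    return violations


def check_plan_file(path: str) -> List[str]:
    """Load JSON written by `terraform show -json tfplan > plan.json` and check it."""
    with open(path) as f:
        return check_plan(json.load(f))
```

A pipeline step would fail the build whenever `check_plan_file` returns anything. One limitation: values that are only known at apply time (for example, a KMS key created in the same plan) are omitted from `after`, so this sketch would flag them.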

Wrap-Up

After four articles on the topic of Google Cloud LZ design considerations, I think you’ll agree that the design phase is not entirely trivial! There are a lot of considerations, and many implications of your choices!

In the next part, I’ll show you how to go about making informed decisions. I’ll guide you through the LZ design process, show you how to capture your decisions, and tell you how to get the help you need, so you don’t make any troublesome mistakes!

Before You Go

  • Please share this with anyone that you think will be interested. It might help them, and it really helps me!
  • Feel free to leave a comment 💬.
  • Follow and subscribe, so you don’t miss my content. Go to my Profile Page, and click on these icons:
Follow and Subscribe


