Hands-Free mTLS in Kubernetes

Tyler Wanner
Oct 13

Mutual TLS (mTLS) validates both the client and server certificates during the connection handshake. This in-transit encryption is a key part of a zero trust framework, mitigating risks such as man-in-the-middle and replay attacks. Linkerd is a lightweight, fast, easy-to-deploy service mesh that provides mTLS out of the box, along with other security, observability, and performance lifts. Using Helm to provision the Linkerd service mesh and cert-manager, both CNCF projects like Kubernetes itself, we can accomplish this as a layer of configuration without touching application code. For more on mTLS, see here.

Infrastructure as Code

Declarative vs Imperative

Declarative infrastructure code is the art of writing infrastructure “as it should be,” whereas imperative infrastructure code defines the steps to get to the desired state. Terraform is a widely used declarative infrastructure-as-code tool with a powerful configuration language and change workflow. It can detect and explain configuration changes and apply them in dependency order.

Getting Started

  • Terraform
  • GCP credentials with permission to create a project
  • gcloud CLI

This demo has been tested with:

  • Terraform v1.0.6
  • Google Cloud provider v3.85.0
  • gcloud CLI v288.0.0

You can use an environment variable to set which service account key to use during provisioning:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key 

Otherwise, it will use your active gcloud configuration. Check which account is active with gcloud auth list.

Getting Started With Terraform

First, create a directory for the Terraform state files and one for each component:

mkdir states cluster cert_manager issuers linkerd

In the cluster directory, we'll configure our state backend in backend.tf as follows:

terraform {
  backend "local" {
    path = "../states/cluster.tfstate"
  }
}

This instructs Terraform to keep the state file in the states/ directory. In these state files, Terraform holds important information like the cluster endpoint. In other components, we'll use a data source called terraform_remote_state, which provides a reference to those outputs.
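
As a quick sketch of how that looks in a downstream component (assuming the cluster component exposes outputs named cluster_endpoint and cluster_ca_certificate, which we'll define later):

# Read the cluster component's outputs from its local state file.
data "terraform_remote_state" "cluster" {
  backend = "local"

  config = {
    path = "../states/cluster.tfstate"
  }
}

# The cluster's outputs are then available as, for example:
# data.terraform_remote_state.cluster.outputs.cluster_endpoint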

How you separate files in Terraform generally doesn't matter: all .tf files in a directory are merged by Terraform unless otherwise specified, so feel free to leave everything in one file if that is your cup of tea. We've broken things up here for clarity.

We’ll then declare, in variables.tf, two variables.

variable "zone" {}
variable "project" {}

Variables without a type constraint accept any value; here we'll pass strings. We can give them default values above, or define them in any file that ends in .auto.tfvars, as we will do in cluster.auto.tfvars (feel free to use any zone you wish).

zone = "us-east1-b"

As of Terraform 0.13, providers are declared with a source address and version constraint in a required_providers block, which we'll do in versions.tf, along with a minimum Terraform version:

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "3.85.0"
    }
  }
  required_version = ">= 1"
}

We'll then create a terraform.tf file where we declare any outputs and provider configuration (again, this is just a personal convention; organize your code however it makes sense for you, as all .tf files are equal). For now, we'll just add the provider, since we don't know which outputs we need yet.

provider "google" {
  project = var.project
  zone    = var.zone
}

Now any Google resources we create in this component will default to the project and zone specified in your variables, and Terraform will know to use exactly version 3.85.0 of the Google provider. If you don't define the variables in a .auto.tfvars file, Terraform will ask you for them interactively when you go to make changes.
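
The remaining piece of the cluster component is the cluster itself, plus a couple of outputs for the other components to consume. Here's a minimal, illustrative sketch (the resource name, node count, and output names are assumptions for this demo, not a hardened configuration):

# cluster.tf: a minimal GKE cluster and the outputs other components will read.
resource "google_container_cluster" "demo" {
  name               = "linkerd-demo"
  location           = var.zone
  initial_node_count = 3
}

output "cluster_endpoint" {
  value = google_container_cluster.demo.endpoint
}

output "cluster_ca_certificate" {
  value = google_container_cluster.demo.master_auth[0].cluster_ca_certificate
}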

OK, Let’s Get Provisioning Already!

Traverse into the cluster directory and initialize your component with terraform init; this will create your state file, download the providers you need, and so on. Now we can run terraform apply.

[Screenshot: an example of the diff generated by Terraform when running terraform apply. Terraform renders this diff and asks for a `yes` to proceed.]

Terraform will ask for approval — type ‘yes’ or use terraform apply -auto-approve. When successful, terraform will give us our outputs at the bottom. You won’t need to write them down, as we’ll reference them dynamically from here.

We're going to do the same thing for the rest of the components: next, we'll apply cert-manager, then we'll create our cert-manager resources, and, finally, the Linkerd service mesh control plane. As we go along, I'll explain each component's role in this stack.

Helm

Cert-Manager

Below, we define the Helm release with a name, a namespace to put it in, and a reference to the chart location, which is hosted by Jetstack. Any custom values we need to supply, we can add to values.yaml.
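
A release along these lines might look roughly like the following (the chart version and the installCRDs value are assumptions for this sketch):

# Install cert-manager from the Jetstack chart repository.
resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  namespace        = "cert-manager"
  create_namespace = true

  repository = "https://charts.jetstack.io"
  chart      = "cert-manager"
  version    = "v1.5.4"

  # Have the chart install its CRDs alongside the release.
  set {
    name  = "installCRDs"
    value = "true"
  }

  # Any custom values live in values.yaml next to this file.
  values = [file("${path.module}/values.yaml")]
}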

Here's the fun part: when authenticating with the cluster, we'll use the cluster's generated CA certificate, referenced dynamically from the cluster component's outputs, along with a Google-generated auth token from our gcloud profile, to authenticate to the host. We can (but won't, in this demo) then use Terraform workspaces to manage independent stacks, so that your dev Helm release knows to deploy to your dev cluster!
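
In provider terms, that authentication looks something like this (a sketch; it assumes the cluster_endpoint and cluster_ca_certificate outputs shown earlier):

# Read the cluster outputs and fetch a short-lived token from the active gcloud credentials.
data "terraform_remote_state" "cluster" {
  backend = "local"

  config = {
    path = "../states/cluster.tfstate"
  }
}

data "google_client_config" "default" {}

provider "helm" {
  kubernetes {
    host                   = "https://${data.terraform_remote_state.cluster.outputs.cluster_endpoint}"
    cluster_ca_certificate = base64decode(data.terraform_remote_state.cluster.outputs.cluster_ca_certificate)
    token                  = data.google_client_config.default.access_token
  }
}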

CRDs

Be mindful that when you install CRDs using Helm, if you delete your release, all of your CRD objects such as signed Certificates will be deleted as well. To get around this, you may want to follow Option 1 here.

Cert-Manager CRDs

Terraform has recently made headway in supporting raw YAML manifests in Terraform configuration. We'll enable the experimental "manifest_resource" feature in our Kubernetes provider and use the new `kubernetes_manifest` resource to apply the Certificate and Issuer directly as HCL.
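
The Kubernetes provider configuration for this component might look like the following (a sketch; it reuses the same authentication pattern as the Helm provider, and the experiments block applies to provider versions where kubernetes_manifest is still experimental):

provider "kubernetes" {
  host                   = "https://${data.terraform_remote_state.cluster.outputs.cluster_endpoint}"
  cluster_ca_certificate = base64decode(data.terraform_remote_state.cluster.outputs.cluster_ca_certificate)
  token                  = data.google_client_config.default.access_token

  # Opt in to the experimental kubernetes_manifest resource.
  experiments {
    manifest_resource = true
  }
}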

This is how we'll create a Certificate resource, which cert-manager will use to create and rotate a TLS secret that serves as the issuing CA for Linkerd's identity service. Cert-manager will use our trust anchor secret when generating that certificate. So the full chain of the certificate presented by the Linkerd proxies will be a leaf generated by linkerd-identity, signed by this new Certificate CA, which is in turn signed by the original trust anchor. Certs are fun, I know.
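
As a rough sketch of that Certificate and its Issuer (the names, durations, and usages below follow the common cert-manager-for-Linkerd pattern and are assumptions, as is the pre-existing linkerd-trust-anchor secret in the linkerd namespace):

# An Issuer backed by the trust anchor secret.
resource "kubernetes_manifest" "trust_anchor_issuer" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "Issuer"
    metadata = {
      name      = "linkerd-trust-anchor"
      namespace = "linkerd"
    }
    spec = {
      ca = {
        secretName = "linkerd-trust-anchor"
      }
    }
  }
}

# The Certificate cert-manager will issue and rotate; its secret becomes
# the issuing CA for linkerd-identity.
resource "kubernetes_manifest" "identity_issuer" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "Certificate"
    metadata = {
      name      = "linkerd-identity-issuer"
      namespace = "linkerd"
    }
    spec = {
      secretName  = "linkerd-identity-issuer"
      isCA        = true
      commonName  = "identity.linkerd.cluster.local"
      duration    = "48h"
      renewBefore = "25h"
      issuerRef = {
        name = "linkerd-trust-anchor"
        kind = "Issuer"
      }
      usages = ["cert sign", "crl sign", "server auth", "client auth"]
    }
  }
}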

The Linkerd Control Plane

We need to tell the Helm release about the trust anchor PEM and issuer, and supply a set of standard HA values.

The values-ha.yaml file is pulled from the public Helm chart and adds more replicas to the critical components, resource configs, and anti-affinity rules to spread the pods across nodes. See the note in the Helm chart.
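
A sketch of that release (the chart version is left unpinned here, and the local ca.crt path holding the trust anchor's public certificate is an assumption):

resource "helm_release" "linkerd" {
  name       = "linkerd"
  namespace  = "linkerd"
  repository = "https://helm.linkerd.io/stable"
  chart      = "linkerd2"

  # Standard HA values pulled from the public chart.
  values = [file("${path.module}/values-ha.yaml")]

  # The trust anchor's public certificate, shared by every proxy in the mesh.
  set {
    name  = "identityTrustAnchorsPEM"
    value = file("${path.module}/ca.crt")
  }

  # Point the identity issuer at the kubernetes.io/tls secret
  # (linkerd-identity-issuer) that cert-manager creates and rotates.
  set {
    name  = "identity.issuer.scheme"
    value = "kubernetes.io/tls"
  }
}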

Now What?

From here, you might want to add trace header propagation for your distributed tracing, add Flagger to control Linkerd's routing (e.g. for canary deploys), etc. The folks at Buoyant have also recently taken Buoyant Cloud, an observability and governance tool for your fully meshed network across clusters, to general availability.

Congratulations to the Linkerd team for achieving graduation from the CNCF!
