AKS with Cert Manager

Using cert-manager add-on with AKS

Joaquín Menchaca (智裕)
Published in Geek Culture
12 min read · Jun 28, 2021


This article details how to secure web traffic with TLS, using a certificate from a trusted CA and a public domain. Certificates will be issued by Let’s Encrypt through cert-manager, a popular Kubernetes add-on.

In this article we’ll use the following components: Azure DNS, AKS, cert-manager, external-dns, and ingress-nginx, along with the example applications hello-kubernetes and Dgraph.

⚠️ WARNING: This uses the kubelet identity, an identity or principal that is assigned to all nodes in the cluster, to access Azure DNS. This means that all containers on the cluster will have access to read/write DNS records for the domain. While this may be fine for limited test environments, this SHOULD NEVER BE USED IN PRODUCTION, as it violates the principle of least privilege. Alternatives for configuring access are AAD Pod Identity or the more recent Workload Identity.

Overview of cert-manager

One common scenario for securing web applications or services is to encrypt traffic with TLS certificates, where the encryption is terminated at the load balancer. Before the arrival of Kubernetes, nginx was a popular solution for this process.

On the Kubernetes platform, the ingress resource through the ingress-nginx controller will perform the TLS termination. This allows secure communication from the user to the endpoint, while everything behind the endpoint is not encrypted, allowing for easier configuration, monitoring, and debugging.

cert-manager process

The cert-manager add-on monitors ingress events, and then installs or updates a certificate used to encrypt web traffic. This happens in the following steps:

  1. cert-manager submits an order to an ACME CA managed by Let’s Encrypt.
  2. A challenge (DNS01) is made to read/write a DNS record on Azure DNS to demonstrate that the user owns the domain.
  3. When the challenge is satisfied, a certificate is then issued by the ACME CA.
  4. cert-manager will then create the secret with the certificate, which will be used by the ingress controller to secure web traffic.
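
While these steps run, you can watch the intermediate resources that cert-manager creates. A quick way to observe the flow (assuming the cert-manager CRDs are already installed, as covered later in this article):

# watch cert-manager work through the order, challenge, and certificate flow
kubectl get clusterissuer,certificate,certificaterequest,order,challenge \
--all-namespaces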

Articles in the series

These articles are part of a series, and below is a list of articles in the series.

  1. AKS with external-dns: service with LoadBalancer type
  2. AKS with ingress-nginx: ingress (HTTP)
  3. AKS with cert-manager: ingress (HTTPS)
  4. AKS with GRPC and ingress-nginx: ingress (GRPC and HTTPS)

Previous Articles

AKS + ingress-nginx + external-dns: In the previous article I covered how to deploy ingress-nginx along with external-dns:

AKS + external-dns: In the first article in this series, I covered how to deploy external-dns for use with a service of the LoadBalancer type.

Requirements

These are some logistical and tool requirements for this article:

Registered domain name

When securing web traffic with TLS certificates that are trusted (or in other words, a certificate issued by a trusted CA), you will need to own a public domain name, which can be purchased from a provider for about $2 to $20 per year.

A fictional domain of example.com will be used as an example. Thus, depending on the example used, there would be hostnames such as hello.example.com, ratel.example.com, and alpha.example.com.

Required tools

These tools are required for this article:

  • Azure CLI tool (az): command line tool that interacts with Azure API
  • Kubernetes client tool (kubectl): command line tool that interacts with Kubernetes API
  • Helm (helm): command line tool for “templating and sharing Kubernetes manifests” that are bundled as Helm chart packages.
  • helm-diff plugin: allows you to see the changes made with helm or helmfile before applying the changes.
  • Helmfile (helmfile): command line tool that uses a “declarative specification for deploying Helm charts across many environments”.

Optional tools

I highly recommend these tools:

  • POSIX shell (sh) such as GNU Bash (bash) or Zsh (zsh): the scripts in this guide were tested with both of these shells on macOS and Ubuntu Linux.
  • curl (curl): tool to interact with web services from the command line.
  • jq (jq): a JSON processor tool that can transform and extract objects from JSON, as well as provide colorized JSON output for greater readability.
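
As an example of one way to install these tools (my assumption, not part of the original setup), Homebrew on macOS or Linux covers everything except the helm-diff plugin, which is installed through helm itself:

# install the required and optional tools with Homebrew
brew install azure-cli kubernetes-cli helm helmfile jq

# install the helm-diff plugin used by helmfile
helm plugin install https://github.com/databus23/helm-diff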

Project setup

As this project has a few moving parts (Azure DNS, AKS, cert-manager, external-dns, ingress-nginx) with the example applications Dgraph and hello-kubernetes, these next few sections will help keep things consistent.

Project file structure

The following structure will be used:

~/azure_cert_manager/
├── env.sh
├── examples
│   ├── dgraph
│   │   └── helmfile.yaml
│   └── hello
│       └── helmfile.yaml
└── helmfile.yaml

With either Bash or Zsh, you can create the file structure with the following commands:

mkdir -p ~/azure_cert_manager/examples/{dgraph,hello} && \
cd ~/azure_cert_manager

touch \
env.sh \
helmfile.yaml \
./examples/{dgraph,hello}/helmfile.yaml

The instructions from this point forward will assume that you are in the ~/azure_cert_manager directory, so when in doubt:

cd ~/azure_cert_manager

Project environment variables

Set up the environment variables below to keep things consistent amongst a variety of tools: helm, helmfile, kubectl, jq, and az.

If you are using a POSIX shell, you can save these into a script and source that script whenever needed. Copy this source script and save as env.sh:
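
The gist for env.sh is not embedded here, but a minimal sketch would look like the following; every value is a placeholder, so substitute your own resource names, region, domain, and e-mail:

#!/usr/bin/env bash
# env.sh: project environment variables (all values below are placeholders)
export AZ_RESOURCE_GROUP="azure-cert-manager"   # resource group for all resources
export AZ_LOCATION="westus2"                    # Azure region
export AZ_CLUSTER_NAME="aks-cert-manager"       # AKS cluster name
export AZ_DNS_DOMAIN="example.com"              # public domain managed in Azure DNS
export AZ_VM_SIZE="Standard_DS2_v2"             # AKS node pool VM size
export ACME_ISSUER_EMAIL="you@example.com"      # e-mail registered with Let's Encrypt
export ACME_ISSUER="letsencrypt-staging"        # default ClusterIssuer to use
export KUBECONFIG="$HOME/.kube/config"          # kubectl configuration file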

Azure components

Below is the Azure-specific configuration that is required. You can use the commands below, material from previous articles, or resources you provisioned with your own automation.

If you are provisioning Azure cloud resources using your own automation, you will need to keep these requirements in mind:

Resource group

In Azure, resources are organized under resource groups.

source env.sh

az group create \
--resource-group ${AZ_RESOURCE_GROUP} \
--location ${AZ_LOCATION}

Cloud resources

For simplicity, you can create the resources needed for this project with the following:

source env.sh

az network dns zone create \
--resource-group ${AZ_RESOURCE_GROUP} \
--name ${AZ_DNS_DOMAIN}

az aks create \
--resource-group ${AZ_RESOURCE_GROUP} \
--name ${AZ_CLUSTER_NAME} \
--generate-ssh-keys \
--vm-set-type VirtualMachineScaleSets \
--node-vm-size ${AZ_VM_SIZE:-Standard_DS2_v2} \
--load-balancer-sku standard \
--enable-managed-identity \
--node-count 3 \
--zones 1 2 3

az aks get-credentials \
--resource-group ${AZ_RESOURCE_GROUP} \
--name ${AZ_CLUSTER_NAME} \
--file ${KUBECONFIG:-$HOME/.kube/config}

You will need to transfer domain management to Azure DNS for a root domain like example.com, or, if you are using a sub-domain like dev.example.com, you’ll need to update the NS (name server) records to point to the Azure DNS name servers. This process, as well as how to provision the equivalent with Terraform, is fully detailed in the Azure Linux VM with DNS article.

For a more robust script for provisioning Azure Kubernetes Service, see the article Azure Kubernetes Service: Provision an AKS Kubernetes Cluster with Azure CLI.

Authorizing access to Azure DNS

We need to grant the Managed Identity assigned to the VMSS node pool workers access to the Azure DNS zone. This will allow any pod running on a Kubernetes worker node to access the Azure DNS zone.

NOTE: A Managed Identity is a wrapper around a service principal that makes management simpler. Essentially, it is mapped to an Azure resource, so that when the Azure resource no longer exists, the associated service principal is removed.

Managed Identity authorized to access Azure DNS

Run the commands below to extract the scope and the service principal object id, and then grant access:

source env.sh

export AZ_DNS_SCOPE=$(
az network dns zone list \
--query "[?name=='$AZ_DNS_DOMAIN'].id" \
--output tsv
)

export AZ_PRINCIPAL_ID=$(
az aks show \
--resource-group $AZ_RESOURCE_GROUP \
--name $AZ_CLUSTER_NAME \
--query "identityProfile.kubeletidentity.objectId" \
--output tsv
)

az role assignment create \
--assignee "$AZ_PRINCIPAL_ID" \
--role "DNS Zone Contributor" \
--scope "$AZ_DNS_SCOPE"
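
Optionally, you can verify that the role assignment was created on the DNS zone scope (a quick sanity check, not part of the original steps):

az role assignment list \
--assignee "$AZ_PRINCIPAL_ID" \
--scope "$AZ_DNS_SCOPE" \
--output table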

Kubernetes components

The Kubernetes add-ons can be installed with the scripts below.

Install cert-manager

Copy this script below and save as helmfile.yaml:
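
The helmfile gist itself is not embedded here; the sketch below is roughly what it contains, assuming cert-manager is installed from the jetstack chart repository (the namespace and pinned chart version are my assumptions, so adjust as needed):

repositories:
  # official cert-manager chart repository
  - name: jetstack
    url: https://charts.jetstack.io

releases:
  # install cert-manager along with its CRDs
  - name: cert-manager
    namespace: cert-manager
    chart: jetstack/cert-manager
    version: v1.4.0            # assumed version, pin to the one you want
    set:
      - name: installCRDs
        value: true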

Once ready, simply run:

source env.sh
helmfile apply

Install cert-manager clusterissuers

Copy the following and save as issuers.yaml:
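
The issuers.yaml gist is not reproduced here. The article wraps the manifests in a helmfile so they can be applied with helmfile, but as a rough sketch these are the two ClusterIssuer resources it renders: one for the Let's Encrypt staging environment and one for production, both solving the DNS01 challenge through Azure DNS using the kubelet's managed identity. The subscription id, resource group, and zone name are placeholders:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: <your-email-goes-here>
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
      - dns01:
          azureDNS:
            subscriptionID: <azure-subscription-id>    # placeholder: your subscription
            resourceGroupName: <azure-resource-group>  # resource group holding the DNS zone
            hostedZoneName: <your-domain>              # e.g. example.com
            environment: AzurePublicCloud
---
# the production issuer is identical except for the name and the ACME server
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <your-email-goes-here>
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - dns01:
          azureDNS:
            subscriptionID: <azure-subscription-id>
            resourceGroupName: <azure-resource-group>
            hostedZoneName: <your-domain>
            environment: AzurePublicCloud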

It will take a few seconds before the cert-manager pods are ready and online. When they are, run this:

export ACME_ISSUER_EMAIL="<your-email-goes-here>"
source env.sh
helmfile --file issuers.yaml apply
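
You can confirm the install and that the issuers registered with the ACME servers; the READY column of the ClusterIssuers should report True (the cert-manager namespace matches the helmfile sketch earlier):

kubectl get pods --namespace cert-manager
kubectl get clusterissuer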

Kubernetes example: hello-kubernetes

hello-kubernetes is a simple application that prints out the pod names. The helmfile script below, which is simply raw Kubernetes manifests folded into a helm chart so that we can use dynamic values, will do the following:

  • deploy a Deployment to manage 3 pods
  • deploy a Service that points to the pods
  • deploy an Ingress (using ingress-nginx) that configures a route to direct traffic to the Service, using an FQDN hostname, e.g. hello.example.com.

The ingress resource will also do the following magical automation:

  • instruct external-dns to set up a record in Azure DNS zone, e.g. hello.example.com.
  • issue a certificate to secure traffic using the issuer specified through cert-manager.
Example: hello-kubernetes

Copy the file below and save as examples/hello/helmfile.yaml:
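
The hello-kubernetes gist is not reproduced here. The key piece it renders is an Ingress along the lines of the sketch below; the Deployment and Service are the standard hello-kubernetes manifests and are omitted, and the resource names, namespace, and service port are my assumptions. external-dns picks the hostname up from the rules, while cert-manager reacts to the cluster-issuer annotation and the tls block, storing the issued certificate in the named secret:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-kubernetes
  namespace: hello
  annotations:
    kubernetes.io/ingress.class: nginx
    # the helmfile templates this value from $ACME_ISSUER
    cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  tls:
    - hosts:
        - hello.example.com              # substitute your domain
      secretName: hello-kubernetes-tls   # cert-manager stores the issued certificate here
  rules:
    - host: hello.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello-kubernetes   # assumed Service name
                port:
                  number: 80             # assumed Service port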

Deploy with the staging issuer

To test the functionality in staging, run the following:

source env.sh
export ACME_ISSUER=letsencrypt-staging
helmfile --file ./examples/hello/helmfile.yaml apply

Then verify the resources were deployed:

kubectl get all,ing,certificate --namespace hello

This should look something like this:

hello-kubernetes deploy

You can look at the events to see that the certificate was issued successfully or if any issues occurred:

kubectl describe ingress --namespace hello
kubectl describe certificate --namespace hello

When all resources are ready, test the solution with the following (replacing example.com with your domain):

curl --insecure --silent --include https://hello.${AZ_DNS_DOMAIN}

NOTE: For the staging environment, a certificate from an untrusted CA will be used, so the --insecure (or -k) argument is needed.

When completed, remove the existing solution:

helm delete hello-kubernetes --namespace hello

Deploy with the prod issuer

When satisfied that the solution is working, we can try the production issuer. The reason we do this in two phases is that the production ACME servers enforce strict rate limits on requests.

Deploy using the production issuer with the following:

source env.sh
export ACME_ISSUER=letsencrypt-prod
helmfile --file ./examples/hello/helmfile.yaml apply

After a few moments, you can check the results at https://hello.example.com (replacing example.com with your domain).

Kubernetes example: Dgraph

Dgraph is a distributed graph database and has a helm chart that can be used to install Dgraph into a Kubernetes cluster. You can use either the helmfile or helm method to install Dgraph.

What is nice about this example is that it will deploy two endpoints through a single Ingress: one for the Dgraph Ratel graphical user interface client (React) and one for the database service itself, Dgraph Alpha.

Example: Dgraph Alpha + Dgraph Ratel

Securing Dgraph

In a production scenario, public endpoints should be secured, especially a backend database, but to keep things simple for this demonstration, the endpoint will not be secured.

We can add some level of security on the Dgraph Alpha service itself by adding an allow list (also called a whitelist):

# get AKS pod and service IP addresses
DG_ALLOW_LIST=$(az aks show \
--name $AZ_CLUSTER_NAME \
--resource-group $AZ_RESOURCE_GROUP | \
jq -r '.networkProfile.podCidr,.networkProfile.serviceCidr' | \
tr '\n' ','
)

# append home office IP address
MY_IP_ADDRESS=$(curl --silent ifconfig.me)
DG_ALLOW_LIST="${DG_ALLOW_LIST}${MY_IP_ADDRESS}/32"
export DG_ALLOW_LIST

Deploy Dgraph

Copy the file below and save as examples/dgraph/helmfile.yaml:
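
The Dgraph helmfile gist is not reproduced here. The sketch below shows the general shape, using the dgraph/dgraph chart from https://charts.dgraph.io; the ingress and whitelist value keys are assumptions on my part, so verify them against the chart's values.yaml before relying on this:

repositories:
  # official Dgraph chart repository
  - name: dgraph
    url: https://charts.dgraph.io

releases:
  - name: demo
    namespace: dgraph
    chart: dgraph/dgraph
    values:
      # NOTE: the value keys below are assumptions for illustration only;
      # check the chart's values.yaml for the exact ingress and whitelist settings
      - alpha:
          configFile:
            config.yaml: |
              security:
                whitelist: {{ requiredEnv "DG_ALLOW_LIST" }}
        global:
          ingress:
            enabled: true
            annotations:
              kubernetes.io/ingress.class: nginx
              cert-manager.io/cluster-issuer: {{ requiredEnv "ACME_ISSUER" }}
            alpha_hostname: alpha.{{ requiredEnv "AZ_DNS_DOMAIN" }}
            ratel_hostname: ratel.{{ requiredEnv "AZ_DNS_DOMAIN" }}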

In a similar fashion, we can test the solution in staging first, then try production.

Deploy with the staging issuer

To test the functionality in staging, run the following:

source env.sh
export ACME_ISSUER=letsencrypt-staging
helmfile --file ./examples/dgraph/helmfile.yaml apply

Verify that the Dgraph services are all in a running state. This may take about a minute:

kubectl get all,ing,certificate --namespace dgraph

This should show something like the following:

Dgraph Deployment with Certificate

Verify that Dgraph Alpha is accessible by its domain name (replacing example.com with your domain):

curl --insecure --silent https://alpha.${AZ_DNS_DOMAIN}/health | jq

Deploy with the prod issuer

To test the functionality in production, run the following:

source env.sh
export ACME_ISSUER=letsencrypt-prod
helmfile --file ./examples/dgraph/helmfile.yaml apply

Verify the certificate was updated with:

kubectl describe ingress --namespace dgraph

You should see an updated Certificate message:

Ingress resource events

You can also see the certificate events with:

kubectl describe certificate --namespace dgraph

This should show events similar to this:

Certificate resource events

Verify that Dgraph Alpha is accessible by its domain name (replacing example.com with your domain):

curl --silent https://alpha.${AZ_DNS_DOMAIN}/health | jq

NOTE: Now that we are using publicly trusted certificates, instead of the untrusted certificates from staging, --insecure (or -k) is no longer needed.

Upload Data and Schema

There are some data and schema files, adapted from the tutorial at https://dgraph.io/docs/get-started/, that you can download:

PREFIX=gist.githubusercontent.com/darkn3rd
RDF_GIST_ID=398606fbe7c8c2a8ad4c0b12926e7774
RDF_FILE=e90e1e672c206c16e36ccfdaeb4bd55a84c15318/sw.rdf
SCHEMA_GIST_ID=b712bbc52f65c68a5303c74fd08a3214
SCHEMA_FILE=b4933d2b286aed6e9c32decae36f31c9205c45ba/sw.schema
curl -sO https://$PREFIX/$RDF_GIST_ID/raw/$RDF_FILE
curl -sO https://$PREFIX/$SCHEMA_GIST_ID/raw/$SCHEMA_FILE

Once downloaded, you can then upload the schema and data with:

curl -s "https://alpha.$AZ_DNS_DOMAIN/mutate?commitNow=true" \
--request POST \
--header "Content-Type: application/rdf" \
--data-binary @sw.rdf | jq
curl -s "https://alpha.$AZ_DNS_DOMAIN/alter" \
--request POST \
--data-binary @sw.schema | jq

Connect to Ratel UI

After a few moments, you can check the results at https://ratel.example.com (replacing example.com with your domain).

In the dialog for Dgraph Server Connection, configure the domain, e.g. https://alpha.example.com (replacing example.com with your domain).

Test Using the Ratel UI

In the Ratel UI, paste the following query and click run:

{
  me(func: allofterms(name, "Star Wars"), orderasc: release_date)
    @filter(ge(release_date, "1980")) {
    name
    release_date
    revenue
    running_time
    director {
      name
    }
    starring (orderasc: name) {
      name
    }
  }
}

You should see something like this:

Cleanup the Project

You can clean up the resources that incur costs with the following:

Remove External Disks

Before deleting the AKS cluster, make sure any disks that were used are removed; otherwise, these will be left behind and incur costs.

############################################
# Delete the Dgraph cluster
############################################
helm delete demo --namespace dgraph

############################################
# Delete external storage used by the Dgraph cluster
############################################
kubectl delete pvc --namespace dgraph --selector release=demo

NOTE: These resources cannot be deleted if they are in use. Make sure that the resources using the PVCs were deleted first, i.e. helm delete demo --namespace dgraph.

Remove the Azure Resources

This will remove the Azure resources:

############################################
# Delete the AKS cluster
############################################
az aks delete \
--resource-group $AZ_RESOURCE_GROUP \
--name $AZ_CLUSTER_NAME

############################################
# Delete the Azure DNS Zone
############################################
az network dns zone delete \
--resource-group $AZ_RESOURCE_GROUP \
--name $AZ_DNS_DOMAIN

Resources

Here are some resources I came across in the development of this article.

Blog Source Code

Articles

Helm Chart (cert-manager)

Documentation (cert-manager)

Example Implementations

Conclusion

The main focus of this article was to secure a public-facing website or application that has a web interface, using Let’s Encrypt with an ACME issuer. Part of this journey included installing the required ingress controller, ingress-nginx (OpenResty under the hood), and automating DNS record updates with external-dns.

One last note on security: all pods running on the cluster will have the ability to update records in the Azure DNS zone, as well as to issue certificates, which are validated through the Azure DNS zone (for the DNS01 challenge). You can secure this further using aad-pod-identity, so that only the pods that need it hold the appropriate credentials, applying the principle of least privilege. Note that this feature is currently a preview release for integration with AKS.

With cert-manager, there are so many configuration options. With the ACME issuer used in this article, the DNS01 challenge used Azure DNS, but you are by no means limited to this one, as Route53, Cloud DNS, and CloudFlare amongst others are supported, or you can alternatively use the HTTP01 challenge. Besides ACME, you could try out other CAs, such as Vault, Venafi, CloudFlare origin-ca-issuer, FreeIPA, and others. The possibilities are endless.

For these and other reasons, it is no surprise that cert-manager is by far the most popular solution on Kubernetes for certificate management.
