Self-renewing Let’s Encrypt wildcard certificates in Kubernetes for internal domains: Part I — the DNS server

Daniel Megyesi
Infrastructure adventures
12 min read · Jun 2, 2018

I had been waiting a long time for wildcard SSL certificate support in Let’s Encrypt, and after some delay, it finally became available this spring. At my workplace, we managed to migrate 100% of the production sites and domains (100+) fully to domain-validated LE certificates, using the classical HTTP-based verification. They renew every 2 months, without any human intervention: no certificate orders, no manual replacement in load balancers, etc.

Still, we came across a new problem: on our bare-metal Kubernetes cluster, the developers are running a bunch of applications and they want to reach some of these microservices over HTTPS, using the built-in K8s service discovery internal domain names, without creating a load balancer or any nice URL.

Everybody prefers the green lock in the browser status bar. Photo by Paulius Dragunas on Unsplash

You can use self-signed certificates, but then every automated script will complain, your browser will complain, or you might even have a company browser policy deployed which prevents you from opening any non-secure site. The same goes for running an internal, self-hosted Docker Registry: you need a valid, real SSL certificate if you don’t want hundreds of your colleagues to set the --insecure-registry flag in their local Docker client by hand.

Internal domains vs. publicly verified certificates?

In order to get a wildcard certificate, you must verify your domain ownership by creating a DNS TXT record. Here you have 2 problems:

  • You don’t want to expose all your internal domains to the Internet
  • You must re-verify every 3 months, on-demand, for each wildcard sub-zone!

Obviously, this is impossible to do by hand, so you must use some kind of dynamically updatable DNS API. Unfortunately, our DNS provider doesn’t support this, so we had to come up with an alternative idea:

Let’s deploy our own DNS API with nearly minimum effort, used for Let’s Encrypt validations.

The components

I’m going to use the following components:

  • PowerDNS: to provide a dynamically updatable DNS API
  • Dehydrated: a Let’s Encrypt client written in Bash
  • A PDNS API hook: to create the dynamic DNS records
  • Docker: to build every component as a portable Docker image
  • Kubernetes: to run the DNS server, the automated renewal script and the certificate deployments
  • Git: to store the issued certificates and set up pipelines.

Because we want to have valid, real certificates for internal domains, we are going to set up a fake DNS server on the public company domain, serving a completely empty sub-zone file on the Internet. We will create the necessary TXT records there to validate the internal domains, then delete them a few minutes later, thereby continuing to hide the internal domain structure from the public.

Let’s suppose we have a production company domain called megye.si. We use *.intranet.megye.si for internal services and we have a Kubernetes cluster installed under cluster-01.intranet.megye.si. Those familiar with K8s know that the internal service discovery works the following way:

<service name>.<namespace>.svc.cluster-01.intranet.megye.si

For example, if I have a frontend-main service in the QA namespace, it will be reachable on the internal network through the frontend-main.qa.svc.cluster-01.intranet.megye.si DNS record.

So I would like to ensure that all QA applications are reachable over HTTPS, without provisioning a load balancer for them with a real domain name. I am going to need a wildcard certificate for *.qa.svc.cluster-01.intranet.megye.si.
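Just to make this concrete (a quick sketch; the pod name and the busybox image are arbitrary choices of mine), you can check such a record from inside the cluster with a throwaway pod:

# Hypothetical DNS check from inside the cluster
kubectl run dns-test -n qa --rm -it --restart=Never --image=busybox -- \
  nslookup frontend-main.qa.svc.cluster-01.intranet.megye.si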

The whole beauty of this is that people on the Internet will only see that I have an NS record for intranet.megye.si pointing to a custom DNS server, and that server doesn’t serve any records and doesn’t reply to any queries.

The goal

I want to be able to provision any number of wildcard certificates for any number of different internal namespaces, with zero effort and manual work, all hidden from the Internet.

So, let’s get to work! For easier reproducibility, and also because I wanted to play with Google Cloud, I am going to provision the entire stack on GCP. (You can get a $300 free trial which is more than enough to play.)

The original solution was built on an on-premises cluster, with GitLab CI pipelines. The GCP solution will be a little less powerful, mostly due to the very basic capabilities of the Google Container Builder (Google’s CI tool).

Part I: The DNS server

I am going to install a PowerDNS server with the built-in DNS API enabled. I usually prefer the BIND backend because I find its zone file format very convenient; however, dynamic API updates are not supported with it. Therefore, you must use one of the SQL backends: MySQL, Postgres or SQLite. To decrease complexity, I chose SQLite, because we only need to store the temporary TXT records for a few minutes anyway; the rest of the time the zone stays empty.

I start with provisioning the source code git repository for the Docker image of the DNS server:

gcloud source repos create powerdns-letsencrypt
gcloud source repos clone powerdns-letsencrypt

To be able to use the API, the key configuration entries will be the following:

# /etc/pdns/pdns.conf
...
api=yes
# pass this at runtime as environment var
#api-key=$APIKEY
webserver=yes
...
launch=gsqlite3
gsqlite3-database=/etc/pdns/powerdns.sqlite3
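Just to illustrate what this API will be used for later (a sketch only; the actual calls will be made by the Let’s Encrypt hook in Part II, and the record name and token here are the example values from this article), a temporary validation record can be created like this:

# Create or replace a temporary ACME validation TXT record via the PowerDNS REST API
curl -s -X PATCH \
  -H "X-API-Key: $APIKEY" \
  -H "Content-Type: application/json" \
  -d '{"rrsets": [{"name": "_acme-challenge.qa.svc.cluster-01.intranet.megye.si.",
        "type": "TXT", "ttl": 10, "changetype": "REPLACE",
        "records": [{"content": "\"<validation-token>\"", "disabled": false}]}]}' \
  http://localhost:8081/api/v1/servers/localhost/zones/intranet.megye.si.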

I mentioned that we’re not going to use the BIND backend. Still, we can write the zone definition in BIND format and then convert it directly to SQLite with PowerDNS’s own tools!

# /etc/pdns/zones/zones.conf
auth-nxdomain yes;
allow-query { any; };
allow-transfer { none; };
recursion yes;

zone "intranet.megye.si" {
    type master;
    file "/etc/pdns/zones/db.intranet.megye.si";
};

# /etc/pdns/zones/db.intranet.megye.si
$TTL 10
@ IN SOA ns-letsencrypt.megye.si. monitoring.megye.si. (
        0  ; Serial: auto-updated
        1h ; Refresh
        1h ; Retry
        2h ; Expire
        10 ) ; Negative Cache TTL
;
; name servers - NS records
  IN NS ns-letsencrypt.megye.si.

I chose a very short DNS TTL to avoid caching issues. I will provide more details about the DNS record configuration itself at the end of this article.

I will convert the zone definitions at the time of building the Docker image. Therefore, the relevant part of my Dockerfile will be:

RUN sqlite3 /etc/pdns/powerdns.sqlite3 < /etc/pdns/schema.sql && \
zone2sql --named-conf=/etc/pdns/zones/zones.conf --gsqlite | sqlite3 /etc/pdns/powerdns.sqlite3

I have downloaded the empty SQL table schemas from the PowerDNS docs (schema.sql) and after importing them, I convert the BIND zone files into SQLite format. We’re ready to roll!
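Before wiring up any CI, a quick local smoke test doesn’t hurt (a sketch only: the image tag and host port are arbitrary, and I’m assuming the image’s entrypoint substitutes the APIKEY environment variable into pdns.conf):

# Build the image locally, start it, then check that the zone is served
docker build -t powerdns-letsencrypt:local .
docker run -d --name pdns-test -p 5353:53/udp -p 8081:8081 \
  -e APIKEY=dummy powerdns-letsencrypt:local
dig @127.0.0.1 -p 5353 intranet.megye.si SOA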

CI pipeline to build the Docker image

My next step will be to write an automated build pipeline that creates a new Docker image each time I push a new commit. I’m using the Google Container Registry (gcr.io) to store my Docker images; the build is done by the Google Container Builder tool. It has 2 options to create a new image with a pipeline: use the defaults and just build the Dockerfile in the repository, or, for more complex use cases, specify a custom cloudbuild.yaml file:

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/$REPO_NAME:$COMMIT_SHA', '.']
- name: 'gcr.io/cloud-builders/docker'
  args: ['tag', 'gcr.io/$PROJECT_ID/$REPO_NAME:$COMMIT_SHA', 'gcr.io/$PROJECT_ID/$REPO_NAME:$BRANCH_NAME']
images: ['gcr.io/$PROJECT_ID/$REPO_NAME:$COMMIT_SHA', 'gcr.io/$PROJECT_ID/$REPO_NAME:$BRANCH_NAME']
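For a one-off test you can also submit a build by hand with this config (a sketch; at the time of writing the subcommand is gcloud container builds submit, and the built-in variables like $REPO_NAME are only auto-filled by triggers, so here they have to be passed explicitly with illustrative values):

# Manual build with explicit substitution values
gcloud container builds submit . \
  --config cloudbuild.yaml \
  --substitutions=REPO_NAME=powerdns-letsencrypt,BRANCH_NAME=master,COMMIT_SHA=$(git rev-parse HEAD)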

I chose the second option, because by default you only get the git commit hash as the image tag, and I prefer to always have 2 tags created for each Docker build:

<image name>:<git commit hash>
<image name>:<git branch name>

For example, I can set up my deployment manifests in Kubernetes to always start the latest master branch image. Imagine someone asks you the following question:

— Hey, which version is running on prod?

— The master branch.

— Ooookay, but which master branch? The one from last week or from today?

To avoid this, I will do the automated deployments based on the commit hash, while always keeping the latest version tagged with the git branch name as well.

This is the one and only time we need to touch the Google Cloud Console in the browser, because unfortunately it’s not yet possible to configure the build triggers from the CLI. (It’s a beta product.)

I’m storing the git repositories at Google, but it’s possible to use GitHub or Bitbucket remote repositories mirrored here as well.

Selecting the source repository of the pipeline
The default is to build based on a Dockerfile. You can only push 1 tag for the image name.
The finished pipeline. You can trigger it manually or by pushing a new commit.
Build finished. The Dockerfile based pipeline only pushed the commit hash as the image tag.

I like to use labels following the Open Containers standard, to indicate inheritance between images, build time, maintainer and similar metadata. I was surprised to see that the Google Container Builder doesn’t support the --label flag of the docker build CLI; it only works if the labels are defined inside the Dockerfile:

Build failed. Looks like Google uses a customized ‘docker build’ command.

Deploying the DNS server in Kubernetes. Deploying Kubernetes!

So, after the pipeline is done, it’s time to start the DNS server and configure my production domain to delegate the intranet addresses to this new service.

Just for some extra fun, I am going to create a new Kubernetes cluster in GCP. Google just released the regional type of their managed K8s service, the Google Kubernetes Engine; it’s generally available now. (That means you get an SLA on it.)

Single-zone master setup

Previously, you could only choose zone-based clusters. So let’s say you deploy a GKE cluster to europe-west3-a only. It starts 1 API server in the background. When you do a rolling upgrade of the cluster, the API is killed and you cannot read or write anything in it until the new version has started. The existing traffic is uninterrupted, but you cannot start new pods.

Now with the regional availability, you get 3 API servers running in parallel! You can do a rolling upgrade of the Kubernetes masters, the cluster stays fully operational and the upgrade process is completely transparent.

I’m going to provision the cluster using cheap preemptible nodes, with the latest 1.10 version of Kubernetes. Even though I’m specifying --num-nodes "1", in reality I’m going to have 3 nodes running, because it starts 1 in each availability zone.

gcloud beta container \
clusters create "cluster-1" \
--region "europe-west3" \
--no-enable-basic-auth \
--cluster-version "1.10.2-gke.3" \
--machine-type "n1-standard-2" --image-type "COS" \
--disk-type "pd-standard" --disk-size "32" \
--preemptible \
--num-nodes "1" \
--enable-cloud-logging --enable-cloud-monitoring \
--enable-ip-alias --network "default" --subnetwork "default" \
--addons HorizontalPodAutoscaling,HttpLoadBalancing \
--enable-autoupgrade --enable-autorepair --maintenance-window "04:00"
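Once the cluster is up, point kubectl at it; regional clusters take a --region flag instead of --zone:

# Fetch the credentials and merge them into the local kubeconfig
gcloud beta container clusters get-credentials "cluster-1" --region "europe-west3"
kubectl get nodes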

After the cluster is bootstrapped, it’s ready to accept new services in it. For example, a wonderful DNS API server! Let’s provision it in a new namespace, dedicated for infrastructure components:

kubectl create namespace infra

# Don't forget some sensible resource limits + user permissions.
# kubectl apply -f resource-quota-infra.yaml
# kubectl apply -f rbac-infra.yaml
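The contents of those two manifests are out of scope here, but just to illustrate the kind of limits I mean (a hypothetical example; the quota name and values are made up), a basic quota for the namespace could be created like this:

# Hypothetical example: cap the total resources the infra namespace may consume
kubectl create quota infra-quota -n infra \
  --hard=requests.cpu=2,requests.memory=2Gi,limits.cpu=4,limits.memory=4Gi,pods=20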

If you remember the PDNS config, we’re going to need an API key to access the PowerDNS web service. We could hardcode this password in the Docker image, but on the one hand it’s not secure, and on the other hand we’re going to need this key on the client side as well. So let’s make it fully dynamic from day 1!

# Let's create a secret object
kubectl create secret generic powerdns-letsencrypt-apikey \
--from-literal=apikey=crepe-expel-epileptic-estimator-could-headsman \
--namespace infra \
--dry-run -o yaml > secret.yaml

Here I’m using the --dry-run and --output yaml flags together. This creates the YAML version of the secret definition, which I can save in my git repository. This way, it will be very easy to reproduce it in a different namespace or another cluster, and I also have a backup!
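Then the saved manifest can be applied like any other resource:

kubectl apply -f secret.yaml
kubectl get secret powerdns-letsencrypt-apikey -n infra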

There are 2 manifests left: the service definitions and the deployment itself!

apiVersion: v1
kind: Service
metadata:
  name: powerdns-letsencrypt-api
  namespace: infra
  labels:
    app: powerdns-letsencrypt
spec:
  # I want to assign an internal, virtual service IP to load balance between the pods
  type: ClusterIP
  # Here I use the same selector as in the deployment.
  selector:
    app: powerdns-letsencrypt
  ports:
  - protocol: TCP
    name: port-api
    port: 8081
    targetPort: 8081
---
apiVersion: v1
kind: Service
metadata:
  name: powerdns-letsencrypt-udp
  namespace: infra
  labels:
    app: powerdns-letsencrypt
spec:
  type: LoadBalancer
  # Here I use the same selector as in the deployment.
  selector:
    app: powerdns-letsencrypt
  ports:
  - protocol: UDP
    name: port-dns-udp
    port: 53
    targetPort: 53

You can see an interesting difference between the 2 service definitions. We create:

  • 1 for the internal API access (I don’t want it exposed to the Internet!)
  • 1 for public Internet access, the DNS server itself

I have found 2 interesting things about how GCP works here: when you expose a service with the LoadBalancer type, it allocates an external IP address for you and exposes your service as a NodePort on all of your running cluster nodes. It’s interesting that they use NAT instead of BGP or any internal routing.

The other one is that on GCP you cannot put a UDP and a TCP port together in the same service definition if it’s publicly exposed. In bare-metal clusters you can easily do this.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: powerdns-letsencrypt
  namespace: infra
  labels:
    app: powerdns-letsencrypt
spec:
  replicas: 1
  selector:
    matchLabels:
      app: powerdns-letsencrypt
  template:
    metadata:
      labels:
        app: powerdns-letsencrypt
    # Here is the actual pod specification, same as in the single pod example
    spec:
      containers:
      - name: pdns
        image: gcr.io/personal-201021/powerdns-letsencrypt:master
        imagePullPolicy: Always
        env:
        - name: APIKEY
          valueFrom:
            secretKeyRef:
              name: powerdns-letsencrypt-apikey
              key: apikey
        readinessProbe:
          httpGet:
            path: /
            port: port-api
          initialDelaySeconds: 3
          periodSeconds: 10
          timeoutSeconds: 3
        ports:
        - containerPort: 8081
          name: "port-api"
        - containerPort: 53
          protocol: TCP
          name: "port-dns-tcp"
        - containerPort: 53
          protocol: UDP
          name: "port-dns-udp"
        resources:
          requests:
            memory: "100Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "200m"

I highlighted the most important parts:

  • use the correct image name from the gcr.io registry
  • the API key from the secret object, as an environment variable
  • health check of the API, based on named ports: this is one of my favourite features; when there are 10+ ports defined in a pod, it’s very easy to see which one is used for the health check.

We do the deployments with kubectl apply, instead of kubectl create or replace, including the very first time. If you always use apply, you get automatic version control of the applied YAML manifests and you can easily roll back to any version in the history.
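In practice that looks like this (the file names are simply whatever you saved the manifests above as):

# First and every subsequent deployment goes through apply
kubectl apply -f service.yaml -f deployment.yaml
kubectl rollout status deployment/powerdns-letsencrypt -n infra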

$ kubectl get svc -n infra -l app=powerdns-letsencrypt
NAME                       TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)        AGE
powerdns-letsencrypt-api   ClusterIP      10.84.8.198   <none>         8081/TCP       3m
powerdns-letsencrypt-udp   LoadBalancer   10.84.6.156   35.234.66.63   53:30562/UDP   2m

We can see the DNS server is up and running. I can reach the internal API using the powerdns-letsencrypt-api.infra DNS name, and the DNS service itself on a public IP address.
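As a quick check (a sketch; it assumes a pod with curl available and the API key from the secret created earlier), the API answers on the internal service name from any pod in the cluster:

# List the zones known to the server through the internal ClusterIP service
curl -s -H "X-API-Key: $APIKEY" \
  http://powerdns-letsencrypt-api.infra:8081/api/v1/servers/localhost/zones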

DNS configuration

The only remaining part is to tell my DNS provider to delegate all of the intranet.megye.si subdomains to this new service. Personally, I use the afraid.org service because they offer the most options at the cheapest price. The GUI is nothing fancy, but I rarely need to touch it.

I am going to define two new records in my zone file:

  • an A record for the DNS server
  • and an NS record to delegate the sub-zone
The A record to point to my public facing DNS server
The NS record itself will be the only publicly shown part of my intranet.

Let’s verify the DNS settings:

$ dig intranet.megye.si
...
;; AUTHORITY SECTION:
intranet.megye.si. 10 IN SOA ns-letsencrypt.megye.si. monitoring.megye.si. 0 3600 3600 7200 10
...
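You can also query the exposed DNS service directly, bypassing the delegation (the IP address is the LoadBalancer address from the kubectl get svc output above):

# Ask the public endpoint directly; it should only return the SOA, nothing else
dig @35.234.66.63 intranet.megye.si SOA +short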

And we’re done! In the next part, I’m going to explain how to provision the wildcard certificates. First, we will create an auto-deploy pipeline for the DNS server image, because I don’t want to deploy the new Docker image by hand each time I push a new commit.

Then we will build the Let’s Encrypt client and configure it to automatically provision the wildcard certificates, save them to a git repository and then a CI pipeline will deploy them to the cluster as a Secret object, ready to be consumed by the HTTPS-aware applications.

Part II is available here: https://medium.com/infrastructure-adventures/self-renewing-lets-encrypt-wildcard-certificates-in-k8s-for-internal-domains-part-ii-certs-66c613a2279f

Hope you enjoyed this tutorial! Let me know in the comments below if something is not clear enough or you have a better approach for some parts!

You can find the source code here: https://github.com/dmegyesi/letsencrypt-wildcard-dnsapi
