How we avoid SSL Certificate expiry

Automating SSL Certificate Renewals using Azure DNS, Azure DevOps and acme.sh

Mitchell Homer
Accurx
6 min read · Mar 3, 2022

Have you ever been haunted by a “Your connection is not private” webpage? 😱

It’s annoying enough as a user, so imagine having to scramble and sift through documentation to fix this, making production changes, while your user base gets progressively more frustrated. Not much fun.

I’m Mitch, an engineer in Accurx’s DevOps team, and in this blog I’ll run through how we tackled this problem using some standard DevOps & Azure Infrastructure tooling.

Quick heads up, things will get engineering focused from now 😄. So if words like Azure DevOps and KeyVault make you scratch your head, hop over to our recent posts on career progression and team-building instead!

The task: automating certificate renewals

In the past, our web domain certificates were created manually via CA (Certificate Authority) web portals and copied to our one-size-fits-all Azure KeyVault. But in late 2021, we decided it was time to automate things for lots of organisational reasons… and a couple of selfish ones too:

Organisational reasons:

  • As a health-tech company, our data and privacy concerns are really important for our patient-facing features. Providing and supporting up-to-date SSL certificates for our domains and services helps us show the integrity and trustworthiness of Accurx.

Selfish reasons:

  • We were scared of manual configuration and found renewing certificates every 3–12 months really time-consuming. For one, it could be a pain maintaining step-by-step documentation of renewal and integration processes. And as Accurx scales up, creating new certificates for other services and domains could be pretty laborious.

Kicking things off: TXT record validation and certbot

We had three key aims: to create an automated renewal pipeline that (1) avoided certificate expiration, (2) leveraged standardised tools and (3) provided separation by infrastructure environment.

To harness the Azure CLI effectively in an automated manner, we needed to do some groundwork first. This would let us move all of our DNS (Domain Name System) configuration from our existing provider to Azure DNS Zones via CNAME record mapping. Only by doing that could we capture the certificate validation steps of a renewal process in a ‘byte-sized’ bash script.

For context, CAs require domain ownership validation to prove that you own or manage the domains you are requesting a certificate for. The method we opted for is TXT record updates, which are easier to manage via DNS.

This script was originally used with certbot, but it could be passed the relevant environment variables for use with other CLI tools such as acme.sh.
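For readers who want a feel for what such a hook involves, here is a minimal sketch. It is not the original script: it assumes certbot's hook variables CERTBOT_DOMAIN and CERTBOT_VALIDATION, while DNS_ZONE, DNS_RESOURCE_GROUP, both function names and the 30 second wait are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a certbot --manual-auth-hook for Azure DNS.
# DNS_ZONE and DNS_RESOURCE_GROUP are assumed variable names.

# Build the record-set name relative to the zone, e.g.
# domain "api.example.com" in zone "example.com" -> "_acme-challenge.api".
acme_record_name() {
  local domain="${1#\*.}"   # strip a leading wildcard label ("*.")
  local zone="$2"
  if [ "$domain" = "$zone" ]; then
    echo "_acme-challenge"
  else
    echo "_acme-challenge.${domain%."$zone"}"
  fi
}

# Add the validation value as a TXT record in the Azure DNS zone.
add_txt_record() {
  local record_name
  record_name="$(acme_record_name "$CERTBOT_DOMAIN" "$DNS_ZONE")"
  az network dns record-set txt add-record \
    --resource-group "$DNS_RESOURCE_GROUP" \
    --zone-name "$DNS_ZONE" \
    --record-set-name "$record_name" \
    --value "$CERTBOT_VALIDATION"
}

# Only act when certbot has provided its hook variables.
if [ -n "${CERTBOT_DOMAIN:-}" ] && [ -n "${CERTBOT_VALIDATION:-}" ]; then
  add_txt_record
  sleep 30   # assumed delay to let the record become visible
fi
```

A matching cleanup hook would normally remove the TXT record once validation has finished.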

If you’d like to test this script functionality with your Azure DNS Infrastructure, I’ve included the command for use with certbot using our preferred ‘dns’ validation method.

certbot certonly --manual --preferred-challenges=dns \
--manual-auth-hook ./azuredns_acme_record_updater.sh \
-d "$domain" --non-interactive --agree-tos -m "$tos_email" \
--csr "$csr_path" --fullchain-path "$fullchain_path"

Thankfully, there was a major upside to moving to Azure DNS. We could use the Azure CLI for both the DNS record updates and managing our certificates in KeyVault. To the team’s delight, we found the same “Service Principal” Azure Active Directory role could perform the required infrastructure changes via our existing Azure DevOps Pipelines without much more configuration.

You can find more on creating Azure DevOps Pipeline permissions for these types of operations here!
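As a rough illustration of what those permissions can involve, the sketch below grants a service principal write access to a DNS zone and certificate access to a vault. The function name and its three arguments are hypothetical, and the `az keyvault set-policy` call assumes the vault uses access policies rather than Azure RBAC.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: grant one service principal the access a renewal
# pipeline needs. Names and the permission list are assumptions.
grant_renewal_permissions() {
  local sp_app_id="$1" dns_zone_id="$2" vault="$3"

  # Allow TXT record updates in the DNS zone (built-in Azure role).
  az role assignment create \
    --assignee "$sp_app_id" \
    --role "DNS Zone Contributor" \
    --scope "$dns_zone_id"

  # Allow certificate management in the vault (access-policy model).
  az keyvault set-policy \
    --name "$vault" \
    --spn "$sp_app_id" \
    --certificate-permissions get list create import update
}
```

Here `dns_zone_id` would be the full resource ID of the zone, of the form `/subscriptions/<subscription>/resourceGroups/<group>/providers/Microsoft.Network/dnszones/<zone>`.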

Using acme.sh for asynchronous domain validation

Using our azuredns_acme_record_updater.sh script above alongside Azure’s infrastructure components handled the large majority of our domain requirements. But because we work closely with NHS bodies, we also use some web domains they provide, e.g. “accurx.nhs.uk”, and these made our existing process a little too rigid.

Without total control of the DNS managing the domain, it can be difficult to fulfil the validation requirements of the ACME (Automated Certificate Management Environment) protocol, as you can’t update the DNS records yourself to prove ownership.

These are our requirements to validate a certificate for a domain we don’t manage:

  1. Make a certificate request for our domain *.accurx.nhs.uk via the CA.
  2. Contact the NHS.uk DNS team with the ACME ‘TXT record’ validation values.
  3. Wait for the DNS folks to update their records.
  4. Resume the certificate request to validate, sign and merge the final chain cert for use.

If this sounds like a similar situation you’ve encountered in a project, there’s definitely a way forward!

Our use of certbot didn’t easily allow for this workflow, so I looked around for other tooling that could perform validation in two parts (steps 1 and 4) using an async request model, leaving room for the other steps to be carried out in between.

The acme.sh project tool fits the bill, as it provides some very customisable functionality for making these types of requests. And with the command flag below, who wouldn’t want to use it 🙂.

acme.sh --signcsr --csr "$csr_path" --dns -d "$domain" \
--fullchain-file "$fullchain_path" --email "$email" \
--yes-I-know-dns-manual-mode-enough-go-ahead-please

This very descriptive flag allowed the DNS validation to happen manually via our TXT record update script or… allowed us to go away and contact the NHS folks to make the same changes in their DNS.

Once the required DNS records and values were set, we could then run the command below from the same host to finalise the certificate signing:

acme.sh --renew -d "$domain" \
--fullchain-file "$fullchain_path" --email "$email" \
--yes-I-know-dns-manual-mode-enough-go-ahead-please
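When someone else is updating the DNS, it helps to confirm the TXT record is actually visible before resuming. The sketch below is one way to do that with `dig`. The function name, its arguments and the default retry settings are assumptions, not part of our pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until a TXT record contains the expected
# validation value, or give up after a number of attempts.
wait_for_txt() {
  local name="$1" expected="$2" attempts="${3:-30}" delay="${4:-60}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # "--" stops grep treating a value that starts with "-" as an option.
    if dig +short TXT "$name" | grep -qF -- "$expected"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

It could then gate the renewal step, for example `wait_for_txt "_acme-challenge.$domain" "$txt_value" && acme.sh --renew ...`, where `$txt_value` is the value handed over to the DNS team.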

Using the certificates for our services: Application Gateway consumption

So we’d automated certificate creation in our Azure KeyVault — great! Now the question became:

How do we consume new versions of a certificate automatically after a renewal process is complete?

We use Azure Application Gateway with Kubernetes for our container services. With the Azure CLI, each new certificate can be configured for consumption via Kubernetes Ingress rules. However, KeyVault treats each renewed certificate as a new version, which you can see for yourself here:

> az keyvault certificate show --vault-name "$vault_name" \
--name "$certificate_name" | jq -r '.sid'
https://<vault_name>.vault.azure.net/secrets/<certificate_name>/9f4640ff459b49cfa025e8be0a8f2669

Where 9f4640ff459b49cfa025e8be0a8f2669 is the unique certificate version identifier.

Referencing the Certificate sid (KeyVault Secret ID) in this way wouldn’t hold up in the long run, as we’d need to update our Application Gateway with the new certificate version each time our automation ran. So after some experimentation, we used the Certificate’s “root” SecretID path when we created the reference in Application Gateway.

az network application-gateway ssl-cert create \
-g "$gateway_resource_group" \
--gateway-name "$gateway_name" \
--name "new-certificate-name" \
--key-vault-secret-id "https://${vault_name}.vault.azure.net/secrets/${certificate_name}"

new-certificate-name is a unique identifier as far as Application Gateway is concerned and can be referenced in Kubernetes Ingress configuration without worrying about the KeyVault/secrets configuration.

Using the --key-vault-secret-id in this manner always uses the latest version of the Certificate from the vault and means no manual steps are required to consume our fresh certificates! 🤖 🎉
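If you would rather derive that “root” path from the sid than build it by hand, a small helper can strip the version segment. This is only a sketch: the function name is hypothetical, and it assumes the sid has the `/secrets/<name>/<version>` shape shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop the trailing version segment from a KeyVault
# secret ID. An ID that has no version segment is returned unchanged.
versionless_secret_id() {
  printf '%s\n' "$1" | sed -E 's#(/secrets/[^/]+)/[^/]+/?$#\1#'
}

# Example usage (needs an authenticated Azure CLI session):
#   sid="$(az keyvault certificate show --vault-name "$vault_name" \
#     --name "$certificate_name" --query sid -o tsv)"
#   versionless_secret_id "$sid"
```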

More information on referencing the Application Gateway certificate identifier in Ingress rules can be found here.

So after addressing these challenges, what did we actually end up with? Cue the drumroll…

🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁

What we ended up with: System design

The orange numbers in the diagram are described in a little more detail below, marked with the 1️⃣ to 6️⃣ emojis.

Invocation:

  • Azure Pipeline scheduled runs, generating 3-month ACME certs, with renewal frequency increasing per environment so that issues are caught earlier in our development environment 1️⃣
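A scheduled run does not have to renew every time it fires. One common pattern, sketched here as an assumption rather than a description of our pipeline, is to check how close the current certificate is to expiry first. The function name and the 30-day default are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the certificate expires
# within the given number of days, so a scheduled job can decide to renew.
# An unreadable certificate file is also treated as "needs renewal".
needs_renewal() {
  local cert_path="$1" days="${2:-30}"
  # -checkend exits 0 when the cert is still valid after N seconds.
  if openssl x509 -checkend "$((days * 86400))" -noout \
      -in "$cert_path" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

A shorter window in production and a longer one in development would give the "renew more often in dev" behaviour described above.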

Cert issue/renew/merge:

  • Bash “Renewal script” wrapper around acme.sh & Azure CLI commands 2️⃣
  • Azure DNS Zone TXT challenge updating script 2️⃣
  • ZeroSSL CA (default provided by acme.sh, but also fit our requirements for root certificate platform compatibility) 4️⃣
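To make the "merge" part more concrete, here is a sketch of how a wrapper might move a CSR out of KeyVault and a signed chain back in. It is not our renewal script: the function names are hypothetical, and it assumes a pending certificate already exists in the vault and that the CSR comes back as bare base64.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the KeyVault side of a renewal wrapper.

# Wrap a bare base64 CSR in PEM armour so CLI tools can read it.
csr_to_pem() {
  echo "-----BEGIN CERTIFICATE REQUEST-----"
  printf '%s\n' "$1" | fold -w 64
  echo "-----END CERTIFICATE REQUEST-----"
}

# Write the CSR of a pending KeyVault certificate to a file.
fetch_pending_csr() {
  local vault="$1" cert="$2" csr_path="$3" csr
  csr="$(az keyvault certificate pending show \
    --vault-name "$vault" --name "$cert" --query csr -o tsv)" || return 1
  csr_to_pem "$csr" > "$csr_path"
}

# Merge the signed full chain back into the pending certificate.
merge_signed_chain() {
  local vault="$1" cert="$2" fullchain_path="$3"
  az keyvault certificate pending merge \
    --vault-name "$vault" --name "$cert" --file "$fullchain_path"
}
```

The acme.sh `--signcsr` and `--renew` commands shown earlier would sit between `fetch_pending_csr` and `merge_signed_chain`.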

Alerts and notifications:

  • Azure pipelines email notification for renewal script failures 1️⃣
  • Uptime for user level service SSL certificate monitoring.

Azure Infrastructure for Certificate consumption:

Final Thoughts

The task went well given my limited experience with the networking/Azure infrastructure. As a software engineer, I hadn’t really needed to touch DNS configurations. And I didn’t have to worry about making sure our websites use TLS with valid SSL certificates. Because of that, our team took an iterative approach.

We researched, tested and integrated all the tools mentioned above. When it came to making the integration work, consolidating all our existing infrastructure into Azure made a huge difference, and this is where my software background helped immensely. For me, it would’ve really helped to have a deeper understanding of DNS at the start, so I would definitely recommend picking up this knowledge if you’re working on a similar task.

Thankfully, there are loads of open source projects that can be leveraged with cloud technologies, making this project interesting to tackle and transferable to similar tasks.

If you’re interested in joining Accurx as an engineer, take a look at our other engineering blog posts or visit our careers page for current roles. You can help connect people across healthcare.

Mitchell Homer
Software Engineer, currently doing DevOps at accuRx