Using Cert-Manager in OpenShift/OKD. Part 3: SSL troubleshooting, renewals, and updating the kubeconfig.

Evgeniy P
Jun 29, 2023

The article was initially published here: https://epam.github.io/edp-install/operator-guide/ssl-automation-okd/

Part 1 of this guide covers the cert-manager installation in OpenShift and integration with AWS Route53.

Part 2 consists of the cert-manager configuration and SSL certificate installation.

Part 3 describes certificate troubleshooting, renewals, and updating the kubeconfig.

Troubleshoot Certificates

During DNS validation, the cert-manager operator creates a TXT challenge record named _acme-challenge.${DOMAIN} in the DNS zone.

Use the nslookup or dig tool to check whether DNS propagation for the TXT record is complete:

nslookup -type=txt _acme-challenge.${DOMAIN}
dig txt _acme-challenge.${DOMAIN}
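When propagation takes a while, the same check can be repeated in a loop. The following is a minimal sketch, not part of cert-manager: wait_for_txt is a hypothetical helper, and the attempt count and delay are illustrative defaults. It only reports that some TXT value is visible, so the value still has to be compared with the one in the DNS zone.

```shell
# Sketch: poll DNS until the ACME challenge TXT record becomes visible.
# wait_for_txt is a hypothetical helper name; attempts and delay are assumptions.
wait_for_txt() {
  local domain="$1" attempts="${2:-30}" delay="${3:-10}"
  local value i=1
  while [ "$i" -le "$attempts" ]; do
    # +short prints only the record data; empty output means "not visible yet".
    value="$(dig +short txt "_acme-challenge.${domain}")"
    if [ -n "$value" ]; then
      echo "$value"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "TXT record for _acme-challenge.${domain} not visible after ${attempts} attempts" >&2
  return 1
}
```

For example, wait_for_txt "${DOMAIN}" 12 10 checks up to twelve times, ten seconds apart, and prints the TXT value as soon as the resolver returns one.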

You can also use web tools such as Google Admin Toolbox.

If the correct TXT value is shown (the value corresponds to the current TXT value in the DNS zone), it means that the DNS propagation is complete and Let’s Encrypt is able to access the record in order to validate it and issue a trusted certificate.

If the DNS validation challenge self check fails, cert-manager will retry the self check with a fixed 10-second retry interval. Challenges that do not ever complete the self check will continue retrying until the user intervenes by either retrying the Order (by deleting the Order resource) or amending the associated Certificate resource to resolve any configuration errors.

As soon as the domain ownership has been verified, the validation TXT records that cert-manager created in the AWS Route53 DNS zone will be cleaned up.

Below are the issues that may occur and ways to troubleshoot them:

  • When certificates are not issued for a long time, or a cert-manager resource is not in a Ready state, describing a resource may show the reason for the error.
  • Basically, cert-manager creates the following resources during a Certificate issuance: CertificateRequest, Order, and Challenge. Investigate each of them in case of errors.
  • Use the cmctl tool to show the state of a Certificate and its associated resources.
  • Check the cert-manager controller pod logs:
oc get pod -n openshift-operators | grep 'cert-manager'
oc logs -f cert-manager-${replica_set}-${random_string} -n openshift-operators
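The first three checks above can be sketched as one helper. This is a rough illustration under stated assumptions: show_issuance_chain is a hypothetical name, and the Certificate name and namespace are passed in by the caller.

```shell
# Sketch: walk the issuance chain for one Certificate.
# show_issuance_chain is a hypothetical helper; arguments are the Certificate name and namespace.
show_issuance_chain() {
  local name="$1" namespace="$2"
  # Events and conditions on the Certificate usually name the failing step.
  oc describe certificate "$name" -n "$namespace"
  # List the intermediate resources that cert-manager creates during issuance.
  oc get certificaterequests.cert-manager.io,orders.acme.cert-manager.io,challenges.acme.cert-manager.io -n "$namespace"
  # Summarise the Certificate and its related resources in one report.
  cmctl status certificate "$name" -n "$namespace"
}
```

For example, show_issuance_chain router-certs openshift-ingress prints the three reports one after another. Any resource that looks stuck can then be inspected individually with oc describe.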

Certificate error debugging

  • Decode the certificate chain stored in both secrets:
oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.crt"}}' | base64 -d | while openssl x509 -noout -text; do :; done 2>/dev/null
oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.crt"}}' | base64 -d | while openssl x509 -noout -text; do :; done 2>/dev/null
cmctl inspect secret router-certs -n openshift-ingress
cmctl inspect secret api-certs -n openshift-config
  • Check the SSL RSA private key consistency:
oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -check -noout
oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -check -noout
  • Match the SSL certificate public key against its RSA private key. Their moduli must be identical, so diff prints nothing when they match:
diff <(oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.crt"}}' | base64 -d | openssl x509 -noout -modulus | openssl md5) <(oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -noout -modulus | openssl md5)
diff <(oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.crt"}}' | base64 -d | openssl x509 -noout -modulus | openssl md5) <(oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -noout -modulus | openssl md5)
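The modulus comparison can be rehearsed locally before running it against the cluster secrets. The sketch below generates a throwaway self-signed RSA key pair; the CN example.test and the helper name modulus_matches are assumptions made for illustration. It compares the moduli directly instead of hashing them, and treats an unreadable file as a mismatch.

```shell
# Sketch: rehearse the modulus comparison on a throwaway RSA key pair.
workdir="$(mktemp -d)"
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=example.test" \
  -keyout "$workdir/tls.key" -out "$workdir/tls.crt" 2>/dev/null

# modulus_matches is a hypothetical helper: it succeeds only when the
# certificate ($1) and the private key ($2) share the same RSA modulus.
modulus_matches() {
  local crt_mod key_mod
  crt_mod="$(openssl x509 -noout -modulus -in "$1" 2>/dev/null)"
  key_mod="$(openssl rsa -noout -modulus -in "$2" 2>/dev/null)"
  # An empty modulus means openssl could not read the file, so report a mismatch.
  [ -n "$crt_mod" ] && [ "$crt_mod" = "$key_mod" ]
}

modulus_matches "$workdir/tls.crt" "$workdir/tls.key" && echo "certificate and key match"
```

This check applies to RSA keys only. For other key types, compare the public keys instead.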

Remove Obsolete Certificate Authority Data From Kubeconfig

After the certificates are updated, access to the cluster via Lens or the CLI fails with untrusted certificate errors:

$ oc whoami
Unable to connect to the server: x509: certificate signed by unknown authority

This happens because the oc tool still references outdated CA data in the kubeconfig file.

You can examine the Certificate Authority data using the following command:

oc config view --minify --raw -o jsonpath='{.clusters[].cluster.certificate-authority-data}' | base64 -d | openssl x509 -text

This certificate has the CA:TRUE basic constraint, which marks it as a CA certificate. In this case it is the cluster's self-signed root CA, which did not sign the newly issued certificates.
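CA:TRUE on its own only says that the certificate may sign other certificates. A quick heuristic for "self-signed" is that the subject and the issuer are identical. The sketch below assumes the certificate has been saved to a PEM file; is_self_signed is a hypothetical helper name, and the check compares names only, without verifying the signature.

```shell
# Sketch: heuristic self-signed check, comparing subject and issuer names only.
# is_self_signed is a hypothetical helper; $1 is a path to a PEM certificate.
is_self_signed() {
  local subject issuer
  subject="$(openssl x509 -noout -subject -in "$1" 2>/dev/null)" || return 2
  issuer="$(openssl x509 -noout -issuer -in "$1" 2>/dev/null)" || return 2
  # Strip the "subject=" / "issuer=" prefixes before comparing the names.
  [ "${subject#subject=}" = "${issuer#issuer=}" ]
}
```

To apply it to the kubeconfig CA data, redirect the decoded output of the oc config view command above into a file, for example ca.crt, and run is_self_signed ca.crt. Only the first certificate in the file is examined.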

To fix the error, remove the old CA data from your OpenShift kubeconfig file:

sed -i "/certificate-authority-data/d" "$KUBECONFIG"

Since this field is now absent from the kubeconfig file, the system root CA certificates are used to validate the cluster certificate trust chain. On Ubuntu, Let’s Encrypt OpenShift cluster certificates are validated against the Internet Security Research Group (ISRG) root certificate in /etc/ssl/certs/ca-certificates.crt.
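Because the sed command edits the kubeconfig in place, it may be worth keeping a copy of the original file. A minimal sketch, assuming a single kubeconfig path: strip_ca_data is a hypothetical helper that falls back to $KUBECONFIG and then to ~/.kube/config when no path is given.

```shell
# Sketch: remove certificate-authority-data while keeping a backup copy.
# strip_ca_data is a hypothetical helper; $1 is an optional kubeconfig path.
strip_ca_data() {
  local kubeconfig="${1:-${KUBECONFIG:-$HOME/.kube/config}}"
  # -i.bak edits the file in place and leaves the original next to it as <file>.bak
  sed -i.bak '/certificate-authority-data/d' "$kubeconfig"
}
```

If the result is not what you expected, move the .bak file back over the edited kubeconfig to restore it.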

Certificate Renewals

The cert-manager automatically renews the certificates based on the X.509 certificate's duration and the renewBefore value. The minimum value for the spec.duration is 1 hour; for spec.renewBefore, 5 minutes. It is also required that spec.duration > spec.renewBefore.
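To see which values apply to an existing Certificate, the relevant fields can be read directly from the resource. This is a minimal sketch; show_renewal_settings is a hypothetical helper name. Fields that are not set explicitly in the spec print as empty columns.

```shell
# Sketch: print duration, renewBefore, expiry, and planned renewal time for a Certificate.
# show_renewal_settings is a hypothetical helper; arguments are the Certificate name and namespace.
show_renewal_settings() {
  local name="$1" namespace="$2"
  # Output columns are tab-separated: spec.duration, spec.renewBefore,
  # status.notAfter, status.renewalTime.
  oc get certificate "$name" -n "$namespace" \
    -o jsonpath='{.spec.duration}{"\t"}{.spec.renewBefore}{"\t"}{.status.notAfter}{"\t"}{.status.renewalTime}{"\n"}'
}
```

For example, show_renewal_settings api-certs openshift-config shows when cert-manager plans to renew the API certificate.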

Use the cmctl tool to manually trigger a single instant certificate renewal:

cmctl renew router-certs -n openshift-ingress
cmctl renew api-certs -n openshift-config

Alternatively, force a renewal of all certificates, in all namespaces, that carry the app=cert-manager label:

cmctl renew --all-namespaces -l app=cert-manager

Run the cmctl renew --help command to get more details.
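One way to confirm that a manual renewal actually produced a new certificate is to watch the Certificate's status.revision field, which cert-manager increments on each issuance. The sketch below is an assumption-laden illustration: renew_and_confirm is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
# Sketch: trigger a renewal and poll until the Certificate's status.revision changes.
# renew_and_confirm is a hypothetical helper; attempts and delay are illustrative defaults.
renew_and_confirm() {
  local name="$1" namespace="$2" attempts="${3:-30}" delay="${4:-10}"
  local before after i=1
  before="$(oc get certificate "$name" -n "$namespace" -o jsonpath='{.status.revision}')"
  cmctl renew "$name" -n "$namespace" || return 1
  while [ "$i" -le "$attempts" ]; do
    after="$(oc get certificate "$name" -n "$namespace" -o jsonpath='{.status.revision}')"
    # A changed, non-empty revision means a new certificate has been issued.
    if [ -n "$after" ] && [ "$after" != "$before" ]; then
      echo "revision ${before} -> ${after}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "revision still ${before} after ${attempts} checks" >&2
  return 1
}
```

For example, renew_and_confirm router-certs openshift-ingress renews the router certificate and returns once the revision has moved on, or fails after the configured number of checks.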
