Using Cert-Manager in OpenShift/OKD. Part 3: SSL troubleshooting, renewals, and updating the kubeconfig.
The article was initially published here: https://epam.github.io/edp-install/operator-guide/ssl-automation-okd/
Part 1 of this guide covers the cert-manager installation in OpenShift and integration with AWS Route53.
Part 2 covers the cert-manager configuration and SSL certificate installation.
Part 3 describes certificate troubleshooting, renewals, and updating the kubeconfig.
Troubleshoot Certificates
During DNS validation, the cert-manager operator creates a DNS TXT challenge record named _acme-challenge.${DOMAIN} in the AWS Route53 hosted zone.
Use the nslookup or dig tools to check whether DNS propagation of the TXT record is complete:
nslookup -type=txt _acme-challenge.${DOMAIN}
dig txt _acme-challenge.${DOMAIN}
You can also query the TXT record with web tools such as Google Admin Toolbox.
If the lookup returns the expected value (the one currently set in the DNS zone), DNS propagation is complete, and Let’s Encrypt can read the record to validate it and issue a trusted certificate.
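Propagation can take a while, so instead of re-running dig by hand you can poll until the expected value appears. This is a minimal sketch, assuming bash and dig are installed and that you copy the expected value from the Route53 zone. wait_for_txt and TXT_LOOKUP are hypothetical names introduced here; TXT_LOOKUP lets you swap the lookup command, for example to query a specific resolver.

```shell
#!/usr/bin/env bash
# Poll DNS until the ACME challenge TXT record returns the expected value.
# wait_for_txt and TXT_LOOKUP are hypothetical names used only for this sketch.
wait_for_txt() {
  local fqdn="$1" expected="$2" tries="${3:-30}" delay="${4:-10}" i
  for ((i = 1; i <= tries; i++)); do
    # dig +short prints each TXT value wrapped in double quotes: strip them and
    # require an exact whole-line match. TXT_LOOKUP is deliberately unquoted so
    # that "dig +short txt" splits into separate words.
    if ${TXT_LOOKUP:-dig +short txt} "$fqdn" | tr -d '"' | grep -xF -- "$expected" >/dev/null; then
      echo "propagated after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "TXT record for $fqdn not visible after $tries attempt(s)" >&2
  return 1
}
```

For example, wait_for_txt "_acme-challenge.${DOMAIN}" "$EXPECTED_TXT" checks every 10 seconds for up to 5 minutes, and TXT_LOOKUP="dig +short txt @8.8.8.8" asks Google Public DNS directly.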
If the DNS validation self check fails, cert-manager retries it at a fixed 10-second interval. A challenge that never passes the self check keeps retrying until the user intervenes, either by retrying the Order (deleting the Order resource) or by amending the associated Certificate resource to fix any configuration errors.
As soon as domain ownership has been verified, cert-manager cleans up the validation TXT records it created in the AWS Route53 DNS zone.
Below are issues that may occur and how to troubleshoot them:
- When certificates are not issued for a long time, or a cert-manager resource is not in a Ready state, describing the resource (oc describe) may show the reason for the error.
- During a Certificate issuance, cert-manager creates the following resources: CertificateRequest, Order, and Challenge. Investigate each of them in case of errors.
- Use the cmctl tool to show the state of a Certificate and its associated resources, for example: cmctl status certificate router-certs -n openshift-ingress
- Check the cert-manager controller pod logs:
oc get pod -n openshift-operators | grep 'cert-manager'
oc logs -f cert-manager-${replica_set}-${random_string} -n openshift-operators
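To walk the whole issuance chain in one pass, you can describe each resource kind in the order cert-manager creates them; the Status and Events sections usually state why issuance stalled. This is a minimal sketch, assuming oc is logged in with permission to read cert-manager resources. describe_cm_chain is a hypothetical helper; the resource names are the cert-manager CRDs (certificates and certificaterequests in the cert-manager.io group, orders and challenges in acme.cert-manager.io).

```shell
#!/usr/bin/env bash
# Describe every cert-manager resource involved in issuance in one namespace,
# in creation order. describe_cm_chain is a hypothetical helper name.
describe_cm_chain() {
  local ns="$1" kind
  for kind in certificates.cert-manager.io \
              certificaterequests.cert-manager.io \
              orders.acme.cert-manager.io \
              challenges.acme.cert-manager.io; do
    echo "=== ${kind} (namespace: ${ns}) ==="
    # Conditions and Events in the describe output usually hold the failure reason.
    # Keep going even if one lookup fails.
    oc describe "$kind" -n "$ns" || true
  done
}
```

For example, describe_cm_chain openshift-ingress covers the router-certs certificate. Challenges are removed once validation succeeds, so an empty Challenge section after a successful issuance is expected.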
Certificate error debugging
- Decode the certificate chains stored in both secrets:
oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.crt"}}' | base64 -d | while openssl x509 -noout -text; do :; done 2>/dev/null
oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.crt"}}' | base64 -d | while openssl x509 -noout -text; do :; done 2>/dev/null
cmctl inspect secret router-certs -n openshift-ingress
cmctl inspect secret api-certs -n openshift-config
- Check the consistency of the RSA private keys:
oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -check -noout
oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -check -noout
- Match the SSL certificate public key against its RSA private key. Their moduli must be identical, so diff prints nothing when they match:
diff <(oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.crt"}}' | base64 -d | openssl x509 -noout -modulus | openssl md5) <(oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -noout -modulus | openssl md5)
diff <(oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.crt"}}' | base64 -d | openssl x509 -noout -modulus | openssl md5) <(oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -noout -modulus | openssl md5)
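The modulus comparison only applies to RSA keys. If a Certificate requests an ECDSA key (privateKey.algorithm: ECDSA in the Certificate spec), you can compare the full public keys instead, which works for both key types. This is a minimal sketch, assuming OpenSSL 1.1 or later; cert_key_match is a hypothetical helper that takes PEM file paths.

```shell
#!/usr/bin/env bash
# Compare the public key embedded in a certificate with the public key derived
# from a private key. cert_key_match is a hypothetical helper name.
cert_key_match() {
  local cert_pem="$1" key_pem="$2" cert_pub key_pub
  # Both commands print the key as a PEM "PUBLIC KEY" (SubjectPublicKeyInfo) block.
  cert_pub=$(openssl x509 -in "$cert_pem" -noout -pubkey) || return 2
  key_pub=$(openssl pkey -in "$key_pem" -pubout) || return 2
  if [ "$cert_pub" = "$key_pub" ]; then
    echo "match"
  else
    echo "MISMATCH" >&2
    return 1
  fi
}
```

Process substitution lets you feed the secret data directly, for example: cert_key_match <(oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.crt"}}' | base64 -d) <(oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.key"}}' | base64 -d)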
Remove Obsolete Certificate Authority Data From Kubeconfig
After the certificates are updated, access to the cluster via Lens or the CLI fails with untrusted certificate errors:
$ oc whoami
Unable to connect to the server: x509: certificate signed by unknown authority
This happens because the oc tool still trusts the old CA data stored in the kubeconfig file.
You can examine the Certificate Authority data using the following command:
oc config view --minify --raw -o jsonpath='{.clusters[].cluster.certificate-authority-data}' | base64 -d | openssl x509 -text
This certificate has the CA:TRUE parameter, which marks it as a CA certificate. Here it is the cluster's self-signed root CA, which did not sign the new Let’s Encrypt certificates.
To fix the error, remove the old CA data from your OpenShift kubeconfig file:
sed -i "/certificate-authority-data/d" "$KUBECONFIG"
With this field removed from the kubeconfig file, the system root certificates are used to validate the cluster certificate trust chain. On Ubuntu, for example, the Let’s Encrypt certificates of the OpenShift cluster are validated against the Internet Security Research Group (ISRG) root in /etc/ssl/certs/ca-certificates.crt.
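The sed command above deletes certificate-authority-data for every cluster in the kubeconfig, which also breaks access to any other cluster that relies on a private CA. A more targeted sketch backs up the file and unsets the field only for the cluster of the current context with oc config unset. Assumptions: KUBECONFIG is unset or points to a single file, and the cluster name contains no dots, because oc config unset treats the property name as a dot-delimited path (names created by oc login typically look like api-<cluster>-<domain>:6443). drop_cluster_ca is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Remove the stale CA data only for the cluster used by the current context.
# drop_cluster_ca is a hypothetical helper; it assumes a single kubeconfig file
# and a cluster name without dots.
drop_cluster_ca() {
  local kubeconfig="${KUBECONFIG:-$HOME/.kube/config}" cluster
  # Keep a backup so the old CA data can be restored if needed.
  cp -- "$kubeconfig" "${kubeconfig}.bak" || return 1
  # With --minify, only the current context's cluster is listed.
  cluster=$(oc config view --minify -o jsonpath='{.clusters[0].name}') || return 1
  [ -n "$cluster" ] || { echo "no current cluster found" >&2; return 1; }
  oc config unset "clusters.${cluster}.certificate-authority-data"
}
```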
Certificate Renewals
cert-manager automatically renews certificates based on the X.509 certificate's duration and the renewBefore value: renewal starts renewBefore ahead of the certificate's expiry. The minimum value for spec.duration is 1 hour, and for spec.renewBefore, 5 minutes. spec.duration must also be greater than spec.renewBefore.
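These constraints can be checked before applying a Certificate manifest. The sketch below is an illustration, not part of cert-manager: it parses simple Go-style durations made of hours, minutes, and seconds (for example 2160h or 2160h0m0s), applies the three rules, and prints when renewal would start relative to issuance. ACME issuers such as Let’s Encrypt may ignore spec.duration and issue certificates with their own lifetime (90 days by default), and cert-manager bases renewal on the issued certificate's actual expiry, so treat the printed offset as nominal. to_seconds and check_renewal are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Convert a simple Go-style duration (h/m/s units only) to seconds.
to_seconds() {
  local d="$1" total=0 num unit re='^([0-9]+)([hms])(.*)$'
  [ -n "$d" ] || return 1
  while [[ $d =~ $re ]]; do
    num=${BASH_REMATCH[1]} unit=${BASH_REMATCH[2]} d=${BASH_REMATCH[3]}
    case $unit in
      h) total=$((total + 10#$num * 3600)) ;;  # 10# avoids octal parsing of "08"
      m) total=$((total + 10#$num * 60)) ;;
      s) total=$((total + 10#$num)) ;;
    esac
  done
  # Leftover text (e.g. "ms", "d", or a fraction) means an unsupported format.
  [ -z "$d" ] || return 1
  echo "$total"
}

# Apply the documented limits and print the nominal renewal offset.
check_renewal() {
  local dur rb
  dur=$(to_seconds "$1") || { echo "unsupported duration: $1" >&2; return 1; }
  rb=$(to_seconds "$2") || { echo "unsupported renewBefore: $2" >&2; return 1; }
  if (( dur < 3600 )); then echo "spec.duration must be at least 1h" >&2; return 1; fi
  if (( rb < 300 )); then echo "spec.renewBefore must be at least 5m" >&2; return 1; fi
  if (( dur <= rb )); then echo "spec.duration must be greater than spec.renewBefore" >&2; return 1; fi
  # Renewal starts renewBefore ahead of expiry, i.e. (duration - renewBefore) after issuance.
  echo "renewal starts $((dur - rb))s after issuance"
}
```

For example, check_renewal 2160h 360h prints "renewal starts 6480000s after issuance", that is, 75 days after issuance and 15 days before expiry.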
Use the cmctl tool to manually trigger an immediate, one-time certificate renewal:
cmctl renew router-certs -n openshift-ingress
cmctl renew api-certs -n openshift-config
Alternatively, force a renewal of all certificates in all namespaces that carry the app=cert-manager label in their metadata:
cmctl renew --all-namespaces -l app=cert-manager
Run the cmctl renew --help command to get more details.
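cmctl renew triggers reissuance but does not wait for it to finish. To confirm that the secret actually received a new certificate, you can compare the certificate serial number before and after the renewal. This is a minimal sketch, assuming oc and openssl are available; cert_serial and wait_for_new_cert are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Print the serial number of the certificate stored in a TLS secret.
cert_serial() {
  oc get secret "$1" -n "$2" -o 'go-template={{index .data "tls.crt"}}' \
    | base64 -d | openssl x509 -noout -serial
}

# Poll until the secret's serial differs from the one recorded before renewal.
wait_for_new_cert() {
  local name="$1" ns="$2" old="$3" tries="${4:-30}" delay="${5:-10}" i current
  for ((i = 1; i <= tries; i++)); do
    # Treat a transient lookup failure as "not replaced yet".
    current=$(cert_serial "$name" "$ns" 2>/dev/null) || current=""
    if [ -n "$current" ] && [ "$current" != "$old" ]; then
      echo "$current"
      return 0
    fi
    sleep "$delay"
  done
  echo "certificate in $ns/$name was not replaced" >&2
  return 1
}

# Example:
#   old=$(cert_serial router-certs openshift-ingress)
#   cmctl renew router-certs -n openshift-ingress
#   wait_for_new_cert router-certs openshift-ingress "$old"
```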