Establishing trust between a set of applications? Use a PKI instead of self-signed certificates
Suppose you are working on a delivery project where a set of hosts talk to each other over TLS, and you need to establish trust between them. The customer wants TLS/HTTPS everywhere, for vertical as well as horizontal traffic!
The documentation says: “If you want to enable TLS on this port, here are the instructions to generate a self-signed certificate. Remember to import this certificate into the relevant trust stores.”
Things are very easy if you have two computers talking to each other. But in deployments with high availability, load balancers, and a large number of applications talking to each other, admins and solution architects often find themselves frustrated by the sheer cognitive overload of working out which certificates to import where. This is especially true when self-signed certificates are used throughout the deployment, which is common for applications deployed to internal networks.
This post will conceptually explain how product architects can make things simple for deployment personnel (e.g. admins, solution engineers). We will look at the implementation aspects in a later post.
Let us start with an illustration of an example deployment:
The client talks to the load balancer (which has a virtual IP shared amongst multiple instances), which talks to one of the instances of A, B, or C. If A wants to talk to B, it talks to the load balancer, which then talks to one of the instances of B. Instances of the same service talk to each other (e.g. to share state). All services talk to the external DB. I know the above (monolith-based) architecture has flaws, but let’s consider it just for the sake of an example.
In typical cases, external TLS connections are terminated at the load balancer while the internal communication is all plain-text. But this is changing. Security teams these days enforce TLS connections throughout.
In our example, all computers (A, B, C, LB, DB) have self-signed certificates, and the requirement is to have mutual TLS enabled for all communication.
The problem — what to trust? where? — too many certificates to trust!
Establishing trust between two computers talking via mutual TLS means that each side trusts the other’s X.509 certificate (with regular TLS, only the client needs to trust the server’s certificate). A certificate is trusted if the certificate itself is trusted or if one of the certificates in its chain is trusted (generally the root certificate).
So in our example, if the Load Balancer wants to talk to A (instance 1) using mutual TLS, you need the Load Balancer to trust A (instance 1)’s certificate and A (instance 1) to trust the Load Balancer’s certificate.
The cognitive load I am talking about arises when you use self-signed certificates everywhere. A self-signed certificate has no certificate chain, because it is its own root certificate. This means that, to establish trust, you need to trust each self-signed certificate individually. And since every application generally has its own self-signed certificate, you end up with too many certificates to trust.
Contrast this with a PKI (Public Key Infrastructure), where there is a chain of certificates and trusting the root certificate means you trust the end-entity certificates issued under it.
In the above example (assuming all certificates are self-signed), the following needs to be done to enable mutual-TLS connections:
- Load Balancer trusts the certificates of A (instance 1), A (instance 2), B (instance 1), B (instance 2), C (instance 1), C (instance 2).
- A (instance 1) trusts the certificates of Load Balancer, A (instance 2), and external DB.
- A (instance 2) trusts the certificates of Load Balancer, A (instance 1), and external DB.
- B (instance 1) trusts the certificates of Load Balancer, B (instance 2), and external DB.
- B (instance 2) trusts the certificates of Load Balancer, B (instance 1), and external DB.
- C (instance 1) trusts the certificates of Load Balancer, C (instance 2), and external DB.
- C (instance 2) trusts the certificates of Load Balancer, C (instance 1), and external DB.
- External DB trusts the certificates of A (instance 1), A (instance 2), B (instance 1), B (instance 2), C (instance 1), C (instance 2).
Total number of certificates to import = 8!
Total number of import actions = 6 (Load Balancer) + 18 (3 for each of the 6 instances) + 6 (external DB) = 30!
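To double-check the two totals above, here is a minimal Python sketch that models the trust list as a dictionary and counts the imports. The host names (`LB`, `A1`, `DB`, and so on) and the function name are shorthand I made up for this sketch; they are not part of any product.

```python
# Toy model of the example deployment with self-signed certificates.
# Host names are hypothetical shorthand: A1 means "A (instance 1)".
SERVICES = ["A", "B", "C"]
INSTANCES = [f"{s}{i}" for s in SERVICES for i in (1, 2)]


def self_signed_trust_plan():
    """Map each host to the set of self-signed certificates it must import."""
    # The load balancer and the external DB each talk to all six instances.
    plan = {"LB": set(INSTANCES), "DB": set(INSTANCES)}
    for svc in SERVICES:
        first, second = f"{svc}1", f"{svc}2"
        # Each instance talks to the load balancer, its sibling, and the DB.
        plan[first] = {"LB", second, "DB"}
        plan[second] = {"LB", first, "DB"}
    return plan


plan = self_signed_trust_plan()
distinct_certs = set().union(*plan.values())
import_actions = sum(len(certs) for certs in plan.values())

print(f"distinct certificates to import: {len(distinct_certs)}")  # 8
print(f"import actions: {import_actions}")                        # 30
```

Adding a fourth service with two instances to `SERVICES` pushes the count to 10 certificates and 40 import actions, which is the growth that makes this approach painful.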
Consider the cognitive load of working out which certificate to import into which trust store, especially when there are a large number of applications (this example has only 3) and multiple trust stores (yes, some applications have multiple trust stores).
This problem typically occurs because isolated development teams find it easier to quickly create self-signed certificates and get things up and running.
The solution is a small scale PKI!
It is as easy to set up a small-scale PKI as it is to create self-signed certificates. The benefit: simplicity when establishing trust. This is how well-known products such as Kubernetes and Docker Swarm establish trust between the computers in a cluster.
You don’t need to build a 3-tier PKI. A 2-tier PKI can be enough.
So this is how things can be better in our example:
- Create a root CA. You will now have a root CA certificate.
- Sign the certificates for A (instance 1), A (instance 2), B (instance 1), B (instance 2), C (instance 1), C (instance 2), and the Load Balancer with the root CA.
Now you only need to import the root CA certificate into the trust store of every computer in the cluster, that is, the Load Balancer and all instances of A, B, and C. This enables trust between the Load Balancer and all instances of A, B, and C. It also enables trust between instances of the same application, that is, between the A instances, between the B instances, and so on.
You still need to import the certificate of the external DB into the trust stores of the A, B, and C instances, because we are considering it external to the cluster (unless you sign the external DB certificate with your root CA too).
Also, you only need to import the root CA certificate in the external DB trust store instead of each instance’s certificate to establish trust.
As a result,
Total number of certificates to import = 2! (root CA and DB)
Total number of import actions = 8 (for root CA cert) + 6 (for DB cert) = 14!
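The same kind of toy model can be used to check the PKI numbers. This is only a sketch: the host names, the `ROOT_CA` label, and the `db_signed_by_root` flag are names I made up for illustration. The flag covers the optional case mentioned earlier where the external DB certificate is also signed by your root CA.

```python
# Toy model of the example deployment with a small root CA.
# Host names and the ROOT_CA label are hypothetical shorthand.
SERVICES = ["A", "B", "C"]
INSTANCES = [f"{s}{i}" for s in SERVICES for i in (1, 2)]


def pki_trust_plan(db_signed_by_root=False):
    """Map each host to the set of certificates it must import."""
    # Every host in the cluster, plus the external DB, imports the root CA.
    plan = {host: {"ROOT_CA"} for host in ["LB"] + INSTANCES + ["DB"]}
    if not db_signed_by_root:
        # The DB keeps its own certificate, so each instance imports it too.
        for inst in INSTANCES:
            plan[inst].add("DB")
    return plan


def summarize(plan):
    """Return (distinct certificates to import, total import actions)."""
    distinct = set().union(*plan.values())
    actions = sum(len(certs) for certs in plan.values())
    return len(distinct), actions


print(summarize(pki_trust_plan()))                        # (2, 14)
print(summarize(pki_trust_plan(db_signed_by_root=True)))  # (1, 8)
```

If the DB certificate is signed by the root CA as well, the plan collapses to a single certificate imported once per host.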
Why do you only need to import the root CA certificate?
We are still creating a separate certificate for each computer in the set, but each certificate is signed by the root CA. Remember the definition of establishing trust: you trust a certificate if a certificate in its chain is trusted. So if you trust the root certificate in the chain, all the certificates further down the chain are trusted (of course, this is a very simplified definition; other validations are performed too, such as signature and expiry checks).
But the idea is simple: all communication will be trusted as long as the certificates are signed by the root CA! This means that the root CA certificate is the trust anchor for your cluster. The root CA establishes the trust between all the computers in the cluster.
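To make the "trust anchor" idea concrete, here is a deliberately simplified Python sketch of the chain-walking rule. It is a toy model, not how a TLS library works: it does no signature verification, expiry, revocation, or hostname checks, and the `Cert` class, the `is_trusted` function, and all names in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # equals subject for a self-signed certificate


def is_trusted(cert, trust_store, known_certs, max_depth=5):
    """Walk up the issuer chain; trusted if any certificate on the way is in the trust store.

    Toy model only: no cryptographic checks of any kind are performed.
    """
    current = cert
    for _ in range(max_depth):
        if current in trust_store:
            return True
        if current.issuer == current.subject:
            return False  # reached a self-signed certificate we do not trust
        issuer_cert = known_certs.get(current.issuer)
        if issuer_cert is None:
            return False  # chain is incomplete
        current = issuer_cert
    return False


root = Cert("Cluster Root CA", "Cluster Root CA")
a1 = Cert("A1", "Cluster Root CA")
lb = Cert("LB", "Cluster Root CA")
rogue = Cert("rogue", "rogue")  # self-signed, never imported anywhere

known_certs = {root.subject: root}
trust_store = {root}  # the only certificate imported on this host

print(is_trusted(a1, trust_store, known_certs))     # True
print(is_trusted(lb, trust_store, known_certs))     # True
print(is_trusted(rogue, trust_store, known_certs))  # False
```

The trust store holds exactly one entry, yet every certificate issued by the root CA is accepted, while a self-signed certificate that nobody imported is rejected.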
I will create a separate post for the openssl commands for setting up a small CA, signing the certificates, and other practical details.
All of this can be automated with a few shell scripts or even Ansible playbooks. Usable security means better security, as it leads to easier configuration and a lower cognitive load. Renewing certificates becomes easier and faster too!