Save money in your Kubernetes infrastructure while increasing its availability

When designing software systems, there is always a trade-off behind each architectural decision you make. Traditionally, achieving HA (High Availability) in your services came at a great cost, which translated directly into a large monetary investment. However, most cloud vendors now offer an instance pricing model that fits variable workloads, like Amazon EC2 Spot Instances, Google Cloud Preemptible VMs or Azure Low-Priority VMs. These are what we could call “Cheap Instances”.

What is a Cheap Instance?

At MindDoc, we use preemptible VMs in our Kubernetes cluster, which Google automatically balances based on CPU load and memory usage and which are 80% cheaper than regular VMs. In general, allowing flexibility in instance allocation is greatly rewarded by all the main cloud providers, even if their offerings differ in the details.

Regardless of your cloud platform of choice, the characteristics of a Cheap Instance are as follows:

  • You save a great deal of money, as you don’t require fixed capacity from the vendor’s data center, allowing them to better utilize their resources.
  • It can be terminated at any time, so your workload must be fault-tolerant.
  • It can be autoscaled: the cloud provider automatically assigns a new VM to replace a terminated one, or adds and removes instances based on your load requirements at that specific moment.

These huge discounts (a fixed 80% price reduction in Google Cloud and up to 90% in Amazon, based on a bidding mechanism) compared with a normal instance are definitely the feature that attracts the most attention. However, it comes with a “catch”: your VM can be stopped without notice at any time. It is as if a monkey randomly entered your cluster and destroyed things, a Chaos Monkey.


The Chaos Monkey

In 2011, when DevOps was still in its infancy, Netflix released Chaos Monkey, a very interesting tool that tests the resilience of IT infrastructure by randomly terminating production instances and containers.

That kind of experimentation with production systems is the basis of chaos engineering, and the best motivation to build HA applications. Therefore, the biggest caveat of Cheap Instances can become a powerful tool to overcome unexpected infrastructure, network and application failures, without having to use the Chaos Monkey tool itself.

Moreover, keeping applications highly available even when the instances they run on can abruptly shut down becomes a hard requirement. That is the actual “catch”. Thankfully, the Kubernetes ecosystem helps in fulfilling this requirement.

Load Balancing

As we have dynamic workloads and services restarting at unexpected times, the first problem that needs to be solved is load balancing. Kubernetes does an excellent job here through the use of the Ingress, Service and Pod abstractions.

At MindDoc, we use an Ingress to expose an HTTPS application to the Internet. cert-manager automatically issues Let’s Encrypt certificates, so TLS always sits in front of the related application service.

Services are internal to our cluster (using ClusterIP), and are responsible for forwarding traffic to the defined application pods. The pods have meaningful health checks and their number can be defined either through a ReplicaSet or by specifying replicas in the Deployment.

The Service will only forward traffic to healthy pods, so any downtime goes unnoticed as long as there are other healthy pods that can receive traffic.
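As a minimal sketch of this setup (the names, image, port and probe path are placeholders, not our actual configuration), a Deployment with a readiness probe and a ClusterIP Service in front of it could look like this:

```yaml
# Hypothetical Deployment with a readiness probe; only pods that pass the
# probe receive traffic from the Service below.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mypod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mypod-app
  template:
    metadata:
      labels:
        app: mypod-app
    spec:
      containers:
        - name: mypod
          image: example/mypod:latest   # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz            # placeholder health-check endpoint
              port: 8080
---
# Internal ClusterIP Service: forwards traffic only to healthy replicas.
apiVersion: v1
kind: Service
metadata:
  name: mypod-service
spec:
  type: ClusterIP
  selector:
    app: mypod-app
  ports:
    - port: 80
      targetPort: 8080
```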

From outside, therefore, only HTTPS traffic is allowed, with the Ingress doing TLS termination. The Service abstracts the set of replicas, so any other application that wants to interact with it doesn’t need to know the internal IPs of those pods (replicas); it simply communicates through the Service.
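A sketch of such an Ingress follows; the hostname, TLS secret name and the cert-manager ClusterIssuer name are assumptions for illustration only:

```yaml
# Hypothetical Ingress doing TLS termination. The cert-manager annotation
# points at an assumed ClusterIssuer ("letsencrypt-prod"); host and secret
# names are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mypod-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - app.example.com
      secretName: mypod-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mypod-service
                port:
                  number: 80
```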

Pod Anti-Affinity

However, the cluster is composed of VMs (nodes), which host the pods. What if all pod replicas are scheduled onto the same node, and that node is shut down? The application will be down until its pods are running and healthy on a new node.

With no anti-affinity rules, pods like “mypod”, “foo” or “bar” can end up on the same node

Let us take an example with three replicas and three nodes. Ideally, we would assign one replica to each node. Therefore, if a node restarts, the other two will already be up and the service in front of the pods will transparently route the traffic to the healthy ones.

In order to achieve that, Kubernetes offers rules to assign pods to nodes, specifically anti-affinity. Anti-affinity means that a pod won’t be scheduled onto a node if that node is already running one or more pods that match a specific rule.

Normally, all nodes are already labeled with kubernetes.io/hostname. In the following example, the hard anti-affinity rule uses the hostname label as topologyKey in the Deployment, so no node will ever run more than one pod with the app=mypod-app label.
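A minimal sketch of such a Deployment (the name and image are placeholders; the app=mypod-app label and the hostname topologyKey come from the rule described above):

```yaml
# Hypothetical Deployment with a hard pod anti-affinity rule.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mypod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mypod-app
  template:
    metadata:
      labels:
        app: mypod-app
    spec:
      affinity:
        podAntiAffinity:
          # "required..." makes the rule hard: the scheduler will never place
          # two pods labeled app=mypod-app on the same kubernetes.io/hostname.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: mypod-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: mypod
          image: example/mypod:latest   # placeholder image
```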

Enter Helm charts

Most popular third-party applications have a Helm chart, which abstracts all the Kubernetes templates, generating them based on a set of input values (that can be given from the command line, YAML files…). Helm charts can be understood as the Configuration Management for Kubernetes.

What is Helm? (https://helm.sh)

Many of them expose replica counts and node anti-affinity as configuration parameters, which makes it easy to support HA setups, as sketched below. At MindDoc, we use charts that define multiple replicas and node anti-affinity for Elasticsearch, Fluent Bit, Kibana, Grafana, Prometheus, Nginx ingress, Buzzfeed SSO and Redis Sentinel.
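For illustration, a hypothetical values.yaml enabling multiple replicas and hard anti-affinity might look like the following; the exact keys vary from chart to chart (these resemble the values exposed by the official Elasticsearch chart), so check the chart’s documentation:

```yaml
# Hypothetical chart values (key names are chart-specific).
replicas: 3
antiAffinity: "hard"
antiAffinityTopologyKey: "kubernetes.io/hostname"
```

Passing such a file with helm install -f values.yaml is then usually all it takes to get an HA deployment of the chart.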

Fully-managed PostgreSQL

Sometimes, shifting the maintenance overhead onto the cloud vendor is the right choice. At MindDoc, we didn’t want to handle PostgreSQL HA ourselves, so we decided to use Google’s Cloud SQL for PostgreSQL. It is cost-effective, reachable only internally via VPC, and guarantees a 99.95% SLA.

Wrapping Up

Achieving HA in the persistence layer (e.g. Redis and PostgreSQL) or other services can be a big headache, though in the worst case there are great fully-managed solutions. These can become expensive at some point, but the money saved with Cheap Instances and in “developer time” can make up for the cost.

A 99.95% uptime, low-cost infrastructure is not easy to achieve. It requires a lot of technical expertise and knowledge about distributed architectures and orchestration systems, but the result is a reliable, scalable, cheap and elastic platform.