Tips for Designing a Kubernetes Cluster
Understanding the How’s and Why’s
by Chris Herrera
As enterprise use of Kubernetes to deploy and manage applications continues to advance, and managed services mature across the major cloud platforms (AKS, GKE, EKS, OpenShift), I thought it would be a good time to share the firsthand experience I’ve had leading and working on a software engineering team that is developing Tempus, a cloud-native IIoT application that relies on Kubernetes as part of its core infrastructure framework.
The initial installation and provisioning of Kubernetes can be daunting, but creating a cluster is only Step 1. A lot of people overlook Step 2: how to set up the cluster for the way it will actually be used.
In this post I am going to go over what I have played with, mistakes I have made, and what I think are some decent guides to setting up Kubernetes and Helm in a cluster used by a few different teams.
The Use Case
You are a SysAdmin/SRE/Generally Awesome Person who is tasked with setting up a cluster for a bunch of two-pizza teams who are developing ML/ETL/AI/Super Awesome Apps.
You want to enable the autonomy of these teams by building a cluster where no one is running into each other, and you want to be confident that it can scale without breaking the bank.
- Resource Quotas
We are going to assume here that you are going with your preferred managed Kubernetes offering. That could be GKE, AKS, OpenShift, or EKS. If you are going it on your own, you could again look at OpenShift or do something with kops, but there are other considerations around managing availability and making sure that etcd behaves itself. That discussion is out of the scope of this post.
Labels
OK, why am I starting with labels? Labels and selectors are great: they are a loosely coupled way to define how you are going to organize services, define requirements, and so on. I am not going to rehash the k8s documentation (which is super fantastic), so I will just point you to it: Labels and Selectors.
Now this is all well and good, but what about the scenario where you have teams building apps with different needs (I/O, memory, etc.)? Let’s assume that you have a team working on an app that really benefits from high-speed I/O. I would label a node with diskspeed: high, and then use a nodeSelector in my pod spec to pin that pod to nodes carrying that label.
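As a minimal sketch of that idea, the manifest below pins a pod to nodes labeled diskspeed: high. The pod name and image are placeholders invented for illustration, and the kubectl command in the comment assumes you substitute a real node name.

```yaml
# Label the node first (the node name is a placeholder):
#   kubectl label nodes <node-name> diskspeed=high
apiVersion: v1
kind: Pod
metadata:
  name: io-heavy-app            # hypothetical pod name
spec:
  # The scheduler only places this pod on nodes whose labels
  # match every key/value pair listed here.
  nodeSelector:
    diskspeed: high
  containers:
    - name: app
      image: registry.example.com/team-a/io-heavy-app:1.0.0   # placeholder image
```

If no node carries a matching label, the pod stays in Pending, which is exactly the support call a published label list helps you avoid.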
Now, if you, the admin, don’t want to be on the hook for the call when someone puts drivetype: ssd or diskspeed: reallysuperfast or whatever in a manifest and their pod cannot be scheduled, it is good to have a defined list of labels and selectors in place that your teams can consult when they are building their manifests. It also ensures that you don’t have to spec every node for the most demanding of all the workloads.
This is one example of a situation where having a defined set of key/value pairs can benefit you; there are many more. Having this list published in a central place and maintained means that your teams won’t drive you crazy.
RBAC
What is RBAC? Role-Based Access Control; again, the documentation is: Here. You need to have this on. It’s important for guarding against not just bad actors taking over your system, but also a well-intentioned employee making a mistake. RBAC allows you to scope your users (or subjects, in k8s speak) to a specific resource or set of resources. This goes hand-in-hand with the next section (namespaces), because of how simple it is to scope access at the ClusterRole level versus the Role level.
Just a quick note: a ClusterRole allows you to grant subjects access to resources across the cluster, for example nodes, or all the pods in a namespace, while a Role only allows you to grant subjects access to resources within a single namespace.
You will want to predefine a set of ClusterRoles. This will allow you to quickly grant access by creating/modifying the ClusterRoleBindings. This means that if you create them right off the bat, you won’t have to worry about dealing with one-off requests for access later.
At a high level, I like to use this structure:
- Namespace Admin: manages all the goings-on in the namespace, including Roles (note, not ClusterRoles, because we are within a namespace), RoleBindings, deployments, etc.
- Namespace Deployment Manager: allowed to "get", "list", "watch", "create", "update", "patch", and "delete" resources in that namespace.
- Cluster Reader: allowed to "get", "list", and "watch" resources in that namespace. This could be QA or testers, who just want to see and get the logs of a running pod.
This can of course be customized to suit the needs of the team; I am providing it as a starting point.
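As a rough sketch of the Namespace Deployment Manager role above, the manifests below define a reusable ClusterRole once and then grant it inside a single namespace. The role name, the team-a namespace, the team-a-developers group, and the chosen API groups are all assumptions made for illustration. A RoleBinding is used here, rather than a ClusterRoleBinding, so that the grant applies only inside that one namespace.

```yaml
# Defined once, cluster-wide, so that it can be reused by every team.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: namespace-deployment-manager   # hypothetical role name
rules:
  - apiGroups: ["", "apps", "batch"]   # core, apps, and batch API groups (an assumption)
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# A RoleBinding that references a ClusterRole grants those permissions
# only within the RoleBinding's own namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-deployment-managers     # hypothetical binding name
  namespace: team-a                    # hypothetical team namespace
subjects:
  - kind: Group
    name: team-a-developers            # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: namespace-deployment-manager
  apiGroup: rbac.authorization.k8s.io
```

Onboarding another team then only requires a new RoleBinding in that team's namespace; the ClusterRole itself stays untouched.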
There is also the option of ABAC (attribute-based access control). I do not use it much, just due to the overall complexity of managing access via attributes. Additionally, RBAC is more prevalent in overall usage today.
Namespaces
In addition to the RBAC and ClusterRoleBinding discussion above, I wanted to touch on namespaces. Using namespaces allows you to grant the teams autonomy without sacrificing overall cluster security, i.e., avoiding the “Hey, that guy deleted my deployment and deployed his deployment” situation.
Allow each team to have their own namespace, while maintaining separate namespaces for things like ingress or logging. This should be done during a one-time provisioning of the new project to be hosted on the cluster.
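A one-time provisioning step like this might look as follows. This is a sketch only: the namespace names and the team and purpose labels are invented for illustration.

```yaml
# One namespace per team.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a            # hypothetical team namespace
  labels:
    team: team-a          # hypothetical label, handy for selecting namespaces later
---
# Shared infrastructure lives in its own namespaces.
apiVersion: v1
kind: Namespace
metadata:
  name: ingress           # hypothetical name for the shared ingress namespace
  labels:
    purpose: shared-infra # hypothetical label
```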
Namespaces also allow you to implement resource quotas. Beyond the issue above, where someone removes someone else’s work, you also want to make sure that one workload does not affect anyone else’s. Enter resource quotas.
Resource Quotas
Resource quotas are great. Basically, a ResourceQuota is an object that allows you to limit the total amount of compute that a namespace can take up. You can limit the requests and usage of CPU, storage, and GPUs, as well as the number of objects, in a namespace.
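To make that concrete, here is a sketch of a quota for one team namespace. The namespace name and every number are placeholders, and the GPU entry assumes that your nodes expose the nvidia.com/gpu extended resource.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota            # hypothetical name
  namespace: team-a             # hypothetical team namespace
spec:
  hard:
    # Totals across all pods in the namespace.
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    # Total storage requested by all PersistentVolumeClaims.
    requests.storage: 200Gi
    # Extended resources such as GPUs can only be capped via requests.
    requests.nvidia.com/gpu: "2"
    # Object counts.
    pods: "50"
    persistentvolumeclaims: "10"
```

Note that once a quota covers CPU or memory, pods in that namespace must declare the corresponding requests or limits, or they will be rejected at creation time.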
One note I want to make on these quotas, however, is that they need to be balanced with the next section: autoscaling groups. The reason is that resource quotas are defined as absolute numbers, meaning that if I add another node to my cluster, the quotas remain the same.
So to be clear, cluster capacity and resource quotas are two separate elements. They need to be managed together.
Autoscaling
So, there is a feature that, depending on your managed k8s provider, may or may not be turned on: autoscaling.
Autoscaling allows your cluster to grow from a minimum number of nodes to a maximum number of nodes. This is wonderful from a cost management standpoint, especially in a cloud scenario, as you do not want to have nodes hanging out there doing nothing, or an overall utilization of 10% across your nodes.
There are a couple of things you want to think through here, however.
- Availability zones need to be taken into account for HA clusters. You want to make sure you have a minimum number of nodes across the required number of AZs for redundancy.
- You need to balance this with resource quotas. This means that you want to set the quotas up so that you can leverage the extra compute when you need it and don’t restrict yourself to a lower cluster size.
Autoscaling allows you to perform horizontal pod scaling, vertical pod scaling, and then node scaling to accommodate the additional compute. This is very useful if, say, the majority of your teams are in one time zone, so most of your usage occurs during 8 hours of the day and not so much at night.
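The horizontal pod scaling piece can be sketched as below. It assumes a cluster recent enough to serve the autoscaling/v2 API, a metrics source such as metrics-server, and a hypothetical Deployment named web in a hypothetical team-a namespace. Node scaling is configured through your provider and is not shown.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                     # hypothetical name
  namespace: team-a             # hypothetical team namespace
spec:
  # The workload whose replica count is adjusted.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                   # hypothetical Deployment
  minReplicas: 2                # quiet hours
  maxReplicas: 10               # busy hours
  metrics:
    # Add replicas when average CPU utilization, measured against
    # the pods' CPU requests, rises above 70%.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Keep the namespace quota in mind when choosing maxReplicas: if ten replicas would exceed the quota, the extra pods are rejected no matter how many nodes the cluster can add.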
Helm
There is not much I want to say about Helm, other than that it is wonderful. It is like a package manager (Homebrew, apt, yum, etc.) for your k8s projects.
Having your teams create a Helm chart and having an internal repo to pull from (or a public one, if that’s your game) is a great way to define dependencies, configure your deployments between staging and prod, and make your deployments that much more repeatable.
I often suggest having a base Helm chart that your teams can use as a skeleton for their own charts, something very simple that shows templating and configuration via values.yaml.
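A skeleton of that kind could start from values files like the ones below. This is a sketch: the keys, image, and numbers are invented for illustration, and the two documents represent two separate files, shown in one block for brevity.

```yaml
# values.yaml: defaults shared by every environment
replicaCount: 1
image:
  repository: registry.example.com/team-a/my-app   # placeholder image
  tag: "1.0.0"
resources:
  requests:
    cpu: 100m
    memory: 128Mi
---
# values-prod.yaml: only the settings that differ in production
replicaCount: 3
resources:
  requests:
    cpu: 500m
    memory: 512Mi
```

Chart templates read these settings through expressions such as {{ .Values.replicaCount }}, and passing -f values-prod.yaml to helm install or helm upgrade layers the production overrides on top of the defaults.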
A Note on RBAC for Helm
In your RBAC-enabled cluster you want to give Tiller (the component that deploys the chart on your cluster) the ability to do so.
There are a number of different ways to configure this, but I cannot do much better than the documentation: Here. You will want to scope Tiller to the namespace that you are working in (one Tiller per team) and create a service account that allows Tiller to deploy to that namespace.
This will allow the individual teams to have the ability to deploy their own charts without affecting anyone else.
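A per-team Tiller setup along those lines can be sketched as follows, modeled on the pattern in the Helm 2 documentation. The team-a namespace and the object names are placeholders. This applies to Helm 2 only, since Helm 3 removed Tiller.

```yaml
# Service account that Tiller runs as inside the team's namespace.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: team-a             # hypothetical team namespace
---
# Everything Tiller may do, limited to this namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tiller-manager          # hypothetical name
  namespace: team-a
rules:
  - apiGroups: ["", "batch", "extensions", "apps"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tiller-binding          # hypothetical name
  namespace: team-a
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: team-a
roleRef:
  kind: Role
  name: tiller-manager
  apiGroup: rbac.authorization.k8s.io
# Then install Tiller into that namespace with the service account:
#   helm init --service-account tiller --tiller-namespace team-a
```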
I Hope You Now Understand the “Why”
I’ve seen a lot of posts on “how” to do many of these things, but none that really explain the rationale behind them, so I hope this has helped, whether you are using Kubernetes today or considering jumping in.
A couple of other recent Hashmap posts in and around this topic include Making Kubernetes Approachable — Our Experience with Kops and Rancher and The What, Why, and How of a Microservices Architecture.
Chris Herrera is Chief Innovation Officer at Hashmap working across industries with a group of innovative technologists and domain experts accelerating high value business outcomes for our customers.
You can tweet Chris @cherrera2001 and connect with him on LinkedIn and also be sure to catch him on Hashmap’s Weekly IoT on Tap Podcast for a casual conversation about IoT from a developer’s perspective.