Instance-Manager: A Kubernetes-Native Worker Node Manager

Eytan Avisror
keikoproj

--

In Kubernetes today, managing a node’s lifecycle is quite different from managing other primitives such as pods and services.

A cluster operator cannot simply submit a YAML file and request the creation or deletion of a node object. Nodes must be externally created or deleted and joined into or removed from a cluster; this ‘bootstrapping’ of a node is what creates the node object. In addition, the concept of a singular node rarely makes sense for production workloads. Nodes are almost always provisioned in identical sets, whether they serve multiple or dedicated tenants, in order to address load balancing and high availability requirements.

$ kubectl create -f ./my-node.yaml
error: nope.

The creation of nodes often seems to go against the Kubernetes model of declaratively converging objects toward a desired state expressed in YAML.

While there are a variety of existing tools for managing Kubernetes nodes, such as Terraform, Managed Node Groups, and Fargate, none of them is Kubernetes-native. These external tools and custom scripts require their own authentication, which eventually leads to more code to maintain and operate.

Managing nodes becomes a constant struggle when there are THOUSANDS of them, with never-ending upgrades and updates.

“Hey, can you roll out this new AMI real quick?”

At Intuit, we are building one of the largest fully-managed, multi-tenant Kubernetes platforms in order to reduce friction between Dev, Sec, Ops, and development teams. Today, we manage around 220 clusters and 9,000 namespaces across all environments, where each namespace is a tenant service on our platform. One of the biggest questions we had in getting started was how we could easily manage so many nodes. In particular, how could we give our platform’s tenants as much self-service and control as possible, so that they can provision and manage their own instance groups without bugging a human?

Instance Groups

Kops introduced the concept of an ‘instance group’, which can be expressed using YAML. However, this is still not Kubernetes-native, because there is no Custom Resource Definition and controller that can handle an instance group. About two years ago we decided to start the journey of designing and developing our own solution to support the use cases of running a large-scale managed platform for Intuit. We open-sourced the project so that others faced with similar problems can benefit from our solution.

Instance-Manager defines a CRD to represent collections of nodes as instance groups. A controller reconciles instance group objects into the Kubernetes infrastructure and bootstraps the nodes to the cluster. The end result is that one can now submit YAML files to provision and manage worker nodes.

The basic instance group can look like this:

An instance-group resource
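A minimal sketch of such a resource, assuming the EKS provisioner’s field layout as documented in the API reference (the cluster name, AMI, subnet, and security group IDs below are placeholders):

apiVersion: instancemgr.keikoproj.io/v1alpha1
kind: InstanceGroup
metadata:
  name: my-instance-group
  namespace: my-namespace
spec:
  provisioner: eks                  # provision a standard EC2 scaling group into an EKS cluster
  strategy:
    type: rollingUpdate             # rotate instances when the group's configuration changes
    rollingUpdate:
      maxUnavailable: 1
  eks:
    minSize: 3
    maxSize: 6
    configuration:
      clusterName: my-eks-cluster   # placeholder values below
      image: ami-0123456789abcdef0
      instanceType: m5.large
      keyPairName: my-key-pair
      subnets:
        - subnet-0123456789abcdef0
      securityGroups:
        - sg-0123456789abcdef0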

The above YAML defines an EC2 scaling group and its configuration. When a user submits this YAML file, Instance-Manager will create and bootstrap this node group to the cluster.

$ kubectl get instancegroups
NAMESPACE      NAME                STATE   MIN   MAX   AGE
my-namespace   my-instance-group   Ready   3     6     10m
$ kubectl get nodes
NAME     STATUS   ROLES               AGE   VERSION
node-1   Ready    my-instance-group   8m    v1.15.11-eks-af3caf
node-2   Ready    my-instance-group   8m    v1.15.11-eks-af3caf
node-3   Ready    my-instance-group   8m    v1.15.11-eks-af3caf

The controller’s main job is to create the required infrastructure (scaling group, roles, etc.) and then bootstrap the group to the cluster by upserting the role into the ‘aws-auth’ configmap.
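The aws-auth entry it upserts is the standard EKS node mapping, which conceptually looks like the snippet below (the role ARN is a placeholder for whatever node role the provisioner created for the group):

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # maps the instance group's node role to the node bootstrap groups
    - rolearn: arn:aws:iam::123456789012:role/my-instance-group-node-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes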

The API: Provisioner, Strategy, and Configuration

The instance-group API has three main sections:

  • Provisioner — This defines the type of provisioner that creates the instance group. In the above example we used the ‘eks’ provisioner, which creates a standard scaling group of EC2 instances that are bootstrapped to the EKS cluster. There are other provisioners, such as ‘fargate’ and ‘eks-managed’ (for managed node groups), each with its own configuration.
  • Strategy — This defines the upgrade/rotation strategy for your node group. If you make a change that requires rotation, e.g. changing the AMI, the strategy defines how that rotation happens. The most basic strategy is ‘rollingUpdate’, which simply terminates instances one batch at a time according to the maxUnavailable you define. There are more complex strategies such as the ‘crd’ strategy, which lets you submit any custom resource to handle this logic; we actually use this along with upgrade-manager to manage our upgrades (see the sketch after this list).
  • Configuration — This defines the provisioner-specific configuration, everything from subnets, key pairs, and security groups to the min/max size of the group, userData, volumes, and other supported options.
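As a rough sketch of the two strategy flavors mentioned above: the rollingUpdate keys follow the maxUnavailable described in the list, while the ‘crd’ fields are an assumption of how delegation to an upgrade-manager RollingUpgrade resource could be expressed, so consult the API reference for the exact schema.

# Simplest case: rotate instances in batches of at most maxUnavailable.
strategy:
  type: rollingUpdate
  rollingUpdate:
    maxUnavailable: 1

# 'crd' strategy (illustrative field names): hand the rotation off to another
# custom resource, e.g. an upgrade-manager RollingUpgrade, and watch its status.
strategy:
  type: crd
  crd:
    crdName: rollingupgrades
    statusJSONPath: .status.currentStatus
    statusSuccessString: completed
    statusFailString: error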

For a cluster operator/admin, the tasks of managing nodes and node groups are now very easy: submit a YAML and wait for the nodes to join. When you want to modify something like the instanceType or AMI, simply change the YAML and the controller takes care of reconciling the cluster to the new state.
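For example, assuming the field layout sketched earlier, bumping a group’s instance type could be as simple as:

$ kubectl patch instancegroup my-instance-group -n my-namespace \
    --type merge \
    -p '{"spec":{"eks":{"configuration":{"instanceType":"m5.xlarge"}}}}'

The controller detects the drift from the actual scaling group and rotates the instances according to the group’s upgrade strategy.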

But how can we give this control to a cluster tenant? We can’t expect a cluster tenant to know anything about subnets, key pairs, etc. We do, however, expect the tenant to know the instanceType, size, and other sizing or application-specific attributes.

Configuration Boundaries

To differentiate the use cases of cluster admins and namespace tenants, we added a feature that allows the cluster operator/admin to control which resource fields are ‘restricted’ and which are ‘shared’. For example, we would like to allow a tenant to submit a short version of the custom resource, one that cannot modify or control things such as the AMI, subnets, userData, and other fields we consider ‘restricted’. However, we do want to allow the tenant to control the instanceType, min/max, labels, volumes, etc., so they can customize to their requirements, again without bugging a human.

This feature is enabled by creating a configmap which defines the boundaries and provides the respective default values.

The instance-manager configmap
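The exact schema is described in the project docs; conceptually, the configmap pairs lists of restricted and shared fields with their cluster-wide defaults, roughly along these lines (the keys and paths here are illustrative only):

apiVersion: v1
kind: ConfigMap
metadata:
  name: instance-manager
  namespace: instance-manager
data:
  # which spec paths a tenant may set vs. which are locked down
  boundaries: |
    restricted:
      - spec.eks.configuration.image
      - spec.eks.configuration.subnets
      - spec.eks.configuration.keyPairName
    shared:
      - spec.eks.configuration.instanceType
      - spec.eks.minSize
      - spec.eks.maxSize
  # cluster-wide defaults merged into every instance group
  defaults: |
    spec:
      eks:
        configuration:
          image: ami-0123456789abcdef0
          keyPairName: my-key-pair
          subnets:
            - subnet-0123456789abcdef0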

This allows the user to submit a much simpler custom resource in order to provision worker nodes.

A shortened resource
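Under such boundaries, the tenant-facing resource might look like the sketch below, with the restricted fields filled in from the configmap defaults by the controller:

apiVersion: instancemgr.keikoproj.io/v1alpha1
kind: InstanceGroup
metadata:
  name: my-instance-group
  namespace: my-namespace
spec:
  provisioner: eks
  eks:
    minSize: 3
    maxSize: 6
    configuration:
      instanceType: m5.large
      labels:
        team: my-team        # example node label for the tenant's workloads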

The resulting resource computed by the controller combines the default values of the ‘restricted’ fields defined in the configmap with a merge of the ‘shared’ field defaults and any values provided in the custom resource. From the tenant’s perspective, however, the custom resource stays short and only defines what the tenant cares about.

In the above example, when an admin wants to change a default for the entire cluster (for example, a new AMI), they can simply modify the default value in the configmap; this triggers a reconcile and upgrade for all of the cluster’s instance groups.
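In practice that can be as simple as editing the default image value in the controller configmap (names here follow the earlier sketch) and letting the controller roll every instance group that inherits it:

$ kubectl -n instance-manager edit configmap instance-manager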

With the introduction of InstanceGroups as a CRD, our next step is to use GitOps to manage node-level changes. If you are interested in learning more about GitOps and how it is used in the industry, read about our sister project ArgoCD.

In summary, Instance-Manager delivers incredibly convenient and powerful capabilities by turning groups of nodes into Kubernetes-native objects that can be managed like any other Kubernetes object. Check out the git repo and take it for a spin, or even better — join us by becoming a contributor!

Project
https://github.com/keikoproj/instance-manager

API Reference
https://github.com/keikoproj/instance-manager/blob/master/docs/EKS.md
