Zero Downtime Node Patching in a Kubernetes Cluster

Vaishnavi Galgali
Salesforce Engineering
Feb 11, 2021

Authors: Vaishnavi Galgali, Arpeet Kale, Robert Xue

Introduction

The Salesforce Einstein Vision and Language services are deployed in an AWS Elastic Kubernetes Service (EKS) cluster. One of the primary security and compliance requirements is operating system patching. The cluster nodes that the services are deployed on need to have regular operating system updates. Operating system patching mitigates vulnerabilities that may expose the virtual machines to attacks.

Patching Process

Einstein services are deployed as Kubernetes pods on an immutable EC2 node group, also known as an AWS AutoScaling Group (ASG). The patching process involves building a new Amazon Machine Image (AMI) that contains all of the updated security patches. The new AMI is used to update the node group, which involves launching new EC2 instances one at a time. Once a new instance passes all health checks, one of the old instances is terminated. This process continues until every EC2 instance in the node group has been replaced. This is also known as a rolling update.
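If the node group is managed through CloudFormation, one way to express this one-at-a-time replacement is a rolling update policy on the AutoScaling Group. The snippet below is a minimal sketch with illustrative names and sizes, not our exact configuration:

# Illustrative CloudFormation fragment: roll the node group onto a new AMI
# one instance at a time, keeping full capacity in service.
NodeGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: "3"
    MaxSize: "4"
    DesiredCapacity: "3"
    VPCZoneIdentifier: !Ref PrivateSubnetIds        # illustrative parameter
    LaunchTemplate:
      LaunchTemplateId: !Ref NodeLaunchTemplate     # launch template pointing at the patched AMI
      Version: !GetAtt NodeLaunchTemplate.LatestVersionNumber
  UpdatePolicy:
    AutoScalingRollingUpdate:
      MaxBatchSize: 1             # replace one instance per batch
      MinInstancesInService: 3    # keep the full desired capacity serving traffic
      PauseTime: PT5M             # time for the new instance to pass health checks

Because MinInstancesInService equals the desired capacity, a new instance is brought up and verified before an old one is terminated.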

However, this patching process introduces a challenge. As the old EC2 instances are terminated, the service pods running on those instances are also terminated. This may lead to failures for any user requests being processed at the time of termination, unless the pods are terminated gracefully. Graceful termination of a pod involves both infrastructure components (the Kubernetes API and AWS ASGs) and application components (the service/app container).

Graceful Application Termination

The first step is to terminate the application gracefully. Terminating a pod may abruptly kill the Docker container in the pod, and with it the process running inside. Any requests being processed at that moment are dropped, leading to failures for any upstream service calling the application at that time.

When an EC2 instance is terminated as part of the patching process, the pods on that instance are evicted. This marks the pods for termination, and the kubelet running on that EC2 instance starts the pod shutdown process. As part of the shutdown, the kubelet sends a SIGTERM signal to each container's main process. If the application running in the pod isn't configured to handle SIGTERM, any running tasks may be terminated abruptly. Therefore, you want to update your application to handle this signal and shut down gracefully.

For example, in the case of a Java application, here’s one way to address graceful termination (this differs from framework to framework):

public static final int gracefulShutdownTimeoutSeconds = 30;

@Override
public void onApplicationEvent(@NotNull ContextClosedEvent contextClosedEvent) {
    // Stop accepting new requests while in-flight requests continue to be processed.
    this.connector.pause();
    Executor executor = this.connector.getProtocolHandler().getExecutor();
    if (executor instanceof ThreadPoolExecutor) {
        try {
            ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
            // Stop taking new tasks and wait for running tasks to complete.
            threadPoolExecutor.shutdown();
            logger.warn("Gracefully shutting down the service.");
            if (!threadPoolExecutor.awaitTermination(gracefulShutdownTimeoutSeconds, TimeUnit.SECONDS)) {
                logger.warn("Forcefully shutting down the service after {} seconds.", gracefulShutdownTimeoutSeconds);
                threadPoolExecutor.shutdownNow();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}

In the snippet above, the thread pool stops accepting new tasks and is given up to 30 seconds to finish any tasks already in flight; only if that timeout expires is the pool forcefully shut down.

If the pod consists of multiple containers, and the order of container termination matters, then define a container preStop hook to ensure that the containers are terminated in the correct sequence (for example, terminating an application container before terminating a logging sidecar container).

During pod shutdown, the kubelet runs container lifecycle hooks, if defined. In our case, we have multiple containers in the same pod, so the order of termination matters. We define the preStop hook for our application containers as shown below:

lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - kill -SIGTERM 1 && while ps -p 1 > /dev/null; do sleep 1; done;

The action defined in the preStop hook above sends a SIGTERM signal to the process running in the Docker container (PID 1) and waits in intervals of 1 second until the process is successfully terminated. This allows the process to complete any pending tasks and terminate gracefully.

The default termination grace period for a pod is 30 seconds, which, in our case, gives the process enough time to terminate gracefully. If the default isn't sufficient, you can extend it with the terminationGracePeriodSeconds field in the pod spec; this budget covers both the preStop hook and the subsequent SIGTERM handling.
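Putting these pieces together, the relevant portion of a pod spec might look like the following sketch (the pod name, image, and 60-second grace period are illustrative):

# Illustrative pod spec fragment: preStop hook plus an extended grace period.
apiVersion: v1
kind: Pod
metadata:
  name: einstein-service                 # illustrative name
spec:
  terminationGracePeriodSeconds: 60      # total budget for the preStop hook plus SIGTERM handling
  containers:
    - name: app
      image: registry.example.com/einstein-service:latest   # illustrative image
      lifecycle:
        preStop:
          exec:
            command:
              - /bin/sh
              - -c
              - kill -SIGTERM 1 && while ps -p 1 > /dev/null; do sleep 1; done;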

Graceful EC2 Instance Termination

As mentioned above, our services run on node groups of EC2 instances. Graceful EC2 instance termination can be achieved by using AWS ASG lifecycle hooks and an AWS Lambda function.

AWS EC2 Auto Scaling Lifecycle Hooks

Lifecycle hooks pause an instance in a wait state so that custom actions can be performed before the instance finishes launching or terminating. While the instance is paused, you can complete the lifecycle action by triggering a Lambda function or by running commands on the instance. The instance remains in the wait state until the lifecycle action is completed.

We use the Terminating:Wait lifecycle hook to put the instance to be terminated in the WAIT state. For more details on ASG lifecycle hooks, see the AWS docs.
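For reference, such a hook can be declared on the node group roughly as follows (a CloudFormation-style sketch; the hook name and timeout are illustrative):

# Illustrative lifecycle hook: hold terminating instances in Terminating:Wait
# until the node-drainer Lambda calls CompleteLifecycleAction.
NodeDrainLifecycleHook:
  Type: AWS::AutoScaling::LifecycleHook
  Properties:
    AutoScalingGroupName: !Ref NodeGroup
    LifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING
    HeartbeatTimeout: 300       # seconds to wait before the default result applies
    DefaultResult: CONTINUE     # proceed with termination if nothing completes the action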

AWS Lambda

We use the AWS Serverless Application Model (SAM) framework to deploy a Lambda function (built in-house; we call it node-drainer) that is triggered by specific ASG lifecycle hook events. The following sequence of events is involved in gracefully terminating an EC2 instance in the node group (a sketch of the SAM wiring follows the list).

  • When the patching automation requests instance termination, the lifecycle hook kicks in and puts the instance in the Terminating:Wait state.
  • Once the instance is in the Terminating:Wait state, the lifecycle hook event triggers the node-drainer AWS Lambda function.
  • The Lambda function calls the Kubernetes API and cordons the terminating instance. Cordoning the instance prevents any new pods from being scheduled on it.
  • Once the instance is cordoned, all the pods on that instance are evicted.
  • Kubernetes takes care of scheduling replacement pods on healthy instances.
  • The lifecycle hook keeps the instance in the wait state until all the pods have been evicted from it and the replacement pods have come up on healthy instances.
  • Once the node is drained completely, the lifecycle action is completed, the wait is lifted, and the instance termination proceeds.
  • This ensures that all existing requests are completed before the pods are evicted from the node.
  • While this happens, new healthy pods come up to serve new traffic.
  • This graceful shutdown process ensures that no pods are shut down abruptly and that there is no service disruption.
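Below is a rough sketch of how the node-drainer function and its trigger might be declared in a SAM template. The resource names, handler, runtime, and timeout are illustrative assumptions, not our actual template; the trigger is the EventBridge event that ASG emits for the terminate lifecycle action:

# Illustrative SAM fragment: invoke node-drainer when the ASG emits a
# terminate lifecycle action event for the node group.
NodeDrainerFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: handler.lambda_handler      # illustrative handler
    Runtime: python3.9                   # illustrative runtime
    Timeout: 300                         # allow time for pods to drain
    Events:
      TerminateLifecycleAction:
        Type: EventBridgeRule
        Properties:
          Pattern:
            source:
              - aws.autoscaling
            detail-type:
              - EC2 Instance-terminate Lifecycle Action
            detail:
              AutoScalingGroupName:
                - !Ref NodeGroupName     # illustrative parameter

The function also needs the IAM permissions listed in the next section, for example to call CompleteLifecycleAction and to look up the EKS cluster.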

RBAC

To access Kubernetes resources from the AWS Lambda function, we create an IAM role, a ClusterRole, and a ClusterRoleBinding. The IAM role grants permission to access ASGs. The ClusterRole and ClusterRoleBinding grant the node-drainer Lambda function the permissions needed to evict Kubernetes pods.

IAM role policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "autoscaling:CompleteLifecycleAction",
        "ec2:DescribeInstances",
        "eks:DescribeCluster",
        "sts:GetCallerIdentity"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}

ClusterRole

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: lambda-cluster-access
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/eviction", "nodes"]
    verbs: ["create", "list", "patch"]
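Each verb maps to a step of the drain: patch on nodes marks the node unschedulable (cordon), list on pods finds the pods running on the terminating node, and create on pods/eviction evicts them. For illustration, evicting a pod amounts to creating an Eviction object like the following (the pod name and namespace are placeholders):

apiVersion: policy/v1    # policy/v1beta1 on clusters older than 1.22
kind: Eviction
metadata:
  name: einstein-service-pod-abc123   # placeholder: the pod being evicted
  namespace: default                  # placeholder namespace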

ClusterRoleBinding

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: lambda-user-cluster-role-binding
subjects:
  - kind: User
    name: lambda
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: lambda-cluster-access
  apiGroup: rbac.authorization.k8s.io
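One additional detail: the ClusterRoleBinding above grants access to a Kubernetes user named lambda. On EKS, the Lambda function's IAM role must also be mapped to that user in the cluster's aws-auth ConfigMap so the Kubernetes API will authenticate it. A minimal sketch of that mapping, with a placeholder account ID and role name:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # placeholder ARN; maps the node-drainer Lambda's IAM role to the "lambda" user
    - rolearn: arn:aws:iam::111122223333:role/node-drainer-lambda
      username: lambda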

Conclusion

With the combination of AWS Lambda, AWS EC2 AutoScaling Lifecycle hooks, and graceful application process termination, we ensure zero downtime while replacing our EC2 instances frequently during patching.

Please reach out to us with any questions, or if there is something you’d be interested in discussing that we haven’t covered.
