Testing Spot Reclamation Mechanisms with AWS Node Termination Handler and Kubernetes Autoscaler

Daniel Esponda
4 min read · Apr 23, 2023


Spot instances are becoming increasingly popular due to their significant cost savings on the cloud. However, maintaining the continuity of Kubernetes workloads when spot instances are reclaimed can be challenging. For instance, what happens if a spot instance is reclaimed, but there is no more capacity available in the cluster?

Fortunately, AWS provides a two-minute warning before terminating a reclaimed instance. This warning allows Kubernetes clusters to use the AWS Node Termination Handler in conjunction with Kubernetes Autoscaler to handle spot reclamations seamlessly, without worrying about service interruptions or insufficient capacity in your cluster.

The AWS Node Termination Handler ensures that workloads are cleanly drained: it marks the reclaimed instance unschedulable via the Kubernetes cordon operation, so no new pods land on it, and then evicts the instance's pods via the Kubernetes drain operation so they can be rescheduled onto other nodes. However, what happens if there isn't enough capacity in the remaining nodes? This is where the Kubernetes Autoscaler comes in.

The Kubernetes Autoscaler detects that there isn't enough capacity left in the remaining instances and requests new instances from AWS. For this mechanism to work correctly, it is essential that your ASGs, Termination Handler, and Autoscaler are correctly configured; that configuration is out of scope here, as this article focuses on testing the mechanism.
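Configuration details are out of scope, but as a reference point: in the common auto-discovery setup, the Cluster Autoscaler finds the ASGs it is allowed to scale through tags on each group, along these lines (`<cluster-name>` is a placeholder for your cluster's name):

```
# Tags on each spot ASG for Cluster Autoscaler auto-discovery
k8s.io/cluster-autoscaler/enabled = true
k8s.io/cluster-autoscaler/<cluster-name> = owned
```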

AWS delivers the two-minute warning through the instance metadata API. When an instance is about to be reclaimed, the metadata endpoint /spot/instance-action starts returning a 200 response with a JSON body that looks like:

{"action":"terminate","time":"2020-05-07T04:38:00.078Z"}
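Since the notice body is plain JSON, the two fields can be pulled out with standard tools; a quick sketch using sed (jq would be cleaner if it is installed on the node):

```shell
# Extract the "action" and "time" fields from the interruption notice.
notice='{"action":"terminate","time":"2020-05-07T04:38:00.078Z"}'
printf '%s\n' "$notice" | sed -n 's/.*"action":"\([^"]*\)".*/\1/p'   # terminate
printf '%s\n' "$notice" | sed -n 's/.*"time":"\([^"]*\)".*/\1/p'     # 2020-05-07T04:38:00.078Z
```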

The AWS Node Termination Handler polls the metadata API for this event; the polling code can be seen here: https://github.com/aws/aws-node-termination-handler/blob/v1.3.1/pkg/ec2metadata/ec2metadata.go#L135-L153

With this information, we can simulate this event by mocking the response of the metadata API and configuring the instance to hit our mock response.
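The idea can be sketched locally before deploying anything to the cluster. This is an illustrative stand-in, not the simulator used below; it assumes python3 is available, and the port and directory names are arbitrary. Because the handler only checks for a 200 with that JSON body, serving the notice as a static file at the same path is enough:

```shell
# Create the notice file under the same path the handler polls.
mkdir -p mock-imds/latest/meta-data/spot
printf '%s' '{"action":"terminate","time":"2020-05-07T04:38:00.078Z"}' \
  > mock-imds/latest/meta-data/spot/instance-action

# python3's built-in file server answers GET /latest/meta-data/spot/instance-action
# with a 200 and the JSON body.
(cd mock-imds && exec python3 -m http.server 18254) >/dev/null 2>&1 &
SERVER_PID=$!
sleep 1

# Query the mock endpoint; this prints the notice JSON back.
python3 -c 'from urllib.request import urlopen; print(urlopen("http://127.0.0.1:18254/latest/meta-data/spot/instance-action").read().decode())'
kill "$SERVER_PID"
```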

As an efficient programmer, I found a service that already does this, which I can deploy to my Kubernetes cluster by applying the following manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ec2-spot-termination-simulator
  name: spot-term-simulator
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: ec2-spot-termination-simulator
  template:
    metadata:
      labels:
        app: ec2-spot-termination-simulator
    spec:
      containers:
      - image: shoganator/ec2-spot-termination-simulator:1.0.1
        imagePullPolicy: Always
        name: spot-term-simulator
        env:
        - name: PORT
          value: "80"
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: spot-term-simulator
  name: ec2-spot-termination-simulator
spec:
  ports:
  - name: http
    port: 8082
    nodePort: 30000
    protocol: TCP
    targetPort: 80
  selector:
    app: ec2-spot-termination-simulator
  sessionAffinity: None
  type: NodePort

This deploys the mock service, but the metadata API still needs to be overridden. Before doing that, we need to figure out which node the simulator is running on and get a shell on that node.

In my cluster, I don’t have SSH access to the Kubernetes instances by design, but I can still gain access to the instance configuration through the aws-node pods. I open up a shell in one of these pods through the command:

kubectl exec -it aws-node-<id> -- bash

Next, we need to install two yum packages, net-tools and socat, so that we can reroute the metadata API:

yum install -y net-tools
yum install -y socat

We can then reroute the metadata API traffic and test that our Node Termination Handler and Cluster Autoscaler work correctly. The first command below adds 169.254.169.254 as a loopback alias, so the node answers metadata requests locally; the second forwards port 80 on that address to the simulator's NodePort:

ifconfig lo:0 169.254.169.254 up
socat TCP4-LISTEN:80,fork TCP4:127.0.0.1:30000

As soon as we run these commands, the cluster starts draining the node's pods. If the cluster doesn't have enough spare capacity to reschedule the evicted pods immediately, some of them will remain pending.

However, our Cluster Autoscaler notices the pending pods and immediately requests new nodes. Ninety seconds later, our node is ready to go, and the pending pods are scheduled.

With this mechanism tested and working correctly, I can now sleep in peace knowing that spot reclamations will not ruin my night.

Written by Daniel Esponda, Staff Site Reliability Engineer @ Datadog