The Ultimate Guide to Deploying Kubernetes Cluster on AWS EC2 Spot Instances Using Kops and EKS

Paul Zhao
Published in Paul Zhao Projects
Jun 18, 2020 · 29 min read

A step-by-step walkthrough of deploying a highly available, reliable, and resilient Kubernetes cluster that leverages AWS EC2 spot instances as worker nodes, using both Kops and EKS.

Before we even get started, let’s answer a few questions about the philosophy behind this project.

Why adopt AWS EC2 Spot Instances?

Ever since AWS first introduced spot instances, DevOps teams and CTOs have been asking themselves: is it possible to run my workloads on spot instances and still retain an acceptable level of performance and reliability/availability? To answer this question, let’s dive into the instance purchase options AWS provides and get the hang of them.

AWS On-Demand Instances vs. Reserved Instances vs. Spot Instances

On-Demand Instances:

With on-demand, you can get a server at almost any time with no commitment on your part. At times of extremely high demand you might not be able to get one, but that rarely happens. For example, it can happen during an AZ outage, when customers flood a neighboring AZ with requests because their own AZ is down. Even in those rare times I’ve been able to get instances; that’s how rare it is to be refused an on-demand instance. Once you get an on-demand instance, you keep it until you terminate it. In this pricing model you pay the most, precisely because you can leave at any time.

Reserved Instances:

With reserved instances, you get the same instance hardware, but you pay less. You make some form of commitment at the beginning, and then you get to pay for the instance at a discounted rate. I’ll use made-up round numbers as an example to explain the concept:

  • On-demand rate: $100/mo
  • One-time up-front payment: $500
  • Discounted reserved rate: $10/mo

Spot Instances:

With spot instances, the commitment concept is reversed: instead of the lack of commitment benefiting you, it benefits AWS. With on-demand, you make no commitment to AWS; with spot, AWS makes no commitment to you. In exchange, the pricing becomes extremely attractive. We’re talking about 50% to 90% savings. That’s not just me being overly optimistic; I’ve seen these savings realized repeatedly.

  • On-Demand price: $1/hr
  • The market spot price: $0.2/hr
  • Your bid price: $0.5/hr
  • What you pay: $0.2/hr

A simple calculation shows why spot instances win over on-demand and reserved instances on cost, as long as spot instances meet your needs and requirements.
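The savings calculation above can be sketched in a few lines of shell, using the made-up round numbers from the example (expressed in cents per hour for easy integer math):

```shell
# Hypothetical prices from the example above, in cents per hour.
on_demand=100   # $1.00/hr on-demand rate
spot=20         # $0.20/hr market spot price (what you actually pay)

# Percentage saved by running on spot instead of on-demand.
savings=$(( (on_demand - spot) * 100 / on_demand ))
echo "Savings: ${savings}%"   # Savings: 80%
```

With these numbers the saving is 80%, squarely inside the 50%–90% range quoted above.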

One last question remains before we can conclude that spot instances are the right choice: how do we guarantee performance and reliability/availability?

On their own, spot instances are unable to deliver this. However, combined with another AWS service, auto scaling, we can resolve the issue. Here is the trick: run an auto scaling on-demand fleet alongside the spot fleet, and switch over to it as needed.

Here let’s see how we can achieve it:

Auto scaling groups are sets of AWS instances that are managed as a group. These groups can comprise on-demand instances, spot instances, or a mix of the two.

Auto scaling groups can both scale in (terminate) as well as scale out (launch) instances based on scaling policies. To take advantage of on-demand and spot billing models, auto scaling groups need to be configured to use a launch template.

The latest iteration of AWS auto scaling groups also supports deploying multiple instance types as part of the same group. These auto scaling groups can also be configured to span availability zones.

All of this functionality helps improve the availability and resilience of our Kubernetes cluster. Having instance groups with multiple instance types and purchase options, deployed across availability zones, increases the number of capacity pools that our cluster has access to and results in fewer disruptions.

The go-to tool for scaling Kubernetes clusters is the cluster Autoscaler. The Autoscaler is not part of core Kubernetes, but has seen widespread adoption in the community. Let’s review the cluster Autoscaler and the mechanics of its integration with auto scaling groups.

Why deploy a Kubernetes cluster on AWS EC2 Spot Instances?

Having answered the question about AWS EC2 spot instances, let’s look at why we choose Kubernetes for orchestration.

Kubernetes is a container orchestration platform whose flexibility gives it an edge over other orchestration tools and makes it a fit for organizations of all shapes and sizes.

Now we can draw our conclusion: deploying a Kubernetes cluster on AWS EC2 spot instances is the way to go!

With this infrastructure, we can deploy a Kubernetes cluster on AWS EC2 spot instances while preserving the reliability, robustness, and availability of our clusters. On top of that, Kubernetes itself provides plenty of flexibility for businesses of all shapes and sizes.

Speaking of the cluster Autoscaler, let’s discuss it further:

Cluster Autoscaler is a tool that scales (both in and out) the number of nodes in a Kubernetes cluster based on the scheduling status of pods and the utilization of individual nodes. On AWS, the cluster Autoscaler adds new instances to a cluster, whenever it detects pending pods that failed to schedule. It will also decrease the capacity of the cluster if it detects under-utilized instances, by removing those instances from the cluster pool.

The cluster Autoscaler makes scaling decisions based on a template node. The template node is the first node of the instance group that the cluster Autoscaler detects, and it is assumed to be representative of all the nodes in the group. Whenever the cluster Autoscaler needs to make a scaling decision, it does so based on the capacity of the template node.

Now let’s put this into practice: using the cluster Autoscaler with auto scaling groups.

Since the cluster Autoscaler makes scaling decisions based on a template instance, it works best with auto scaling groups that have the same instance type. Scaling might not work properly with mixed Autoscaling groups that have multiple instance types.

The official workaround for this is to use instance types that have the same CPU and memory resources. For example, both the t2.medium and c5.large EC2 instances have 2 CPUs and 4 GB of RAM, so both can be used as part of the same auto scaling group with the cluster Autoscaler.
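A quick way to sanity-check that two candidate instance types match on CPU and memory before mixing them in one group is sketched below. The figures are the ones stated in this article; the commented `aws ec2 describe-instance-types` call (a real AWS CLI command, but one that needs credentials to run) shows how to confirm the live values:

```shell
# Specs as stated above; confirm live values with:
#   aws ec2 describe-instance-types --instance-types t2.medium c5.large \
#     --query 'InstanceTypes[].[InstanceType,VCpuInfo.DefaultVCpus,MemoryInfo.SizeInMiB]'
t2_medium_vcpus=2;  t2_medium_mem_mib=4096
c5_large_vcpus=2;   c5_large_mem_mib=4096

# Only pair instance types whose CPU and memory match.
if [ "$t2_medium_vcpus" -eq "$c5_large_vcpus" ] && \
   [ "$t2_medium_mem_mib" -eq "$c5_large_mem_mib" ]; then
  echo "t2.medium and c5.large are autoscaler-compatible"
fi
```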

We will use both approaches in this guide: multiple instance groups each having its own instance type and a single mixed instance group with multiple instance types. Each instance group will leverage spot instances.

We will also support our Kubernetes cluster with on-demand instances, which can take up the slack in the event of any interruptions to spot instances. This will further improve availability and reliability.

Now we’ll get started and get our hands dirty!

Kops: Deploying Kubernetes Cluster Leveraging EC2 Spot Instances

In this project, we’ll feature two mini projects:

  1. Deploy a Kubernetes cluster with multiple instance groups — each with its own instance type
  2. Deploy a Kubernetes cluster with a single mixed instance group with multiple instance types

For both mini projects above, we’ll deploy the cluster Autoscaler

Kops: Deploying Kubernetes Cluster with Multiple Spot and On-demand Instance Groups

In total, we will provision 3 instance groups: two will leverage spot instances and the remaining one will leverage on-demand instances exclusively.

Before we work on it, let’s install the AWS CLI, kubectl, and kops.

To install AWS CLI, please follow the instructions here.

There are two options:

  1. Install with pkg
  2. Install with command line

Depending on your OS, please choose accordingly

Please verify your AWS CLI

$ which aws
/usr/local/bin/aws
$ aws --version
aws-cli/2.0.22 Python/3.7.4 Darwin/19.5.0 botocore/2.0.0dev26

To install kubectl, please follow instructions here.

I highly recommend using Homebrew if you are using a Mac, since it makes installing a great number of tools easy.

Installing Homebrew

To install Linuxbrew on your Linux distribution, first install the following dependencies. (The same steps apply to Homebrew on macOS, with minor naming adjustments.)

--------- On Debian/Ubuntu ---------
$ sudo apt-get install build-essential curl file git

--------- On Fedora 22+ ---------
$ sudo dnf groupinstall 'Development Tools' && sudo dnf install curl file git

--------- On CentOS/RHEL ---------
$ sudo yum groupinstall 'Development Tools' && sudo yum install curl file git

Once the dependencies are installed, use the following script to install the Linuxbrew package into /home/linuxbrew/.linuxbrew (or into your home directory at ~/.linuxbrew):

$ sh -c "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install.sh)"

Next, you need to add the directories /home/linuxbrew/.linuxbrew/bin (or ~/.linuxbrew/bin) and /home/linuxbrew/.linuxbrew/sbin (or ~/.linuxbrew/sbin) to your PATH and to your bash shell initialization script ~/.bashrc as shown.

$ echo 'export PATH="/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin/:$PATH"' >>~/.bashrc
$ echo 'export MANPATH="/home/linuxbrew/.linuxbrew/share/man:$MANPATH"' >>~/.bashrc
$ echo 'export INFOPATH="/home/linuxbrew/.linuxbrew/share/info:$INFOPATH"' >>~/.bashrc

Then source the ~/.bashrc file for the recent changes to take effect.

$ source  ~/.bashrc

Check the version to confirm if it is installed correctly.

$ brew --version
Homebrew 2.2.16
Homebrew/homebrew-core (git revision a59d5e; last commit 2020-05-13)
Homebrew/homebrew-cask (git revision d25f8; last commit 2020-05-13)

After Homebrew is installed, you can simply install and verify kubectl with the following commands:

$ brew install kubectl

$ kubectl version --client
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-21T14:50:54Z", GoVersion:"go1.14.3", Compiler:"gc", Platform:"darwin/amd64"}

To install kops, please follow instructions here.

Prerequisites:

Let’s install kops first. Then we’ll focus on how to create an AWS account, generate IAM keys and configure them.

Instructions for kops installation are found here.

Again, I highly recommend Homebrew for installation if you’re using a Mac.

Install and verify your kops installation when using Homebrew:

$ brew update && brew install kops
$ which kops
/usr/local/bin/kops
$ kops version
Version 1.17.0 (git-a17511e6dd)

Now let’s focus on the AWS account: generating IAM keys and configuring them.

Creating a non-root user

Per AWS best practice, the root user should not be used for everyday tasks, even administrative ones. Instead, the root user is used only to create your first IAM user, groups, and roles. You then securely lock away the root user credentials and use them to perform only a few account and service management tasks.

Notes: If you would like to learn more about why we should not use root user for operations and more about AWS account, please find more here.

  1. Log in as the root user
  2. Create a user under the IAM service
  3. Choose programmatic access
  4. Attach the required policies
  5. Create the user without tags
  6. Keep the credentials (Access key ID and Secret access key)
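The console steps above can also be sketched as CLI calls. The user name below is hypothetical, the commands are left commented because they need your root/bootstrap credentials, and the attached policy is only an example; attach whatever policies your setup actually requires:

```shell
# Hypothetical IAM user name for this walkthrough.
user=kops-admin
echo "IAM user to create: $user"

# Uncomment to run against your account:
# aws iam create-user --user-name "$user"
# aws iam attach-user-policy --user-name "$user" \
#   --policy-arn arn:aws:iam::aws:policy/AdministratorAccess
# aws iam create-access-key --user-name "$user"   # store the output securely
```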

Create a kubernetes cluster using Kops

Firstly, we have to generate a public/private RSA key pair in order to get started

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/paulzhao/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/paulzhao/.ssh/id_rsa.
Your public key has been saved in /Users/paulzhao/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:TOtHlr7Uv8KIB6Es444zixl9wtT3fCCZdIO+NqVPP6A paulzhao@localhost.local
The key's randomart image is:
+---[RSA 3072]----+
| |
| . |
| o + |
| . o *.o . |
| . ..*.S.+ |
| + o.oOo= . |
|. +..o=.=+=o. |
| +o+..E+.=+.o. |
|o o=. ..o. .o. |
+----[SHA256]-----+

Secondly, we need to create an S3 bucket

Notes: The name of the S3 bucket must be globally unique, so pick a name that is not already in use.
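If you prefer the CLI to the console walkthrough below, the bucket can be created and wired up to kops in a few lines. The bucket name matches the one used later in this guide; yours must be globally unique. The `aws s3api` call is commented out since it needs your credentials:

```shell
# Bucket name used later in this guide; yours must be globally unique.
bucket=hash-kops-kube-bucket
region=eu-west-1

# Uncomment to create the bucket from the CLI:
# aws s3api create-bucket --bucket "$bucket" --region "$region" \
#   --create-bucket-configuration LocationConstraint="$region"

# Point kops at the bucket so --state can be omitted on later commands.
export KOPS_STATE_STORE="s3://${bucket}"
echo "$KOPS_STATE_STORE"
```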

Search S3 under service drop-down

Create bucket under Amazon S3 service

Create bucket page

Configure options

Set permissions

Review page

Provide AWS credentials using AWS CLI:

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: json

Notes: The AWS Access Key ID and Secret Access Key were both created earlier, when generating the programmatic credentials for the IAM user.

To have kubectl interact with a local minikube cluster, we apply the following command:

$ minikube start
😄 minikube v1.11.0 on Darwin 10.15.5
✨ Using the hyperkit driver based on existing profile
👍 Starting control plane node minikube in cluster minikube
🔄 Restarting existing hyperkit VM for "minikube" ...
🐳 Preparing Kubernetes v1.18.3 on Docker 19.03.8 ...
🔎 Verifying Kubernetes components...
🌟 Enabled addons: default-storageclass, storage-provisioner
🏄 Done! kubectl is now configured to use "minikube"

With all of our tools set up, we can move on to the deployments.

Kops: Deploying Kubernetes Cluster Leveraging EC2 Spot Instances

In this section we will walk through the process of deploying a Kubernetes cluster leveraging EC2 spot instances using Kops. We will deploy a Kubernetes cluster with multiple instance groups — each with its own instance type — as well as a cluster with a single mixed instance group with multiple instance types. We will also deploy the cluster autoscaler for both clusters.

Let’s start by provisioning a cluster with multiple instance groups using Kops.

Kops: Deploying Kubernetes Cluster with Multiple Spot and On-demand Instance Groups

Create a kubernetes cluster using Kops

If you haven’t already, generate a public key with ssh-keygen, accepting the default options:

$ ssh-keygen

# --name:            name of the cluster
# --state:           AWS S3 bucket used as the kops state store (globally unique)
# --cloud:           cloud provider
# --master-size:     master node instance type
# --master-count:    number of masters
# --master-zones:    zones for the masters
# --node-size:       worker node instance type
# --node-count:      number of worker nodes
# --zones:           zones for the worker nodes
# --ssh-public-key:  the public key generated earlier
kops create cluster \
  --name demo.cloudchap.cf \
  --state s3://hash-kops-kube-bucket \
  --cloud aws \
  --master-size t2.medium \
  --master-count 1 \
  --master-zones eu-west-1a \
  --node-size t2.medium \
  --node-count 1 \
  --zones eu-west-1a,eu-west-1b,eu-west-1c \
  --ssh-public-key ~/.ssh/id_rsa.pub

Review the cluster and the AWS resources that will be created

$ kops update cluster demo.cloudchap.cf

Create the cluster

$ kops update cluster demo.cloudchap.cf --yes

This will create the Kubernetes cluster and will also create two instance groups: one each for the master node and the worker nodes

Verify that the instance groups have been created

$ kops get ig --name demo.cloudchap.cf
Using cluster from kubectl context: devopspaulzhao.com

NAME               ROLE    MACHINETYPE  MIN  MAX  ZONES
master-eu-west-1a  Master  t2.medium    1    1    eu-west-1a
nodes              Node    t2.medium    1    7    eu-west-1a,eu-west-1b,eu-west-1c

We can also see the corresponding Autoscaling groups in the AWS console.

Master and nodes instances

Next we will create the two spot instance groups. Each instance group will leverage a separate EC2 spot instance type.

As mentioned before, the new spot instance billing model no longer requires us to submit bids. AWS users simply pay the spot price that is in effect for that hour.

However, we do have the option of setting an optional maximum amount that we are willing to pay for the spot instance. The default maximum price is the on-demand price for that instance.

The spot instance pricing history (in the EC2 console under Spot requests) gives us access to the current and historical pricing for a spot instance. Here is the pricing history for the t3.micro spot instance.

The spot pricing for the t3.micro instance has been relatively stable at $0.0114 for the last 3 months. To leave some headroom for price increases, let’s set the maximum price we are willing to pay at $0.0120.

We will also increase maxSize and minSize to 7 and 1 respectively. These are the upper and lower limits on the number of instances that are allowed to run in the instance group.

kops create ig spot-ig

Change machine type to t3.micro

Add the following under spec

maxPrice: "0.0120"
maxSize: 7
minSize: 1

As well as the following nodeLabel

nodeLabels:
  on-demand: "false"

Here is what the configuration looks like

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2019-09-13T13:11:19Z
  labels:
    kops.k8s.io/cluster: demo.cloudchap.cf
  name: spot-ig
spec:
  image: kope.io/k8s-1.13-debian-stretch-amd64-hvm-ebs-2019-08-16
  machineType: t3.micro
  maxPrice: "0.0120"
  maxSize: 7
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: spot-ig
    on-demand: "false"
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c

As mentioned before, every instance type in each availability zone has its own capacity pool. To increase the chances of being allocated spot capacity as well as to ensure fewer interruptions, we will create another instance group with another instance type.

kops create ig spot-ig-2

Change machine type to c5.large

Add the following under spec.

maxPrice: "0.0410"
maxSize: 7
minSize: 1

maxPrice is again based on the pricing history for the last 3 months as displayed in the EC2 console.

Add the following nodeLabel

nodeLabels:
  on-demand: "false"

Here is the complete configuration:

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2019-09-13T13:23:35Z
  labels:
    kops.k8s.io/cluster: demo.cloudchap.cf
  name: spot-ig-2
spec:
  image: kope.io/k8s-1.13-debian-stretch-amd64-hvm-ebs-2019-08-16
  machineType: c5.large
  maxPrice: "0.0410"
  maxSize: 7
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: spot-ig-2
    on-demand: "false"
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c

As mentioned before, we want to support the spot instance groups already created with an on-demand one that can take up the slack from spot instance interruptions. We will use the “nodes” instance group already created by kops as the on-demand instance group.

Since we want to use the on-demand instance group as a backup, we will taint the EC2 instances in it with PreferNoSchedule. Taints allow us to mark nodes so that the Kubernetes scheduler avoids them when making scheduling decisions for pods. The PreferNoSchedule taint is a softer version of the NoSchedule taint: the scheduler tries to avoid the tainted nodes but is not required to.
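From the pod side, a workload that should be allowed onto the backup on-demand nodes without the scheduler steering away from them can carry a matching toleration. This is a sketch of such a pod spec fragment, matching the taint we add below:

```yaml
# Pod spec fragment: tolerate the PreferNoSchedule taint on on-demand nodes
spec:
  tolerations:
  - key: "on-demand"
    operator: "Equal"
    value: "true"
    effect: "PreferNoSchedule"
```

Pods without this toleration can still land on the tainted nodes when spot capacity runs out, since PreferNoSchedule is only a preference.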

kops edit ig nodes

Update the maxSize and minSize

maxSize: 7
minSize: 1

And add the following nodeLabel

nodeLabels:
  on-demand: "true"

And taint the instances by adding

taints:
- on-demand=true:PreferNoSchedule

Here is the complete instance group configuration:

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2019-09-13T14:16:11Z
  labels:
    kops.k8s.io/cluster: demo.cloudchap.cf
  name: nodes
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-06-21
  machineType: t2.medium
  maxSize: 7
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: on-demandig
    on-demand: "true"
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c
  taints:
  - on-demand=true:PreferNoSchedule

Update the cluster to review the changes

kops update cluster demo.cloudchap.cf

Add --yes to apply the changes

kops update cluster demo.cloudchap.cf --yes

Verify that the instance groups have been created

$ kops get ig
Using cluster from kubectl context: devopspaulzhao.com

NAME               ROLE    MACHINETYPE  MIN  MAX  ZONES
master-eu-west-1a  Master  t2.medium    1    1    eu-west-1a
nodes              Node    t2.medium    1    7    eu-west-1a,eu-west-1b,eu-west-1c
spot-ig            Node    t3.micro     1    7    eu-west-1a,eu-west-1b,eu-west-1c
spot-ig-2          Node    c5.large     1    7    eu-west-1a,eu-west-1b,eu-west-1c

We can also see the spot requests that are initiated in the AWS EC2 console

Spot instances

Since we have two spot instance groups with a minSize of 1, we can see two spot requests. As the instance group scales and the number of instances increases, the spot requests initiated will also increase.

Verify the updated corresponding Autoscaling groups on AWS

Autoscaling groups

Notes: No node group can share an instance type with the master node. Since the master is a t2.medium, neither spot-ig nor spot-ig-2 can be assigned a t2-family instance type.

Kops: Deploying the Cluster Autoscaler for Multiple Instance Groups

Now that we have deployed our cluster, let’s integrate the cluster autoscaler. The cluster autoscaler will automatically increase or decrease the size of our kubernetes cluster based on the presence of pending pods and the utilisation of individual nodes (instances).

Make a directory and download cluster autoscaler

$ mkdir autoscaler
$ cd autoscaler/
$ git clone https://github.com/kubernetes/autoscaler.git

It will spin up instances if there are pending pods that could not be scheduled because of insufficient resources on the already existing nodes. The cluster autoscaler will also decommission instances if they are consistently under-utilised and will schedule the pods from those instances on other ones.

To deploy the cluster autoscaler, we first need to create an IAM policy and attach it to the instance group we want to autoscale.

Create a ig-policy.json file locally and copy the following code into it

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*"
    }
  ]
}

Create the policy

$ aws iam create-policy --policy-name ig-policy --policy-document file://ig-policy.json

You will see the following output

$ aws iam create-policy --policy-name ig-policy --policy-document file://ig-policy.json
{
    "Policy": {
        "PolicyName": "ig-policy",
        "PolicyId": "ANPAWYH7TZJJSIQNKEJZ2",
        "Arn": "arn:aws:iam::464392538707:policy/ig-policy",
        "Path": "/",
        "DefaultVersionId": "v1",
        "AttachmentCount": 0,
        "PermissionsBoundaryUsageCount": 0,
        "IsAttachable": true,
        "CreateDate": "2020-06-16T23:38:11+00:00",
        "UpdateDate": "2020-06-16T23:38:11+00:00"
    }
}

Attach the policy to the nodes.demo.cloudchap.cf role by inserting the policy arn from the output above into the following command

aws iam attach-role-policy --policy-arn arn:aws:iam::464392538707:policy/ig-policy --role-name nodes.demo.cloudchap.cf

Next, add the following cloudLabels to all three instance groups

spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/node-template/label: ""
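These cloudLabels become tags on the underlying auto scaling groups. As an aside, instead of listing every group with explicit `--nodes` flags as we do below, newer cluster autoscaler releases can discover tagged groups automatically via the `--node-group-auto-discovery` flag; this is a sketch of that alternative (verify the flag against the autoscaler version you deploy):

```yaml
command:
- ./cluster-autoscaler
- --cloud-provider=aws
# Discover any ASG carrying the enabled tag we set via cloudLabels:
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled
```

With auto-discovery, new instance groups tagged this way are picked up without editing the autoscaler deployment.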

Now we are ready to deploy the cluster autoscaler. Here is the yaml file for the autoscaler. Edit the file and input the names of the instance groups as well as the correct values for the min and max sizes as shown below:

command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste
- --nodes=1:7:nodes.demo.cloudchap.cf
- --nodes=1:7:spot-ig.demo.cloudchap.cf
- --nodes=1:7:spot-ig-2.demo.cloudchap.cf

Also set the correct certificate path under hostPath. Keep only the path that exists on your node OS:

hostPath:
  path: "/etc/ssl/certs/ca-certificates.crt"  # Debian/Ubuntu-based AMIs; use /etc/ssl/certs/ca-bundle.crt on RHEL-based ones

Deploy the cluster autoscaler using

$ kubectl apply -f cluster-autoscaler-multi-asg.yaml

This will spin up a cluster autoscaler deployment in the kube-system namespace.

Verify that the cluster autoscaler has been deployed.

$ kubectl get pods -l app=cluster-autoscaler -n kube-system
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-5b5844cd75-lhrc6 1/1 Running 0 9s

View the logs for the autoscaler pod

kubectl logs -f pod/cluster-autoscaler-69b696b7df-rb8l6 -n kube-system

Now that the cluster autoscaler is deployed on our cluster let’s scale our app replicas to verify the auto scaling behaviour.

kubectl scale deployment frontend --replicas=40 -n production

Verify scaling by using command line below:

$ kubectl describe deployment frontend -n production
Name: frontend
Namespace: production
CreationTimestamp: Tue, 16 Jun 2020 20:05:26 -0400
Labels: app=frontend
Annotations: deployment.kubernetes.io/revision: 1
Selector: app=frontend
Replicas: 40 desired | 40 updated | 40 total | 40 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=frontend
Containers:
nginx:
Image: nginx:1.14.2
Port: <none>
Host Port: <none>
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: frontend-567b8ff797 (40/40 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 23m deployment-controller Scaled up replica set frontend-567b8ff797 to 1
Normal ScalingReplicaSet 23m deployment-controller Scaled up replica set frontend-567b8ff797 to 40

This completes the deployment of a Kubernetes cluster leveraging both on-demand and spot instances as part of separate instance groups.

Next we will deploy a Kubernetes cluster that leverages EC2 spot instances as part of a single mixed instance group.

Kops: Kubernetes Cluster with Single Mixed Instance Group

Mixed instance groups leverage multiple instance types and purchase options. As of version v1.14.x, the cluster autoscaler also supports mixed instance groups. These instance types, however, need to have the same CPU and memory resources for the cluster autoscaler to function correctly.

Mixed instance groups allow us to diversify our Kubernetes cluster and take advantage of multiple spot pools as part of the same instance group. Leveraging multiple spot pools increases the chances of being allocated spot capacity and reduces interruptions.

Let us now move on to the deployment.

We will use the same cluster we deployed earlier in the guide.

First, create a new mixed instance group

kops create ig mixed-ig

Add maxPrice and change the maxSize and minSize

maxPrice: "0.04"
maxSize: 20
minSize: 1

Also add the mixed instance policy

mixedInstancesPolicy:
  instances:
  - t2.medium
  - c5.large
  - a1.large
  onDemandAboveBase: 0
  onDemandBase: 0
  spotInstancePools: 3

onDemandBase is the minimum instance group capacity that we want provisioned as on-demand instances. The base capacity is provisioned first. Since we have set it to 0, our instance group will have no base on-demand instances.

onDemandAboveBase is the percentage of instances above base that will be provisioned as on-demand instances. Setting it to 0 means that any additional capacity or instances launched will be spot instances.
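How the two knobs split a group can be worked through with a little integer arithmetic. The sketch below mirrors the configuration above (base 0, 0% above base) for a hypothetical 10 desired instances; AWS’s own rounding of fractional counts may differ for non-zero percentages:

```shell
desired=10
on_demand_base=0    # onDemandBase
on_demand_pct=0     # onDemandAboveBase, a percentage

# Base capacity is provisioned on-demand first; the percentage applies
# to everything above it.
above_base=$(( desired - on_demand_base ))
on_demand=$(( on_demand_base + above_base * on_demand_pct / 100 ))
spot=$(( desired - on_demand ))
echo "on-demand=${on_demand} spot=${spot}"   # on-demand=0 spot=10
```

With both knobs at 0, all 10 instances come from spot pools, which is exactly the behaviour we want here.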

The three instance types we have chosen for the instance group all have similar CPU and memory resources. This means that we can safely use the cluster autoscaler with it.

Here is the complete configuration of the mixed instance group

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  labels:
    kops.k8s.io/cluster: demo.cloudchap.cf
  name: mixed-ig
spec:
  image: kope.io/k8s-1.13-debian-stretch-amd64-hvm-ebs-2019-08-16
  machineType: t3.micro
  maxPrice: "0.04"
  maxSize: 20
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t2.medium
    - c5.large
    - a1.large
    onDemandAboveBase: 0
    onDemandBase: 0
    spotInstancePools: 3
  nodeLabels:
    kops.k8s.io/instancegroup: mixed-ig
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c

Update the cluster to review the new resources that will be created

$ kops update cluster demo.cloudchap.cf

Apply the changes

$ kops update cluster demo.cloudchap.cf --yes
$ kops rolling-update cluster
Using cluster from kubectl context: devopspaulzhao.com

NAME               STATUS  NEEDUPDATE  READY  MIN  MAX  NODES
master-eu-west-1a  Ready   0           1      1    1    1
mixed-ig           Ready   0           1      1    20   1
nodes              Ready   0           1      1    7    1
spot-ig            Ready   0           1      1    7    1
spot-ig-2          Ready   0           1      1    7    1

Let us now deploy the cluster autoscaler for the mixed instance group.

Kops: Deploying the Cluster Autoscaler for a single Mixed Instance Group

Edit this yaml file for the cluster autoscaler and set the following under hostPath

path: "/etc/ssl/certs/ca-certificates.crt"

Also enter the correct values for the minSize and maxSize and the name of the instance group. Optionally add --skip-nodes-with-system-pods=false.

command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --nodes=1:20:mixed-ig.demo.cloudchap.cf
- --skip-nodes-with-system-pods=false

Deploy the cluster autoscaler using

$ kubectl apply -f cluster-autoscaler-one-asg.yaml

Verify that the cluster autoscaler is running in the kube-system namespace

$ kubectl get pods -l app=cluster-autoscaler -n kube-system
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-749b9ccd48-dkh8v 1/1 Running 0 16m

View the logs for the autoscaler pod

$ kubectl logs -f pod/cluster-autoscaler-7bc84c657-s9sfj -n kube-system

Scale the app to verify the auto scaling behaviour.

$ kubectl scale deployment frontend --replicas=90 -n production

View the cluster autoscaler logs

$ kubectl logs -f cluster-autoscaler-7bc84c657-s9sfj -n kube-system

To verify deployments

$ kubectl describe -n production deployments
Name: frontend
Namespace: production
CreationTimestamp: Tue, 16 Jun 2020 20:05:26 -0400
Labels: app=frontend
Annotations: deployment.kubernetes.io/revision: 1
Selector: app=frontend
Replicas: 90 desired | 90 updated | 90 total | 90 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=frontend
Containers:
nginx:
Image: nginx:1.14.2
Port: <none>
Host Port: <none>
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: frontend-567b8ff797 (90/90 replicas created)
Events: <none>

This concludes the process of deploying a Kubernetes cluster leveraging spot instances using Kops.

Next we will deploy our cluster using EKS.

EKS: Deploying Kubernetes Cluster Leveraging EC2 Spot Instances

In this section we will review the process of deploying a Kubernetes cluster leveraging EC2 spot instances using EKS. As we did with Kops, we will provision multiple node groups each with its own instance type, as well as a single mixed node group. We will also deploy the cluster autoscaler for both scenarios.

EKS: Deploy Kubernetes Cluster with Multiple Spot and On-demand Node Groups

In this section we walk through the deployment of a Kubernetes cluster with multiple node groups. We will create 3 node groups. Two of these node groups will leverage EC2 spot instances while the remaining one will leverage on-demand instances.

We assume that you have already installed the AWS CLI, eksctl, and kubectl.

Create a kubernetes cluster using eksctl.

eksctl create cluster \
  --name demo-eks-cluster \
  --nodegroup-name nodes \
  --node-type t2.medium \
  --nodes-min 1 \
  --nodes-max 1

This will create the Kubernetes control plane, managed by AWS, as well as an on-demand node group called nodes.

Verify that the node groups have been created

$ eksctl get nodegroup --cluster floral-painting-1592375199
CLUSTER NODEGROUP CREATED MIN SIZE MAX SIZE DESIRED CAPACITY INSTANCE TYPE IMAGE ID
floral-painting-1592375199 ng-37dfc6fe 2020-06-17T06:39:05Z 2 2 0 m5.large ami-0ee0652ac0722f0e3

We can also see the corresponding autoscaling group in the AWS console. Since the control plane is managed by AWS, it does not show up in the console as an autoscaling group.

Autoscaling group

Let’s now create the 3 node groups. We will use this cloudformation template to create the node groups. Clone the template locally and make the following changes:

Change the SpotNode1InstanceType to c5.large

SpotNode1InstanceType: 
Description: EC2 instance type for the spot instances.
Type: String
Default: c5.large

Change the OnDemandNodeInstanceType to t3.medium

OnDemandNodeInstanceType: 
Description: EC2 instance type for the node instances.
Type: String
Default: t3.medium

Increase the default NodeAutoScalingGroupMaxSize to 7

NodeAutoScalingGroupMaxSize: 
Type: Number
Description: Maximum size of Node Group ASG.
Default: 7

Head over to the CloudFormation section of the AWS console and click on ‘Create stack’.

Upload the updated ‘amazon-eks-nodegroup-with-spot.yaml’ file and click Next.

Template provided for cloudformation

Before moving to the next page, let’s find the NodeInstanceRole in the CloudFormation stack we created. Click its Physical ID and you will locate the Role ARN under the IAM service.

NodeInstanceRole
Role arn under IAM

Apart from this, we must also provide a key pair for the EC2 instances. On the EC2 page, find ‘Key Pairs’ in the left navigation bar.

Key pair under EC2

Create the key pair as a .pem file

Key pair creation

Enter the correct values for ‘Stack Name’ and ‘Cluster Name’. Choose the correct ‘ClusterControlPlaneSecurityGroup’, ‘VpcId’ and Subnets from the drop-down lists. Lastly, enter ‘ami-059c6874350e63ca9’ under ‘NodeImageId’. On the next screen, optionally enter a tag key and value, then click on ‘Create stack’.

Once the stack creation is complete, note down the ARN of the ‘NodeInstanceRole’ resource created by the cloudformation stack. In our case this is the ‘eksctl-demo-eks-cluster-nodegroup-NodeInstanceRole-GJZ80VTSEPGT’ role.

The stack will create three autoscaling groups on AWS. It will also label the spot and on-demand instances spun up as part of the autoscaling groups with ‘lifecycle=Ec2Spot’ and ‘lifecycle=OnDemand’ labels respectively.

Verify that the node groups have been created

Security groups

The ‘SpotNodeGroup1’ and ‘SpotNodeGroup2’ autoscaling groups exclusively leverage spot instances while the ‘OnDemandNodeGroup’ leverages on-demand instances.

We can see the spot requests initiated by these autoscaling groups on the AWS console:

Spot Requests

As we add more instances to these autoscaling groups, the number of spot requests initiated will also increase.

Since we are requesting spot capacity from separate spot pools for the two node groups, the chances of being allocated spot capacity are higher. There is also a lower chance of both spot pools running out of capacity at the same time, resulting in fewer interruptions.
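A rough back-of-the-envelope sketch of why pool diversification helps, using made-up interruption probabilities and assuming the pools fail independently (real spot interruption rates vary by pool and over time):

```python
# Made-up numbers: assume each spot pool independently has a 5% chance
# of a capacity interruption during some time window.
p_pool = 0.05

# With a single pool, all spot capacity is at risk with probability 5%.
p_single = p_pool

# With two independent pools, losing ALL spot capacity requires both
# pools to be interrupted in the same window.
p_both = p_pool * p_pool

print(p_single)          # 0.05
print(round(p_both, 6))  # 0.0025
```

In other words, the chance of every spot node disappearing at once drops from a few percent to a fraction of a percent, which is exactly the "fewer interruptions" effect described above.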

Next clone the following ConfigMap yaml file locally and enter the role ARN copied earlier.

Here is what the configuration looks like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::209925384246:role/eks-node-groups-NodeInstanceRole-4H2F9TU6FGB0
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

Note: we located the role ARN for this earlier; refer to the NodeInstanceRole step above.

Apply using kubectl

kubectl apply -f aws-cm-auth.yaml

This completes the deployment of our EKS cluster with multiple node groups. Next we will deploy the cluster autoscaler for all 3 of the node groups we created.

EKS: Deploy Cluster Autoscaler for Multiple EKS Node Groups

We created the IAM policy required for the cluster autoscaler deployment earlier. Here is a quick recap:

Copy the following code into a json file locally.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*"
    }
  ]
}

Create the policy

$ aws iam create-policy --policy-name ig-policy-1 --policy-document file://ig-policy-1.json
{
"Policy": {
"PolicyName": "ig-policy-1",
"PolicyId": "ANPAWYH7TZJJS6IZG6MRK",
"Arn": "arn:aws:iam::464392538707:policy/ig-policy-1",
"Path": "/",
"DefaultVersionId": "v1",
"AttachmentCount": 0,
"PermissionsBoundaryUsageCount": 0,
"IsAttachable": true,
"CreateDate": "2020-06-17T18:13:53+00:00",
"UpdateDate": "2020-06-17T18:13:53+00:00"
}
}

Attach the policy to the ‘eksctl-demo-eks-cluster-nodegroup-NodeInstanceRole-GJZ80VTSEPGT’ role by inserting the policy ARN generated in the output from the earlier step:

# Replace the policy ARN with the one generated in the previous step, and the
# role name with the NodeInstanceRole created by your CloudFormation stack
# (you can find it in the AWS console)
aws iam attach-role-policy \
  --policy-arn arn:aws:iam::209925384246:policy/ig-policy-1 \
  --role-name eksctl-demo-eks-cluster-nodegroup-NodeInstanceRole-1BC0W2OHCHMDN

Note: we configured multiple kubectl contexts earlier, so we now need to switch back to our working cluster; otherwise, the cluster autoscaler deployment will fail. The asterisk indicates which context is currently in use. In my case, I had to switch from the Kops cluster’s context to the EKS cluster’s context.

$ kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
adminuser@floral-painting-1592375199.us-east-1.eksctl.io floral-painting-1592375199.us-east-1.eksctl.io adminuser@floral-painting-1592375199.us-east-1.eksctl.io
* devopspaulzhao.com devopspaulzhao.com devopspaulzhao.com
docker-desktop docker-desktop docker-desktop
docker-for-desktop docker-desktop docker-desktop
minikube minikube minikube

For switching config

$ kubectl config use-context <my-cluster-name>

Deploy the cluster autoscaler by cloning this yaml file and inserting the names of the node groups as well as the correct values for the min and max sizes as shown below:

command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --expander=least-waste
  - --nodes=1:7:eks-node-groups-SpotNode2Group-XJ6P3SWK5U99
  - --nodes=1:7:eks-node-groups-SpotNode1Group-M1T70Z19O9JA
  - --nodes=1:7:eks-node-groups-OnDemandNodeGroup-1UH2DWM5E304C

Deploy the cluster autoscaler

kubectl apply -f cluster-autoscaler-multi-asg.yaml

Verify that the cluster autoscaler has been deployed.

$ kubectl get pods -l app=cluster-autoscaler -n kube-system
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-87764bddb-lx2rl 1/1 Running 0 12m

View the logs for the autoscaler pod (substitute the pod name from the previous step):

kubectl logs -f pod/cluster-autoscaler-596c9b9cbd-twxp7 -n kube-system

Scale the app to verify the cluster autoscaler behaviour:

kubectl scale deployment frontend --replicas=60 -n production

As we scale our application, the cluster autoscaler detects these pending pods and creates a plan to scale up the cluster.
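Conceptually, the scale-up plan is simple bin-packing arithmetic: the autoscaler counts the unschedulable pods, works out how many fit on one node, and requests enough extra nodes. A hedged sketch with hypothetical resource figures (not taken from this cluster):

```python
import math

# Hypothetical figures: each pending nginx pod requests 0.25 vCPU, and a
# worker node has about 2 vCPUs left for scheduling after system pods.
pending_pods = 40
vcpu_per_pod = 0.25
schedulable_vcpu_per_node = 2.0

pods_per_node = int(schedulable_vcpu_per_node / vcpu_per_pod)
extra_nodes = math.ceil(pending_pods / pods_per_node)
print(pods_per_node, extra_nodes)  # 8 5
```

The real autoscaler also accounts for memory, each ASG's max size (7 in our case) and the least-waste expander when choosing which of the three node groups to grow.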

To verify the deployment:

$ kubectl -n production describe deployment frontend
Name: frontend
Namespace: production
CreationTimestamp: Wed, 17 Jun 2020 00:22:38 -0400
Labels: app=frontend
Annotations: deployment.kubernetes.io/revision: 1
Selector: app=frontend
Replicas: 60 desired | 60 updated | 60 total | 60 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=frontend
Containers:
nginx:
Image: nginx
Port: <none>
Host Port: <none>
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: frontend-5d5f67f777 (60/60 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 68s deployment-controller Scaled down replica set frontend-5d5f67f777 to 60

EKS: Deploy Kubernetes Cluster with Single Mixed Node Group

Next we will deploy a Kubernetes cluster with a single mixed node group. Mixed node groups allow us to spin up multiple instance types as part of the same node group. They can also be used to mix spot and on-demand instances. Once we have the mixed node group up and running, we will also deploy the cluster autoscaler.

We will use the same EKS Kubernetes cluster we created earlier.

To create the mixed node group, clone this yaml file locally and make the following changes as shown in the configuration below:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-eks-cluster
  region: eu-west-1
nodeGroups:
  - name: mixed-ng
    minSize: 1
    maxSize: 20
    instancesDistribution:
      maxPrice: 0.04
      instanceTypes: ["c5.large", "t2.medium", "a1.large"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotInstancePools: 3

Note: find your EKS cluster in the AWS console. The region here must match the region where your EKS cluster is located; otherwise, the deployment will fail.

onDemandBaseCapacity is the minimum node group capacity that will be provisioned as on-demand instances, onDemandPercentageAboveBaseCapacity is the percentage of instances above base that will be provisioned as on-demand instances. Setting both to zero means that our cluster will exclusively leverage spot instances.
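As an illustration of how the two settings interact when they are non-zero, here is a sketch with hypothetical numbers (AWS applies its own rounding when splitting capacity, so treat this as approximate):

```python
# Hypothetical mixed node group scaled to 10 instances, with an
# on-demand base of 2 and 50% on-demand above the base.
desired = 10
on_demand_base = 2
on_demand_pct_above_base = 50

above_base = desired - on_demand_base
on_demand = on_demand_base + above_base * on_demand_pct_above_base // 100
spot = desired - on_demand
print(on_demand, spot)  # 6 4

# With both settings at 0, as in our mixed-ng config, everything is spot:
print(0 + (desired - 0) * 0 // 100, desired)  # 0 10
```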

We have also chosen three instance types with similar CPU and memory, making it safe to deploy the cluster autoscaler on top.

We have also set the maxPrice based on spot pricing history in the AWS console.
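One simple way to derive a maxPrice is to look at recent spot price history for all the candidate instance types and add a safety margin on top of the observed peak. A sketch with made-up prices (in practice, pull the real history from the EC2 console's spot pricing history page or `aws ec2 describe-spot-price-history`):

```python
# Made-up recent spot prices in USD/hour for our three instance types.
history = {
    "c5.large":  [0.032, 0.034, 0.033],
    "t2.medium": [0.014, 0.014, 0.015],
    "a1.large":  [0.020, 0.021, 0.020],
}

# Take the highest observed price across all pools and add ~20% headroom.
peak = max(p for prices in history.values() for p in prices)
max_price = round(peak * 1.2, 3)
print(max_price)  # 0.041
```

Capping maxPrice near (rather than far above) the on-demand price keeps the bill predictable while still letting the request survive small price spikes.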

Create the mixed node group

eksctl create nodegroup -f mixed-ng.yaml

In the AWS console, a new CloudFormation stack is created.

New cloudformation stack created

Let us now deploy the cluster autoscaler for the mixed node group.

EKS: Deploy Cluster Autoscaler for a Single Mixed Node Group

To deploy the cluster autoscaler, clone this yaml file locally and update the name of the mixed node group as well as the min and max sizes.

command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  - --nodes=1:20:eksctl-demo-eks-cluster-nodegroup-mixed-ng-NodeGroup-LT7L1JFZK2Q6
  - --skip-nodes-with-system-pods=false

Deploy the cluster autoscaler:

kubectl apply -f cluster-autoscaler-one-asg.yaml

Check the current state of the frontend deployment in the production namespace:

$ kubectl get deployments -n production
NAME READY UP-TO-DATE AVAILABLE AGE
frontend 90/90 90 90 16h

View the logs for the autoscaler pod

kubectl logs -f pod/cluster-autoscaler-68b9d674f5-jkcmr -n kube-system

Scale the app to verify the autoscaling behaviour:

kubectl scale deployment frontend --replicas=90 -n production

Verify the scaling

$ kubectl describe deployments -n production
Name: frontend
Namespace: production
CreationTimestamp: Wed, 17 Jun 2020 00:22:38 -0400
Labels: app=frontend
Annotations: deployment.kubernetes.io/revision: 1
Selector: app=frontend
Replicas: 90 desired | 90 updated | 90 total | 80 available | 10 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=frontend
Containers:
nginx:
Image: nginx
Port: <none>
Host Port: <none>
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: frontend-5d5f67f777 (90/90 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 42m deployment-controller Scaled down replica set frontend-5d5f67f777 to 60
Normal ScalingReplicaSet 5m57s (x2 over 14h) deployment-controller Scaled up replica set frontend-5d5f67f777 to 90

So this concludes the deployment of the cluster autoscaler in our Kubernetes cluster leveraging a single mixed node group.

Conclusion

First and foremost, since this is a lab environment, let’s make sure we will not be charged by AWS more than necessary by terminating or deleting all the resources we created.

Let’s do it systematically. Most AWS resources are region-based, and we deployed our resources in two regions: us-east-1 and eu-west-1.

Let’s start with us-east-1. Before terminating any EC2 instances, we first have to delete all the autoscaling groups we previously created; otherwise, as we terminate instances, the autoscaling groups will spin up new ones.

We delete the autoscaling groups

Autoscaling groups

Cleared autoscaling group

Emptied autoscaling group

Then we can move on to EC2 and terminate all instances (remember, EC2 is region-specific, so we need to terminate the instances we provisioned in each region).

EC2 instances cleared

Now it is good to double-check that our spot requests are all closed

Spot requests closed

Here I’d like to highlight Elastic IPs (EIPs), a charged service that is easily overlooked. An EIP is free only while it is associated with a running EC2 instance, so once our instances are terminated we must release the EIPs to avoid charges. When releasing an EIP, you may find that you’re not allowed to do so due to a dependency: in this lab, it was associated with a NAT gateway. So let’s delete the NAT gateway first, then release the EIP.

NAT gateway deleted
EIP released

We’ll check out our CloudFormation now since we applied stack to build up a great number of resources in this lab.

Deletion of CloudFormation

CloudFormation stack is emptied

CloudFormation stack deleted

You may encounter a failure when deleting a CloudFormation stack. In that case, delete the stack’s dependencies first, then delete the stack itself.

Lastly, we will remove EKS cluster

EKS cluster deletion

EKS emptied

EKS cluster removed

Having cleaned up our resources in the us-east-1 region, we will do the same in eu-west-1.

Now that we have cleared the regional resources, we’ll move on to a global resource: S3 buckets. (Keep in mind: an empty S3 bucket is not charged, but its contents are, so we need to clean up the contents of all the S3 buckets we created.)

Resources deletion in S3 buckets

After dealing with resource deletion, let’s recap the infrastructure we built.

There are three layers that we touched upon:

Layer 1: Deploying Kubernetes Cluster using kops, kubectl and the AWS CLI tools locally

  1. Deploying Kubernetes Cluster with Multiple Spot and On-demand Instance Groups
  2. Deploying Kubernetes Cluster with Single Mixed Instance Group

Layer 2: Deploying the Cluster Autoscaler

  1. Deploying the Cluster Autoscaler for Multiple Instance Groups
  2. Deploying the Cluster Autoscaler for a Single Mixed Instance Group

Layer 3: EKS: Deploying Kubernetes Cluster Leveraging EC2 Spot Instances

  1. EKS: Deploy Kubernetes Cluster with Multiple Spot and On-demand Node Groups
  2. EKS: Deploy Cluster Autoscaler for Multiple EKS Node Groups
  3. EKS: Deploy Kubernetes Cluster with Single Mixed Node Group
  4. EKS: Deploy Cluster Autoscaler for a Single Mixed Node Group

By doing this project, we have successfully deployed a highly available and resilient Kubernetes cluster that leverages spot instances as worker nodes using both Kops and EKS. The cluster takes advantage of multiple spot pools across instance types and availability zones.

Paul Zhao
Paul Zhao Projects

Amazon Web Services Certified Solutions Architect Professional & DevOps Engineer