How To Create Your Kubernetes Cluster on Amazon EKS With Terraform Like a Boss in 7 Steps

Nassim Kebbani
Published in The Startup
9 min read · Oct 11, 2020

When you want to deploy a Kubernetes cluster on AWS, you do that with the Amazon EKS service. Amazon EKS is a big service and it’s a little hard to get started with. It lets you create a managed Kubernetes control plane, and once it’s up, you add EC2 instances as Kubernetes worker nodes to run pods on.

Let’s automate Amazon EKS ❤

There are actually multiple ways to set up an EKS cluster:

  • With Terraform or CloudFormation (the IaC way)
  • With the eksctl command line-tool (the CLI way)
  • By clicking on the buttons on the visual AWS console (the loser way)

As proud HashiCorp fans ❤, we’re going to discover how to create an Amazon EKS cluster with Terraform only.

What will we build?

Here, we are going to create a fully automated and autoscaled Kubernetes 1.17 cluster on Amazon EKS with Terraform >= 0.12. This way we’ll enjoy the latest features of both EKS and Terraform.

Here are the 7 steps we are going to cover. They are the hot points to be aware of when using Amazon EKS:

  • 1) Creating a big enough VPC to host your control plane and worker nodes
  • 2) Creating the EKS cluster with CloudWatch Logs groups enabled to monitor control plane components
  • 3) Enabling both public and private control plane endpoints
  • 4) Configuring Kubectl to handle the mandatory IAM authentication layer
  • 5) Enabling AWS IAM role to Kubernetes’ service accounts integration
  • 6) Adding Autoscaled Worker Nodes with NodeGroup and Cluster Autoscaler
  • 7) Deploying an app to push a file to Amazon S3.

Note: The result of this post might cost you a few dollars on your AWS bill.
Note2: The final result of all of this is available on GitHub.

Getting your cluster up and running

The very first thing to do is to define an arbitrary name for your EKS cluster. Ours is going to be named “my-eks-kluster”.

Step 1) Creating a big enough VPC to host your control plane and worker nodes

Your VPC is at the heart of your EKS cluster since everything is going to be deployed into it. Here are the main concerns:

  • EKS clusters consume a lot of IP addresses because each pod gets an IP from the VPC. Hence, your VPC has to be big enough: use a /16 CIDR block preferably.
  • Prefer deploying one EKS cluster per VPC (because of the reason above).
  • Your VPC needs to be split into subnets. In most cases, we mix both public and private subnets: public ones for NAT Gateways + internet-facing load balancers, and private ones for worker nodes running pods. If your EKS cluster is meant to be private only, then you will need a private connection such as a VPN or Direct Connect to access your cluster. Avoid public-only clusters.
  • Your VPC and subnets have to hold specific tags, otherwise your EKS cluster won’t work at all.
  • DNS support and DNS hostnames have to be enabled at the VPC level if you plan to deploy worker nodes in private subnets (which is our case).
  • MapPublicIpOnLaunch must be set to true at the subnet level if you plan to deploy your worker nodes in public subnets (not our case).

Let’s address all these needs. I’ll create a VPC with a /16 CIDR block, with 3 public and 3 private subnets. I’m also going to tag the VPC and subnets to be compliant with EKS requirements. Forgetting these tags is a common source of error, since worker nodes won’t be able to join the cluster if they are missing.

  • The VPC needs this tag (only for EKS < 1.15):
    key = "kubernetes.io/cluster/<cluster-name>"
    value = "shared"
  • The subnets need this tag (no matter which EKS version you use):
    key = "kubernetes.io/cluster/<cluster-name>"
    value = "shared"

I’m using the community terraform-aws VPC module. Here is my Terraform code that creates a VPC addressing all these requirements.

https://gist.github.com/NassK/12dc078a90224e076184a6fd86db1be6
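In case the gist doesn’t load, here is a minimal sketch of the VPC part, assuming the community terraform-aws-modules/vpc/aws module (the name, AZs and CIDR ranges below are illustrative):

```hcl
# A minimal sketch, assuming the community terraform-aws-modules/vpc/aws module.
# The name, AZs and CIDR ranges are illustrative.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 2.0"

  name = "my-eks-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway   = true
  enable_dns_support   = true
  enable_dns_hostnames = true

  # The tag EKS relies on to discover the VPC and its subnets
  tags = {
    "kubernetes.io/cluster/my-eks-kluster" = "shared"
  }
}
```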

Step 2) Creating the EKS cluster with CloudWatch Logs groups enabled to monitor control plane components

Now that our VPC is up, we can create the Kubernetes cluster itself on top of the Amazon EKS service. The TF resource to do that is named aws_eks_cluster. Here is everything you should be aware of:

  • The cluster will need to be aware of all the subnets it’ll span across (both public and private).
  • Amazon EKS requires you to set an IAM role for the cluster control plane.
  • That IAM role should hold enough permissions to write to CloudWatch Logs (for control plane component logging) and others like creating and tagging EC2 resources (for managed worker nodes).
  • AWS has two managed policies to attach to create this IAM role easily: arn:aws:iam::aws:policy/AmazonEKSServicePolicy and arn:aws:iam::aws:policy/AmazonEKSClusterPolicy.
  • You can optionally define a KMS CMK to manage Kubernetes Secrets encryption, but we won’t do that here.

Here is the updated Terraform code.

https://gist.github.com/NassK/d40674f227aa0377418c091a65ebec74
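To give you a rough idea of what the gist contains, here is a minimal sketch of the cluster resource itself; the IAM role and log group names are illustrative, and the role is assumed to be created elsewhere with the two managed policies attached:

```hcl
# A minimal sketch of the cluster resource; the IAM role is assumed to be
# created elsewhere with the two managed EKS policies attached.
resource "aws_eks_cluster" "this" {
  name     = "my-eks-kluster"
  version  = "1.17"
  role_arn = aws_iam_role.cluster.arn

  # Ship the control plane component logs to CloudWatch Logs
  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  vpc_config {
    subnet_ids = concat(module.vpc.public_subnets, module.vpc.private_subnets)
  }

  # EKS writes into a log group named /aws/eks/<cluster-name>/cluster,
  # so create it first to control its retention.
  depends_on = [aws_cloudwatch_log_group.eks]
}

resource "aws_cloudwatch_log_group" "eks" {
  name              = "/aws/eks/my-eks-kluster/cluster"
  retention_in_days = 7
}
```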

Our cluster is now available in the AWS console

Step 3) Enabling both public and private control plane endpoints

Amazon EKS lets you enable a public and a private HTTPS endpoint to interact with your cluster (i.e., with the kube-apiserver it created).

  • The public endpoint is used by you, so that you can interact with your cluster from the internet (not needed if your cluster is private only)
  • The private endpoint is used by your worker nodes, so that they can interact with the control plane privately

By default, the private endpoint is not enabled. You should enable it so that your worker nodes will interact with the control plane without going through the public internet. Thankfully, it’s just a boolean to set to true.

Here is the updated Terraform. (If you want to take it further, restricting Security Groups would be the next step).
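Concretely, it comes down to the vpc_config block of the aws_eks_cluster resource. A sketch of the relevant part:

```hcl
# The relevant part of the aws_eks_cluster resource: both endpoints enabled.
# Public access is already the default; private access is false by default.
resource "aws_eks_cluster" "this" {
  # ... name, version, role_arn and logging as in step 2 ...

  vpc_config {
    subnet_ids              = concat(module.vpc.public_subnets, module.vpc.private_subnets)
    endpoint_public_access  = true
    endpoint_private_access = true
  }
}
```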

Both public and private endpoints are available. Your kubectl calls will reach the cluster through the public endpoint, whereas the worker nodes will reach it through the private endpoint, without going through the public internet.

Step 4) Configuring Kubectl to handle the mandatory IAM authentication layer

This one is a big part: the next step is to install and configure kubectl and the AWS CLI.

Now let’s configure kubectl. Be aware that Kubernetes does not manage authentication out of the box: with vanilla Kubernetes, we usually proceed with a client certificate; with EKS, there is an additional IAM authentication layer.

When using Amazon EKS, you need to authenticate against AWS IAM before calling your Kubernetes cluster. Each kubectl call will first authenticate against AWS IAM to retrieve a token, and then hit the EKS cluster. Missing this authentication step will result in all your kubectl calls ending up with a 401 response. That token can be retrieved by calling aws eks get-token, and we can configure the kubeconfig to call this command on every request.

The good news is that you can generate a kubeconfig file right from the AWS CLI; let’s hook all of this into a Terraform null_resource.

Here is the updated Terraform code:

https://gist.github.com/NassK/370dbabef61e05345fd088013a9d1fa3
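If you just need the idea of it, here is a hedged sketch of such a null_resource: it simply shells out to aws eks update-kubeconfig once the cluster exists (the region and resource name are illustrative, and the gist uses its own variables):

```hcl
# A hedged sketch: generate/refresh the local kubeconfig with the AWS CLI
# once the cluster exists. The region is illustrative.
resource "null_resource" "kubeconfig" {
  provisioner "local-exec" {
    command = "aws eks update-kubeconfig --name ${aws_eks_cluster.this.name} --region eu-west-1"
  }

  # Re-run if the cluster endpoint ever changes
  triggers = {
    cluster_endpoint = aws_eks_cluster.this.endpoint
  }
}
```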

After you apply this Terraform code, you will get a working Kubernetes cluster and a properly configured kubectl command line on your local setup.

Step 5) Enabling AWS IAM role to Kubernetes’ service accounts integration

Since the release of Amazon EKS 1.13, we can give an IAM role to a Kubernetes service account: this way, each pod can have its own IAM role and permission scheme to interact with the AWS API. You must do this if your pods need to perform AWS operations (e.g. uploading a file to an S3 bucket), and it’s a requirement if you want to comply with the least privilege principle.

When linking an IAM role to a service account, the IAM policies are granted to the service account, and thus only pods that use that service account inherit the IAM permissions attached to it. Each pod can then have its own set of permissions, instead of the ones inherited from the EC2 worker node, which would be a big problem in terms of permission isolation.

You enable this by creating an OpenID Connect provider with the Terraform resource aws_iam_openid_connect_provider. This one is a pure IAM resource, not an EKS/Kubernetes thing.

Here is the updated Terraform code.

https://gist.github.com/NassK/d7358124125f5d97f990a63ef8a3d17d
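For reference, a minimal sketch of that OIDC provider looks like this (it assumes the hashicorp/tls provider to fetch the issuer’s certificate thumbprint; resource names are illustrative):

```hcl
# A minimal sketch of the IRSA prerequisite: an IAM OIDC provider pointing
# at the cluster's OIDC issuer URL.
data "tls_certificate" "oidc" {
  url = aws_eks_cluster.this.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "this" {
  url             = aws_eks_cluster.this.identity[0].oidc[0].issuer
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.oidc.certificates[0].sha1_fingerprint]
}
```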

Step 6) Creating NodeGroups to add your worker nodes

Adding your worker nodes is achieved through EKS NodeGroups: they mostly consist of managed Auto Scaling groups of EC2 instances integrated within the Amazon EKS console.

Though NodeGroups are Auto Scaling groups, it’s better to use the Cluster Autoscaler to manage scaling activities in a Kubernetes way (and not in an AWS way). The Cluster Autoscaler is a container (a pod) to launch on your Kubernetes cluster. Once running, it’s smart enough to interface with the cloud provider it’s running on: in our case, it’ll detect the cluster is based on EKS and will run the proper AWS API calls to interact with the NodeGroups and scale your cluster up and down. The Cluster Autoscaler can be installed by applying its publicly available YAML manifest.

Back to EKS, the first step is to create the NodeGroups, then to install the Cluster Autoscaler. NodeGroups can be created following two architectures:

  • One NodeGroup per AZ, mandatory if you run stateful apps using EBS volumes. Recommended by the Cluster Autoscaler guidelines. We will proceed this way.
  • One big NodeGroup that spans the AZs, fine if your apps are stateless but not recommended due to the AZRebalance activity: an instance might be terminated by the ASG and not by the Cluster Autoscaler.

Before creating the NodeGroups, there are additional requirements to be aware of:

  • You need to apply the aws-auth ConfigMap in the kube-system namespace and install a CNI plugin to get your worker nodes to join the cluster properly.
  • NodeGroups/worker nodes require an IAM role with 3 AWS managed policies: arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy, arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy and arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly.
  • AWS offers you an optimized AMI for your worker nodes, but you can bake your own, since AWS can be slow to update its AMI.

Here is the updated code including the creation of the NodeGroup(s), the creation of the aws-auth ConfigMap, the deployment of the Cluster Autoscaler, and the deployment of the Calico CNI (besides the NodeGroup creation, all of these are achieved through a null_resource running raw kubectl).

After step 6, we have 3 worker nodes in private subnets in the Ready state. The Cluster Autoscaler is installed too. We are ready to deploy pods on our cluster.

https://gist.github.com/NassK/e552bfd44b5ceae3372de5bc82a6f357
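To give you an idea of the NodeGroup part, here is a minimal sketch of one managed node group per private subnet/AZ (instance types and sizes are illustrative, and the node IAM role is assumed to carry the three managed policies listed above):

```hcl
# A minimal sketch: one managed node group per private subnet/AZ.
# Instance types and sizes are illustrative; aws_iam_role.node is assumed
# to carry the three managed worker node policies listed above.
resource "aws_eks_node_group" "private" {
  count = length(module.vpc.private_subnets)

  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "private-nodes-${count.index}"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = [module.vpc.private_subnets[count.index]]

  instance_types = ["t3.medium"]

  scaling_config {
    desired_size = 1
    min_size     = 1
    max_size     = 3
  }
}
```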

Step 7) Deploying a pod

Here we are going to deploy an AWS CLI pod that will run an ec2:DescribeInstances operation. The result will be stored in a file called results.json, which will then be uploaded to an Amazon S3 bucket. This scenario implies interactions with AWS services and will force us to create a dedicated IAM role and to assign it to a ServiceAccount our pod will use.

Here is the updated Terraform code with the pod + service account YAML definition. Please note the creation of an S3 bucket is part of the TF code, as well as the creation of an IAM role and policy allowing ec2:DescribeInstances and s3:PutObject. The IAM role can be assumed only by the service account.

After deploying the pod, we check whether results.json was properly uploaded to our S3 bucket, meaning the s3:PutObject operation ran successfully and did not fail with a 403 Access Denied.

Our results.json is in the S3 bucket: the pod was launched and successfully used the IAM role.

And finally, let’s check that the results.json file really contains the output of the aws ec2 describe-instances command.

The results.json file contains the output of aws ec2 describe-instances and is on S3, which means the pod is perfectly configured.

Everything works!
Please do remember to create an IAM role + a service account on EKS each time you deploy a new application that needs to interact with an AWS service, to enforce the least privilege principle.

Here is the last gist!

https://gist.github.com/NassK/47902760b5932947e2bbdd17ce145bee
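The trickiest bit in there is the IAM role’s trust policy, which only lets the service account assume it. Here is a hedged sketch of that part (the namespace “default” and the service account name “s3-uploader” are illustrative):

```hcl
# A hedged sketch of the role's trust policy: only the pod's service account
# can assume it through the OIDC provider. The namespace ("default") and
# service account name ("s3-uploader") are illustrative.
data "aws_iam_policy_document" "pod_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.this.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_eks_cluster.this.identity[0].oidc[0].issuer, "https://", "")}:sub"
      values   = ["system:serviceaccount:default:s3-uploader"]
    }
  }
}

resource "aws_iam_role" "pod" {
  name               = "s3-uploader-role"
  assume_role_policy = data.aws_iam_policy_document.pod_assume.json
}
```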

Conclusion

I created a repo and shared it on GitHub to sum up all of this. It’s open source, so do whatever you want with it. Though my code works, I don’t recommend using it for your production workloads. Relying on a popular Terraform module is a better idea if you want to go to prod, because mine still has room for improvement:

  • Enabling private VPC endpoints to pull Docker images from ECR
  • Improving the aws-auth ConfigMap generation
  • Adding security groups
  • Configuring the ALB Ingress Controller
  • Enabling AWS KMS to encrypt your Kubernetes Secrets
  • Implementing Fargate mode

But anyway, this post should be a good start: I’m happy if it helps you in your Amazon EKS journey. By the way, you can test my app CloudAssessor, available for free on the AWS Marketplace. We were tired of using AWS Config rules, so we decided to build our own app that automates the AWS Well-Architected audit against your actual resources, telling you where you misconfigured AWS. So maybe you’ll discover a tip to improve your EKS setup too.

Lastly, I’m on LinkedIn, so don’t hesitate to add me if you want to discuss Kube/AWS/cloud/…
https://www.linkedin.com/in/nassim-kebbani/

Hope your mom will be proud of your Kubernetes setup! 👵

Bye! ❤
