What AWS Doesn’t Tell You About Their Managed Kubernetes Service (EKS)

Clayton Long
Blu Flame Technologies
Jul 27, 2021

So, you want to run Kubernetes in AWS and it looks like Amazon’s Elastic Kubernetes Service is your golden ticket… Not so fast. There are a ton of gotchas. Let’s walk through a few of those gotchas as we provision a private EKS cluster in AWS.

Image Credit: Amazon EKS by Amazon AWS

A Concise Description Of EKS

EKS is AWS's managed Kubernetes service. It's an adaptation of Kubernetes for AWS. That means it works just like your other Kubernetes installations, except when it doesn't (spoiler: there are a lot of those exceptions). AWS uses special (and not super well documented) conventions so that its managed "pure" Kubernetes can integrate with AWS's IAM and autoscaling groups. And because EKS lives inside of AWS, it's also subject to AWS's constructs, like VPCs and their endpoints.

All in all, EKS is a valiant attempt to squeeze Kubernetes into an AWS managed service. It's got its kinks, for sure. But once it's wired up, it mostly behaves like Kubernetes. So, let's talk about how to wire it up.

Private or Public Kubernetes?

Unless you are just playing around, you probably want your AWS managed Kubernetes service to be private, as in no direct access from the Internet. The AWS docs aren't super clear on how to do this. But they do mention private and public Kubernetes API endpoint access. And indeed, there is an option to disable public Kubernetes API endpoint access. However, be aware that doing this will make it impossible to access your Kubernetes API (e.g. with kubectl) from outside the VPC in which your EKS cluster was provisioned; you will have to use a configured VPN connection, a bastion host, or the like. We opted for VPN, but that's a topic for a later post.

As you might imagine, a private EKS Cluster comes with other considerations. For example, if you have an S3 gateway endpoint configured for your VPC, then you will need to make sure the registries you pull images from remain reachable through it. In a private VPC, the S3 endpoint policy determines which S3-hosted content is accessible, and that includes Yum repositories and Docker registries, DockerHub included. Below is an S3 endpoint policy configuration that allows access to most public Docker registries.

S3 Endpoint Policy For Public Docker Images
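The exact policy depends on which registries you rely on, but a minimal sketch looks like the following. The prod-{region}-starport-layer-bucket entry is the bucket AWS documents for ECR image layers; the second, broader statement is an assumption that simply allows object reads from any bucket so that public registries such as DockerHub keep working. You may want to replace that wildcard with the specific buckets you actually pull from.

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AllowEcrImageLayerReads",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::prod-{region}-starport-layer-bucket/*"
    },
    {
      "Sid": "AllowPublicRegistryLayerReads",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::*/*"
    }
  ]
}

Since this endpoint policy is the only path to S3 from a truly private VPC, keep it as narrow as your registries allow.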

Basic EKS Anatomy

Before we go much further, it's appropriate to level set with a high-level conversation about EKS's anatomy. Like many AWS services, EKS was built on top of existing foundational AWS services. It is hosted inside a VPC, it integrates with the available AWS load balancers, and its access to and from other services can be controlled through IAM. Additionally, the EKS Node Groups that attach to the EKS Cluster are built on top of AWS's autoscaling group service. To validate this, provision an EKS Cluster by clicking through the console, then navigate to EC2->Autoscaling Groups, and there you will find an autoscaling group that was generated by EKS.

Kubernetes, of course, was not developed with the express purpose of being hosted within AWS. So, AWS had to jump through some hoops to make it a seamless (cough, cough) integration with their EKS service. Okay, okay, the seamless part is a work in progress. But that aside, there were gaps that had to be filled in with conventions. For example, it is likely that Kubernetes admins would want their cluster to auto-scale based on pod availability. However, AWS autoscaling groups have no concept of pod availability as a metric. So, how can this be done? It's done by using a Kubernetes metrics service called Prometheus and another Kubernetes service called Cluster Autoscaler that hooks into the AWS Autoscaling Groups service using tags. We'll walk through this in just a minute.

Creating ECR Endpoints

Given that we want to provision an EKS Cluster inside of a private VPC, and given that we are likely using VPC endpoints, we will also need ECR (Elastic Container Registry) endpoints setup and available. To do this, pop over to the VPC configuration menu in the AWS Web Console and click Endpoints in the vertical navigation menu to the left.

After clicking Create Endpoint, select the VPC that will contain your EKS Cluster and search for ecr. Select the item that looks like com.amazonaws.{region}.ecr.api. Next, select the subnets in your VPC that will route to the endpoint (i.e. all the subnets used by your EKS Cluster) and the security group that you wish to use (at a minimum, the security group must allow ingress on port 443 for all of the selected subnets). Finally, select an existing (or create a new) policy, which can just be the default “Allow All” policy for now.

Once you have successfully created your com.amazonaws.{region}.ecr.api endpoint, you will need to create another, otherwise identical, endpoint for com.amazonaws.{region}.ecr.dkr.
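If you prefer the CLI to clicking through the console, the same two endpoints can be created with something along these lines, where the VPC, subnet, and security group IDs are placeholders for your own values.

$ aws ec2 create-vpc-endpoint --vpc-id {vpc_id} \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.{region}.ecr.api \
    --subnet-ids {subnet_ids} \
    --security-group-ids {security_group_id} \
    --private-dns-enabled
$ aws ec2 create-vpc-endpoint --vpc-id {vpc_id} \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.{region}.ecr.dkr \
    --subnet-ids {subnet_ids} \
    --security-group-ids {security_group_id} \
    --private-dns-enabled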

Provisioning an EKS Cluster

Like many AWS managed services, creating a new EKS cluster using the AWS Web Console is as simple as walking through the wizard.

EKS Cluster Wizard In The AWS Web Console

You will, of course, have to create an EKS Cluster Service Role in IAM. So, hop over to IAM and create a new role with the following trust relationship policy.

EKS Cluster Service Role Trust Relationship Policy
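A minimal version of that trust relationship simply lets the EKS service assume the role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}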

Now, add the AmazonEKSClusterPolicy and AmazonEKSVPCResourceController policies as follows.

EKS Cluster Service Role Policies
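If you would rather script it, attaching the two managed policies looks like this; the role name EKSClusterServiceRole is just a placeholder for whatever you named yours.

$ aws iam attach-role-policy --role-name EKSClusterServiceRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
$ aws iam attach-role-policy --role-name EKSClusterServiceRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonEKSVPCResourceController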

Click Next to specify your networking configuration. Here, you will have to select a VPC and subnets for your EKS Cluster, along with security groups to apply to the EKS-managed Elastic Network Interfaces that are created in your worker node subnets. You will also be asked to specify whether your EKS Cluster Endpoint Access should be public or private. We're going to assume that you already have a private VPC set up with subnets, so let's select private.

Cluster Endpoint Access

Click Next to configure logging. And in the interest of brevity, click Next again to review and create. Finally, click Create to create your new EKS Cluster. Now that your EKS Cluster is created, it's time to add a Node Group, which, as we discussed previously, is an Autoscaling Group in disguise. And for that, you will also need another role… a role that looks like this.

Node Group Role Trust Relationship Policy
Amazon Policies Attached to Node Group Role
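For reference, the Node Group role's trust relationship lets EC2 assume the role, and the role is typically given the three AWS-managed worker node policies. A sketch of both follows, with EKSNodeGroupRole as a placeholder role name.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

$ aws iam attach-role-policy --role-name EKSNodeGroupRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
$ aws iam attach-role-policy --role-name EKSNodeGroupRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
$ aws iam attach-role-policy --role-name EKSNodeGroupRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly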

Once you have your Node Group role created, head over to your EKS Cluster, click on Configuration, select Compute and click Add Node Group.

EKS Cluster Compute Configuration

You will now be able to select the role you just created and select an existing launch template. Here, you can also add tags, which will be attached to both your Node Group and the EC2 instances that comprise it. And, as it turns out, tags are pretty important: they are the mechanism that Kubernetes services like Cluster Autoscaler use to interface with AWS services. For that reason, let's add the following tags, replacing {cluster name} with the name of your EKS Cluster.

k8s.io/cluster-autoscaler/enabled = TRUE

k8s.io/cluster-autoscaler/{cluster name} = owned

After the above tags are specified, you can click Next and select the specifics for the nodes in your new Node Group. Clicking Next again allows you to select a subset of your Cluster's subnets in which to provision the instances that make up your Node Group. Clicking Next one more time allows you to review and create your EKS Cluster's Node Group.
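Once the Node Group reports ACTIVE, you can confirm that it picked up the autoscaler tags with something like the following, using the same kind of placeholders as the commands later in this post.

$ aws eks describe-nodegroup --cluster-name {eks_cluster_name} \
    --nodegroup-name {nodegroup_name} \
    --query 'nodegroup.{Status: status, Tags: tags}'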

Testing It Out

Now that you have an EKS Cluster up and running with a Node Group attached, it's time to test it out. Since your EKS Cluster is private, however, you can't just hook up kubectl from your workstation. You will first need to SSH into a bastion host or, if you have a VPN set up that routes into the VPC that contains your EKS Cluster, simply connect to your VPN.

Once you have connectivity into your VPC, you will want to make sure that you have, at a minimum, the AWS CLI and kubectl available. Then, you can run the following commands to switch your Kubernetes context to point to your EKS Cluster.

$ aws eks update-kubeconfig --region {region} \
--name {eks_cluster_name}
$ kubectl config use-context {eks_cluster_arn}

You should then be able to run kubectl commands against your EKS Cluster, like this.

$ kubectl get pods
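To go one step further and confirm that the worker nodes joined the cluster and can pull images through the endpoints configured earlier, you can list the nodes and run a throwaway pod; nginx here is just an arbitrary public image.

$ kubectl get nodes -o wide
$ kubectl run nginx-test --image=nginx --restart=Never
$ kubectl get pod nginx-test
$ kubectl delete pod nginx-test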

Autoscaling With EKS

As mentioned above, Cluster Autoscaler is how node-level autoscaling is achieved with Kubernetes on AWS. It allows a Node Group to scale up or down based on its configuration and pod requests/availability. Put simply, if there are more pods than there is available capacity, the Node Group scales up; when there is ample capacity, the Node Group scales back down toward its desired count. But how does AWS know anything about pods and capacity, which are Kubernetes constructs? It uses two Kubernetes services: Prometheus and Cluster Autoscaler.

Prometheus is basically a metrics server that continuously monitors and reports on the state of Kubernetes: memory and CPU usage, available capacity, the resources pods are consuming, and more. Cluster Autoscaler gets those metrics from Kubernetes and, using tags (the k8s.io tags we added to the Node Group above), triggers scale-up or scale-down events against the Autoscaling Group. Since Cluster Autoscaler relies on Prometheus, we must first install Prometheus.

Deploying Prometheus

Prometheus is best installed using Helm. Helm is a package manager and orchestrator for Kubernetes: Prometheus requires multiple deployments, and Helm simplifies those deployments into a single command via a Helm Chart. The official EKS documentation recommends installing Prometheus via the prometheus-community Helm repo. But herein lies another gotcha. Some of the images referenced in that repo's Helm Chart are hosted in the quay.io container image registry, which the EKS docs make no mention of, and which will not be immediately available if you have endpoints configured in your VPC. Of course, you can always add the quay.io buckets to your S3 endpoint configuration. But instead of doing that, we can just use a Prometheus Helm Chart that works with the S3 endpoint configuration specified above.

Ensure Helm is installed on the host you used to test connectivity to your EKS Cluster above. Then, execute the following on that host.

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm search repo bitnami | grep prometheus

You should see something that looks like this.

bitnami/kube-prometheus  6.1.2  0.48.1  kube-prometheus collects... 

To deploy Prometheus, execute the following.

$ kubectl create namespace prometheus
$ helm upgrade -i prometheus bitnami/kube-prometheus \
--namespace=prometheus

You can check the status of your Prometheus deployment by executing the following command.

$ kubectl get pods --namespace=prometheus

Deploying Cluster Autoscaler

Now that Prometheus is deployed, we're ready to deploy Cluster Autoscaler. Admittedly, the AWS EKS User Documentation for Cluster Autoscaler does a pretty good job of detailing the necessary steps, with one very significant caveat, which we will now address.

In the AWS EKS User Documentation for Cluster Autoscaler, you are referred to Create an IAM OIDC provider for your cluster. Navigating to the target of that link, notice the following command.

$ eksctl utils associate-iam-oidc-provider \ 
--cluster <cluster_name> --approve

In the above command, the eksctl command-line tool is used to create an IAM identity provider in AWS using the OIDC endpoint associated with the EKS Cluster name provided. It also registers a thumbprint. No other command does all of that. Not the AWS CLI, not the Terraform aws_iam_openid_connect_provider resource, nothing. So, any reasonable attempt to automate the creation of that OIDC identity provider will result in Cluster Autoscaler failing unless you use the above eksctl command. The word reasonable is emphasized because it is possible to register a thumbprint without eksctl, but it's not trivial. It's just one of those things that AWS kind of snuck in there with a special tool that you must be aware of. Do not ignore this detail or Cluster Autoscaler will not work.
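After running the eksctl command, it's worth sanity checking that the identity provider exists and matches your cluster's OIDC issuer. Two standard AWS CLI calls are enough for that.

$ aws eks describe-cluster --name {eks_cluster_name} \
    --query "cluster.identity.oidc.issuer" --output text
$ aws iam list-open-id-connect-providers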

Otherwise, the documentation provided by AWS on deploying Cluster Autoscaler is sufficient. So, you can refer to that to complete your deployment of Cluster Autoscaler.
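For orientation, the deployment itself boils down to applying the autodiscovery manifest from the upstream Cluster Autoscaler project and pointing its service account at the IAM role you created for it. The manifest URL and annotation below reflect the docs at the time of writing and may have moved since; {cluster_autoscaler_role_arn} is a placeholder for your role's ARN.

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
$ kubectl -n kube-system annotate serviceaccount cluster-autoscaler \
    eks.amazonaws.com/role-arn={cluster_autoscaler_role_arn}

You will still need to edit the cluster-autoscaler Deployment so its --node-group-auto-discovery flag references your cluster name, exactly as the AWS documentation describes.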

Wrapping Up

If you made it this far, then congrats! You have a working EKS Cluster with metrics and autoscaling! But you probably also noticed along the way that there is a little more to EKS than your vanilla Kubernetes cluster. However, now that you're here, you should find that EKS functions a lot like a Kubernetes cluster that is not managed by AWS.

You may also be asking yourself, “what are some other cool things that I can do with my EKS Cluster?” One really cool thing you can do is connect up Lens. Lens will give you a graphical interface for many kubectl commands. Oh, and since you have Prometheus deployed on your EKS cluster, you can also see real-time metrics visually. For more on configuring Lens with Kubernetes or to set it up with Prometheus on your local Kubernetes installation, check out Getting Started With Kubernetes on our Blog.

Thank you for reading. If you liked this post, please check out the Blu Flame Technologies Blog.


Clayton Long
Blu Flame Technologies

Entrepreneur, Technologist and Agile Practitioner; presently Founder & Chief Technologist at Blu Flame Technologies.