Private Kubernetes cluster on AWS using Elastic Kubernetes Service and its challenges

Vivek Sethia
Published in Beck et al. · Sep 28, 2020 · 4 min read

Recently, as part of a client project, we had the task of creating a private Kubernetes cluster on AWS. Creating a public cluster on Elastic Kubernetes Service (EKS) is relatively straightforward, the primary reason being connectivity to the internet. Now you might be wondering whether the absence of internet access can really be such a bummer when creating a Kubernetes cluster on EKS. The answer is a big “YES”. The simple explanation is that internet access lets the worker nodes reach all the required AWS services, including registering themselves with the cluster, without the need for VPC endpoints.

Now let us have a look at the typical EKS architecture, which is shown below:

Source: Amazon (https://aws.amazon.com/blogs/containers/de-mystifying-cluster-networking-for-amazon-eks-worker-nodes/)

As you can see, there are two VPCs: one Amazon VPC hosting the Kubernetes control plane and another hosting the cluster worker nodes managed by us (i.e., the customer). By design, we have no access of any kind to the control plane. Worker nodes connect to the control plane either through its public endpoint or through EKS-managed Elastic Network Interfaces (ENIs), and it is this route that makes the cluster public or private. Now let’s go deeper into the networking part of the clusters to understand the different modes that are possible when creating a cluster.

Networking Modes

  1. Public endpoint only: In this mode, nodes reach the control plane over the internet: either they have a public IP address and a route to an internet gateway, or they sit behind a NAT gateway and use the NAT gateway’s public IP address. This is the default behaviour of EKS.
  2. Public and private endpoint: In this mode, Kubernetes API requests from within the worker node VPC to the control plane go through the EKS-managed ENIs in the worker node VPC.
  3. Private endpoint only: Public access to the API server from the internet is closed. Any kubectl commands work only if they originate from within the VPC or from a network connected to your VPC, such as via AWS VPN or AWS Direct Connect (see the sketch after this list for switching an existing cluster to this mode).
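
As a concrete example, an existing cluster can be switched to private-endpoint-only access with a single AWS CLI call. A minimal sketch, assuming a placeholder cluster name and region:

    # Sketch: restrict the Kubernetes API endpoint of an existing cluster to
    # private access only. Cluster name and region are placeholders.
    aws eks update-cluster-config \
      --region eu-central-1 \
      --name my-private-cluster \
      --resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true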

Now let’s see what our setup of the AWS infrastructure for EKS looks like.

EKS Cluster architectural diagram of our client project

We can see that there is no internet gateway attached to the subnets on the AWS side, and hence there is no egress to the internet. On the client side, users connect to servers in the client’s data center and reach the cluster and its pods over the network connected via the VPN gateway.

Now let’s see how to create such a cluster, following this step-by-step process:

  1. Create a private VPC with its associated private subnets. Creating load balancers in such a network requires the definition of the corresponding Kubernetes service to carry the annotation mentioned in this AWS documentation (see the Service manifest sketch after this list), i.e.,
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
  2. Create VPC endpoints for the different services, such as EC2, S3 (for pulling container images), ECR, CloudWatch Logs, STS (for IAM roles for service accounts), Elastic Load Balancing, and Auto Scaling. You can find the documentation about endpoints here. After you have created these endpoints, you need to add the node groups’ security group, or the entire VPC/subnet CIDR block, to the security group of the VPC endpoints to allow access over port 443 (see the CLI sketch after this list). This is essential, as the node group will not join the cluster otherwise.
  3. Create the cluster manually or using CloudFormation/Terraform. For self-managed nodes, you need to pass the following bootstrap arguments, as they bypass EKS introspection and therefore don’t require EKS API access from within the VPC (see the bootstrap sketch after this list):
    --apiserver-endpoint <cluster-endpoint> --b64-cluster-ca <cluster-certificate-authority>
  4. Customize the cluster with additional services such as the cluster autoscaler and an ingress controller. For all such customizations, the relevant container images must be available either in AWS ECR or in a private registry, such as a GitLab instance connected via VPN to our VPC network on AWS. After pulling the images, you can follow this AWS documentation for uploading them to AWS ECR (see the mirroring sketch after this list).
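
For step 1, here is a minimal sketch of an internal-only Service of type LoadBalancer; the service name, selector and ports are placeholders for illustration:

    # Sketch: a Service that provisions an internal load balancer only.
    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: my-internal-service
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    spec:
      type: LoadBalancer
      selector:
        app: my-app
      ports:
        - port: 80
          targetPort: 8080
    EOF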
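
For step 2, the endpoints and the security group rule can be created with the AWS CLI. A sketch assuming placeholder VPC, subnet, route table and security group IDs and a placeholder CIDR block:

    # Interface endpoint for the ECR API (repeat for ecr.dkr, ec2, logs, sts,
    # elasticloadbalancing and autoscaling as needed).
    aws ec2 create-vpc-endpoint \
      --vpc-id vpc-0123456789abcdef0 \
      --vpc-endpoint-type Interface \
      --service-name com.amazonaws.eu-central-1.ecr.api \
      --subnet-ids subnet-0aaa subnet-0bbb \
      --security-group-ids sg-0endpoint \
      --private-dns-enabled

    # Gateway endpoint for S3 (used when pulling image layers from ECR).
    aws ec2 create-vpc-endpoint \
      --vpc-id vpc-0123456789abcdef0 \
      --vpc-endpoint-type Gateway \
      --service-name com.amazonaws.eu-central-1.s3 \
      --route-table-ids rtb-0ccc

    # Allow the node subnets to reach the interface endpoints over port 443;
    # without this rule the node group will not join the cluster.
    aws ec2 authorize-security-group-ingress \
      --group-id sg-0endpoint \
      --protocol tcp --port 443 \
      --cidr 10.0.0.0/16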
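
For step 3, the bootstrap arguments for a self-managed node end up in the instance user data, roughly as follows; the cluster name, endpoint and certificate authority remain placeholders, and /etc/eks/bootstrap.sh is the script shipped with the EKS-optimized AMI:

    # Sketch: bootstrap a self-managed node without querying the EKS API.
    /etc/eks/bootstrap.sh <cluster-name> \
      --apiserver-endpoint <cluster-endpoint> \
      --b64-cluster-ca <cluster-certificate-authority>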
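
For step 4, mirroring an image into ECR looks roughly like this; the account ID, region, repository name, source registry and tag are placeholders:

    # Create the target repository and authenticate Docker against ECR.
    aws ecr create-repository --repository-name cluster-autoscaler
    aws ecr get-login-password --region eu-central-1 \
      | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-central-1.amazonaws.com

    # Pull the upstream image, retag it and push it into the private registry.
    docker pull <source-registry>/cluster-autoscaler:<tag>
    docker tag <source-registry>/cluster-autoscaler:<tag> \
      123456789012.dkr.ecr.eu-central-1.amazonaws.com/cluster-autoscaler:<tag>
    docker push 123456789012.dkr.ecr.eu-central-1.amazonaws.com/cluster-autoscaler:<tag>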

Restrictions with private clusters:

  1. eksctl is not supported.
  2. X-Ray is not supported.
  3. The ALB ingress controller is not supported, hence we need to create a Network Load Balancer if we want to work with the NGINX ingress controller (see the sketch after this list).
  4. Both self-managed and managed nodes are supported, but the node instances must have access to the VPC endpoints (as mentioned earlier in step 2 of the process).
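
Regarding restriction 3, here is a sketch of exposing the NGINX ingress controller through an internal Network Load Balancer; the namespace and service name are the upstream defaults and may differ in your setup:

    # Sketch: switch the ingress controller's Service to an internal NLB.
    kubectl -n ingress-nginx annotate service ingress-nginx-controller \
      service.beta.kubernetes.io/aws-load-balancer-type="nlb" \
      service.beta.kubernetes.io/aws-load-balancer-internal="true"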

Learnings:

  1. Deploying any new feature or package to the cluster needs additional steps (such as pushing its container image to AWS ECR first and then running the setup based on the ECR image).
  2. VPC endpoints are essential to the setup; without them, the worker nodes will never join the cluster.
  3. The default DHCP configuration of the VPC works with the cluster, and modifying CoreDNS on the cluster can help forward specific domains to be resolved by particular DNS servers (see the sketch after this list).
  4. Maintaining and upgrading the cluster requires effort if the nodes are not managed by Amazon, as we have to manually upgrade the node image to work with the relevant Kubernetes version.
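
For learning 3, here is a sketch of the kind of CoreDNS server block we mean, forwarding a hypothetical internal domain to on-premises DNS servers; in EKS it would be appended to the Corefile key of the coredns ConfigMap in kube-system (e.g. via kubectl -n kube-system edit configmap coredns):

    # Sketch: write the extra Corefile server block to a local snippet file.
    # Domain and DNS server addresses are placeholders.
    cat <<'EOF' > corefile-snippet.conf
    corp.example.internal:53 {
        errors
        cache 30
        forward . 10.1.0.10 10.1.0.11
    }
    EOF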

You can follow this AWS documentation on private EKS clusters for your reference. In this article, we have tried to summarize the learnings and challenges that we faced while creating a private cluster on AWS EKS. Feel free to share your comments and suggestions so that we can all learn from the discussion.
