Kubernetes: Launching a full EKS cluster in 13 steps, more or less

Stephen Horsfield · The Startup · Oct 3, 2019

When you first deploy Amazon’s Kubernetes service, you get a running cluster with no worker nodes and only a few pre-configured pods. There are separate instructions for how to configure the common add-ons, but it takes a bit of experience and time to put all the pieces together. Here is my setup process.

1. Deploy EKS itself via Terraform

First, you’ll need to create the EKS cluster itself, and my preferred tool for this is Terraform.

resource "aws_iam_role" "eks-cluster-control-plane" {
name = "eks-cluster-control-plane"
assume_role_policy = "${data.aws_iam_policy_document.eks-cluster-control-plane-assume-role.json}"
tags {
Name = "eks-cluster-control-plane"
}
}
resource "aws_eks_cluster" "cluster" {
name = "my-eks-cluster"
role_arn = "${aws_iam_role.eks-cluster-control-plane.arn}"
version = "1.13"

enabled_cluster_log_types = [
"api",
"audit",
"authenticator",
"controllerManager",
"scheduler"
]

vpc_config {
security_group_ids = [
"${aws_security_group.eks-cluster-control-plane.id}"
]
subnet_ids = [
"${module.integration.eks_private_subnets}"
]

endpoint_private_access = true
endpoint_public_access = false
}
provider = "aws.eks-creation" count = "${signum(var.eks-bootstrap["creator-role-exists"])}" depends_on = ["aws_cloudwatch_log_group.eks"]
}
resource "aws_cloudwatch_log_group" "eks" {
name = "/aws/eks/${var.environment}-eks/cluster"
retention_in_days = 30
}

This creates a cluster, a control plane role and a CloudWatch log group. Creating the log group in Terraform allows you to configure retention and makes it easier to wire up a few other things later, not least IAM policies.

You may notice that I use a custom provider. This is because when you create a cluster, the Kubernetes cluster-admin role (see Using RBAC Authorization) is associated with the creating user or role. Assigning this to a dedicated IAM role makes things easier later, but it requires some extra setup and a bootstrap step (create the role first, then create the cluster):

resource "aws_iam_role" "eks-cluster-owner" {
name = "eks-cluster-owner"
assume_role_policy = "${data.aws_iam_policy_document.eks-cluster-owner-assume-role.json}"
tags {
Name = "eks-cluster-owner"
}
}
variable "eks-bootstrap" {
description = "Properties needed to control progressive creation of EKS cluster"
type = "map"
default = {
creator-role-exists = 0 # Change this to 1 after the role exists
}
}
locals {
provider-aws-eks-cluster-owner-role-list = "${signum(var.eks-bootstrap["creator-role-exists"]) == 1 ? join(",",aws_iam_role.eks-cluster-owner.*.arn) : ""}"
provider-aws-eks-cluster-owner-effective-role = "${element(split(",",local.provider-aws-eks-cluster-owner-role-list),0)}"
}
provider "aws" {
region = "${var.region}"
access_key = "${var.access_key}"
secret_key = "${var.secret_key}"
version = "~> 2.0" assume_role {
role_arn = "${local.provider-aws-eks-cluster-owner-effective-role}"
}
alias = "eks-creation"
}

You can see here that there is some extra Terraform magic to ensure that a valid provider is created whether or not the role exists, but assume_role only takes effect once creator-role-exists is set.
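In practice the bootstrap is a two-pass apply. A rough sketch of the sequence, assuming the resource and variable names above:

# First pass: create only the cluster-owner role (nothing else exists yet)
terraform apply -target=aws_iam_role.eks-cluster-owner

# Second pass: flip the bootstrap flag so the eks-creation provider assumes
# the new role, then create the cluster and everything that depends on it
terraform apply -var 'eks-bootstrap={ creator-role-exists = 1 }'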

You also need policies for these roles. Here are my policies:

data "aws_iam_policy_document" "eks-cluster-control-plane-assume-role" {
statement {
sid = "AllowAssumeRole"
effect = "Allow"
actions = [
"sts:AssumeRole",
]
principals {
type = "Service"
identifiers = [ "eks.amazonaws.com" ]
}
}
}
resource "aws_iam_role_policy_attachment" "eks-cluster-control-plane-aws-service-policy" {
role = "${aws_iam_role.eks-cluster-control-plane.name}"
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
}
resource "aws_iam_role_policy_attachment" "eks-cluster-control-plane-aws-cluster-policy" {
role = "${aws_iam_role.eks-cluster-control-plane.name}"
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}
data "aws_iam_policy_document" "eks-cluster-owner-assume-role" {
statement {
sid = "AllowAssumeRole"
effect = "Allow"
actions = [
"sts:AssumeRole",
]
principals {
type = "AWS"
identifiers = [
"arn:aws:iam::${data.aws_caller_identity.this.account_id}:root",
]
}
}
}
resource "aws_iam_role_policy_attachment" "eks-administration" {
role = "${aws_iam_role.eks-cluster-owner.name}"
policy_arn = "${aws_iam_policy.eks-administration.arn}"
}
resource "aws_iam_policy" "eks-administration" {
name = "eks-administration"
path = "/"
policy = "${data.aws_iam_policy_document.eks-administration.json}"
}
data "aws_iam_policy_document" "eks-administration" {
statement {
sid = "AllowEKSManagement"
effect = "Allow"
actions = [
"eks:*",
]
resources = [
"arn:aws:eks:${data.aws_region.this.name}:${data.aws_caller_identity.this.account_id}:cluster/${var.environment}-eks",
"arn:aws:eks:${data.aws_region.this.name}:${data.aws_caller_identity.this.account_id}:cluster/${var.environment}-eks/*",
]
}
statement {
sid = "AllowPassControlPlaneRole"
effect = "Allow"
actions = [ "iam:PassRole" ]
resources = [
"${aws_iam_role.eks-cluster-control-plane.arn}",
]
}
}

You may also see some references to aws_region and aws_caller_identity here. These are simple helpers:

data "aws_region" "this" { }
data "aws_caller_identity" "this" { }

You will, of course, also need security groups. I’ve not covered those here as it would take a long time, be boring to read, and is fairly well covered in Cluster Security Group Considerations. One thing that can be overlooked: using AWS ENI networking directly (as I strongly recommend) means that pods communicate with each other directly, so they also need security group access rules. Generally, this means the AWS VPC security groups will allow all traffic between the API servers and the nodes, and between different nodes. This can be hardened with a network policy layer or by using dedicated subnets for pods; you’ll hit issues if you rely on VPC security groups in the primary network subnets for this.
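To give a feel for the shape of those rules, here is a rough sketch using the AWS CLI. The security group IDs are placeholders, and in practice you would express this in Terraform alongside the rest of the cluster:

# Placeholders: the security group attached to the EKS control plane ENIs and
# the security group shared by the worker nodes (and therefore their pod ENIs)
CONTROL_PLANE_SG=sg-0123456789abcdef0
NODE_SG=sg-0fedcba9876543210

# Nodes and pods accept all traffic from the control plane
aws ec2 authorize-security-group-ingress \
  --group-id "${NODE_SG}" \
  --ip-permissions "IpProtocol=-1,UserIdGroupPairs=[{GroupId=${CONTROL_PLANE_SG}}]"

# Nodes and pods can talk to each other freely
aws ec2 authorize-security-group-ingress \
  --group-id "${NODE_SG}" \
  --ip-permissions "IpProtocol=-1,UserIdGroupPairs=[{GroupId=${NODE_SG}}]"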

2. Validate the cluster components

When you deploy an EKS cluster, AWS pre-installs CoreDNS, kube-proxy and aws-node (the CNI agent). However, you should check that the versions are what you want, and remember that you’ll need to keep them up to date. See Updating an Amazon EKS Cluster Kubernetes Version. It’s a good idea to make sure you fully understand that page and check it for updates regularly.
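A quick way to check what is currently running is to look at the image tags; a minimal sketch, using the standard component names EKS installs into kube-system:

kubectl describe daemonset aws-node --namespace kube-system | grep Image
kubectl describe daemonset kube-proxy --namespace kube-system | grep Image
kubectl describe deployment coredns --namespace kube-system | grep Image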

3. Update CoreDNS for nested ExternalName resolution

See https://github.com/coredns/coredns/issues/2038. By default, CoreDNS won’t allow an ExternalName service to be resolved recursively, which can be a useful capability if you want to connect services in different namespaces. The fix is trivial: just add upstream /etc/resolv.conf to the CoreDNS ConfigMap:

kubectl edit configmap --namespace kube-system coredns

You might also want to scale the deployment. To ensure it gets the new ConfigMap (which can take up to 10 seconds to become available), I use the following:

kubectl scale --namespace kube-system deployment coredns \
--replicas=0
# WAIT
kubectl scale --namespace kube-system deployment coredns \
--replicas=2
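If you want to confirm that recursive resolution now works, a throwaway pod is a quick check; the service and namespace names below are examples only:

# Resolve an ExternalName service from inside the cluster; the lookup should
# now follow the CNAME it points at ("my-service.my-namespace" is an example)
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.31 -- \
  nslookup my-service.my-namespace.svc.cluster.local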

4. Patch the CNI configuration

Ideally, you should do this before you launch any worker nodes. The CNI DaemonSet uses environment variables to control its behaviour; you can read about them here: CNI Configuration Variables.

I prefer to add a ConfigMap with my changes, with only minimal changes to the DaemonSet. Here’s my ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: kube-system
  name: my-aws-cni-environment
data:
  AWS_VPC_CNI_NODE_PORT_SUPPORT: "true"
  AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG: "false"
  AWS_VPC_K8S_CNI_EXTERNALSNAT: "true"
  WARM_IP_TARGET: "10"

I then patch the DaemonSet as follows:

#!/bin/bash

PATCH_SPEC="$(cat - <<EOF
{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "aws-node",
            "envFrom": [
              {
                "configMapRef": {
                  "name": "my-aws-cni-environment",
                  "optional": false
                }
              }
            ]
          }
        ]
      }
    }
  }
}
EOF
)"

kubectl patch --namespace kube-system daemonset/aws-node \
  -p "${PATCH_SPEC}"

5. Apply AWS IAM node authorisation

For worker nodes to authenticate with the Kubernetes RBAC system, you need to ensure that the aws-auth ConfigMap has been configured. Here’s mine:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: @@INSTANCEROLE@@
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

I use a script to inject my @@INSTANCEROLE@@ but you can add it directly if you prefer.
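For reference, a minimal sketch of such a script; the file name and role ARN are placeholders, not values from my setup:

#!/bin/bash
# Substitute the worker node instance role ARN into the aws-auth template
# above and apply it (aws-auth.yaml and the ARN below are examples)
INSTANCE_ROLE_ARN="arn:aws:iam::111122223333:role/my-eks-node-instance-role"

sed "s|@@INSTANCEROLE@@|${INSTANCE_ROLE_ARN}|" aws-auth.yaml \
  | kubectl apply -f -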

6. Add storage classes

I prefer to have dedicated storage classes available for each availability zone. You may also want additional storage classes for different workloads. Here’s one example:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-us-east-1a
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zones: us-east-1a
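Since the per-AZ classes differ only by name and zone, one way to create them is a small loop; the zone list here is just an example:

# Create one gp2 storage class per availability zone
for az in us-east-1a us-east-1b us-east-1c; do
  cat <<EOF | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-${az}
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zones: ${az}
EOF
done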

7. Deploy the metrics server

Kubernetes uses the metrics-server component to support the HorizontalPodAutoscaler. It’s a useful component but not a full-featured metrics agent. Personally, I don’t consider it optional. Deployment is fully described here: Installing the Kubernetes Metrics Server.
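Once deployed, a quick sanity check, assuming the standard metrics-server installation in kube-system:

# The aggregated API should be registered and `kubectl top` should return data
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes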

8. Deploy your ingress controller

I blogged about this in a previous post here.

9. Deploy the EFS provisioner

If you want to make use of EFS for file storage (which has the advantage of not being tied to a single availability zone), you may want to deploy the efs-provisioner. It will need some adaptation; it’s best to review the whole deployment and customise it for your needs.

One step that can be missed: if you don’t use the EFS root, you may need to create the root mount folder in advance. Execute something like the following on a node in your AWS subnet that has access to the EFS filesystem:

sudo -i
mount -t nfs4 <EFS-DNS>:/ /mnt
mkdir /mnt/my-efs-root-folder
umount /mnt

10. Apply log shipping capabilities

I used a combination of fluentd and custom capabilities to ensure that pod and systemd unit logs were shipped to our logging platform. What you need will depend on your environment.

11. Add the NVIDIA GPU support add-on if you want GPU processing support

This allows you to manage GPU resources and make them available to pods. You need to deploy a DaemonSet: NVIDIA device plugin for Kubernetes.
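As a sketch, deployment is a single kubectl apply of the DaemonSet manifest, followed by a check that GPU capacity is advertised. The manifest URL and version tag below are illustrative; take the exact one from the project’s README:

# Deploy the device plugin DaemonSet (pin a version appropriate to your cluster;
# this URL is an example, check the NVIDIA device plugin README for the current one)
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.12/nvidia-device-plugin.yml

# GPU nodes should now advertise nvidia.com/gpu capacity
kubectl describe nodes | grep -A2 'nvidia.com/gpu'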

12. Add support for your deployment agent

Depending on how you manage deployments, in particular whether you use user-level access or an agent account, you may need to create a service account, cluster roles and bindings to allow your external deployment system to gain access to the cluster.
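As an illustration of the agent-account approach, with hypothetical names and the built-in edit ClusterRole as just one possible starting point:

# A service account for an external CI/CD agent
kubectl create namespace deploy-agent
kubectl create serviceaccount deployer --namespace deploy-agent

# Grant it the built-in "edit" role, scoped to a single target namespace
kubectl create rolebinding deployer-edit \
  --clusterrole=edit \
  --serviceaccount=deploy-agent:deployer \
  --namespace=my-application-namespace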

13. Add support for Prometheus metrics

Again, depending on what you require, the steps here may differ. For my use case, there were three deployment activities:

1. Prometheus Node Exporter. However, the documentation states:

It’s not recommended to deploy it as a Docker container because it requires access to the host system

So you may want to deploy this directly on your worker nodes.

2. kube-state-metrics. Deployment is described here, but you may want to customise how Prometheus can access it, especially if Prometheus is external to the EKS cluster.

3. A Kubernetes service account for Prometheus to use to scrape the cluster. I assign a single ClusterRole for this:

apiVersion: v1
kind: Namespace
metadata:
  name: my-system-prometheus
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: my-system-prometheus
  name: kubelet-scraping
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-system-prometheus:kubelet-scraping
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  - nodes/proxy
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - get
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-system-prometheus:kubelet-scraping
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: my-system-prometheus:kubelet-scraping
subjects:
- kind: ServiceAccount
  name: kubelet-scraping
  namespace: my-system-prometheus
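If Prometheus runs outside the cluster, it also needs the service account’s bearer token. On Kubernetes 1.13 a token secret is generated automatically, so a sketch using the names above looks like this:

# Look up the auto-generated token secret for the kubelet-scraping service
# account and decode it for use as a Prometheus bearer token
SECRET_NAME="$(kubectl get serviceaccount kubelet-scraping \
  --namespace my-system-prometheus -o jsonpath='{.secrets[0].name}')"
kubectl get secret "${SECRET_NAME}" --namespace my-system-prometheus \
  -o jsonpath='{.data.token}' | base64 --decode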

And there you have it, my thirteen launch steps for a new EKS cluster. I haven’t covered how you set up your own EKS worker nodes, because that will depend heavily on what you want to accomplish. Perhaps another time.

I hope this can save you some time, and at least point you at a few things to consider.

Update. You can now see example Terraform code for some of this here: https://github.com/stevehorsfield/eks-example.
