An Adventure into Kubernetes with Amazon EKS

At the beginning of June 2018, Amazon made its Elastic Container Service for Kubernetes (EKS) generally available. Also at the beginning of June, we at Cortico, always eager to put on our learning hats and brave the scars of living on the cutting edge, stood up our first Kubernetes cluster on EKS. We were already doing a lot with AWS, such as using EC2 spot instances to transcribe talk radio, Athena to sift through Twitter data stored in S3, and ECS to host our Docker containers, so it made sense to let AWS run Kubernetes for us and put its legendary resiliency to work orchestrating our containers. This post goes through some of our adventures standing up our cluster and highlights some of the things we have been able to do with it so far.

Quick Glossary of Terms

Here’s a quick list of terms/abbreviations that will be used in this post:

Amazon Web Services related:

  • AWS: Amazon Web Services
  • CloudFormation: Templating language to create cloud infrastructure items such as nodes, IAM roles, and more.
  • EC2: Elastic Compute Cloud. Where our nodes run.
  • EKS: Elastic Container Service for Kubernetes. New fun stuff!
  • IAM: Identity and Access Management. Lets you define roles, e.g. example-role can read from and write to S3.
  • S3: Simple Storage Service. Storing files in the cloud.
  • VPC: Virtual Private Cloud

Kubernetes related:

  • kubectl: the command line tool for interacting with the Kubernetes API
  • RBAC: Role Based Access Control. Similar to IAM, but controls access to Kubernetes resources (e.g. creating pods) rather than to AWS resources (e.g. writing to S3).
  • Pod: the standard unit of Kubernetes deployments — can contain one or more containers. Also a group of whales.

First Steps

We began by following Amazon’s EKS getting started guide. It walked us through using CloudFormation to stand up the cluster on the AWS side, downloading kubectl, the command line tool for Kubernetes, and configuring kubectl to talk to the cluster. Because Cortico already had a deployment in an existing VPC, we first tried to set up the EKS cluster inside our production VPC. Doing this by hand proved error-prone, though (weeks of back and forth with AWS support!), so in the end we gave up and spun up a completely clean setup using the tutorial CloudFormation templates. Once this was done, we could see our nodes as EC2 instances in the AWS console, as well as by running kubectl get nodes.

There were two IAM-related things we knew we wanted right off the bat:

  • Ability for our team members to use their own IAM identities to work in the cluster. By default, only the IAM user or role that created the cluster is authorized to do anything in it.
  • Ability for our deployed applications to use specific IAM roles for their application-specific tasks. For instance, if one application needs to write to S3, give it an IAM role that can do that, but maybe don’t give it access to Athena if it doesn’t need it.

The EKS documentation wasn’t super clear on how to do either of these, so we began to experiment.

IAM Roles for Team Members to use a Kubernetes Cluster

There is an aws-auth-cm.yaml manifest file that the getting started tutorial tells you how to make. It turns out that if this file is specified properly, EKS will map your IAM users and roles to Kubernetes usernames and groups, which RBAC can then grant permissions to. In our first attempt, we made a cortico:devs RBAC group and assigned our dev team’s IAM users to it.

Snippet from our aws-auth-cm.yaml:

mapUsers: |
  - userarn: arn:aws:iam::xxx:user/allison
    username: allison
    groups:
      - system:masters
  - userarn: arn:aws:iam::xxx:user/peter
    username: peter
    groups:
      - cortico:devs
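Because mapUsers is a YAML string embedded in the ConfigMap, indentation mistakes are easy to make when editing it by hand. A minimal sketch of generating it from a team list instead, assuming a hypothetical TeamMember type and render_map_users helper (neither is part of EKS or kubectl); xxx stays a placeholder for the account ID, as above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TeamMember:
    """One IAM user to map into the cluster (hypothetical helper type)."""
    arn: str
    username: str
    groups: List[str] = field(default_factory=list)


def render_map_users(members: List[TeamMember]) -> str:
    """Render the mapUsers block scalar for aws-auth-cm.yaml.

    Each member becomes one YAML list item; the groups key is only
    emitted when the member actually has groups.
    """
    lines = ["mapUsers: |"]
    for m in members:
        lines.append(f"  - userarn: {m.arn}")
        lines.append(f"    username: {m.username}")
        if m.groups:
            lines.append("    groups:")
            lines.extend(f"      - {g}" for g in m.groups)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    team = [
        TeamMember("arn:aws:iam::xxx:user/allison", "allison", ["system:masters"]),
        TeamMember("arn:aws:iam::xxx:user/peter", "peter", ["cortico:devs"]),
    ]
    print(render_map_users(team), end="")
```

Run as-is, this prints the same mapUsers block as the snippet above. The resulting text still has to be pasted into the ConfigMap’s data section.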

Our RBAC role for cortico:devs:

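apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cortico:dev
  namespace: default
rules:
  - apiGroups: [""] # the core API group
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps", "extensions"] # deployments are not in the core group
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions", "networking.k8s.io"] # ingresses, depending on cluster version
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]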
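RBAC rules are additive allow-lists: a request is permitted if any single rule lists its API group, its resource and its verb. That is why deployments have to be listed under the apps group (and extensions on older clusters) rather than the core group "". The Python below is a simplified model of that check, not the API server’s actual authorizer. It ignores resourceNames, subresources and non-resource URLs, and the rule set is a hand-transcribed, read-only rule set shaped like cortico:dev, with each resource placed in the API group that serves it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PolicyRule:
    api_groups: List[str]
    resources: List[str]
    verbs: List[str]


def _matches(allowed: List[str], value: str) -> bool:
    # "*" is RBAC's wildcard; otherwise an exact string match is required.
    return "*" in allowed or value in allowed


def rules_allow(rules: List[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    """Return True if any single rule grants verb on resource in api_group."""
    return any(
        _matches(r.api_groups, api_group)
        and _matches(r.resources, resource)
        and _matches(r.verbs, verb)
        for r in rules
    )


READ = ["get", "list", "watch"]
CORTICO_DEV = [
    PolicyRule([""], ["services", "endpoints", "pods"], READ),
    PolicyRule(["apps", "extensions"], ["deployments"], READ),
    PolicyRule(["extensions", "networking.k8s.io"], ["ingresses"], READ),
]

print(rules_allow(CORTICO_DEV, "", "pods", "get"))                # True
print(rules_allow(CORTICO_DEV, "apps", "deployments", "create"))  # False: read-only verbs
print(rules_allow(CORTICO_DEV, "", "deployments", "get"))         # False: wrong API group
```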

Snippet from our RBAC rolebinding, which binds the group cortico:devs to the role cortico:dev, specified above:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cortico:dev
  namespace: default
subjects:
  - kind: Group
    name: cortico:devs
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: cortico:dev
  apiGroup: rbac.authorization.k8s.io

In this configuration, I’m assigned to the existing system:masters group, so I have all permissions. Peter is assigned to the cortico:devs RBAC group, whose permissions are defined separately in the cortico:dev RBAC role.

While this seemed great for a while, it quickly showed one limitation. Kubernetes has a command line operation, kubectl auth can-i get pods --as [user], that is particularly useful for testing RBAC permissions. So if, while configuring RBAC, I wanted to test whether Peter had the ability to ‘get pods’, I could run:

kubectl auth can-i get pods --as peter

However, this didn’t seem to work right when I assigned Peter to the group in aws-auth-cm.yaml: kubectl auth would say Peter could not do things that he, in fact, could do. It seemed like there was some sort of miscommunication between the groups in the AWS ConfigMap and the Kubernetes API. This made testing RBAC a pain for both Peter and me, since every time I wanted to try something I had to bug him to see if it worked, instead of just checking from my own command line. While not the worst thing, if kubectl auth wasn’t working properly, we worried that other permission-related things might not be working quite right either.
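A likely explanation, offered here as an assumption: the groups listed in aws-auth-cm.yaml are attached to a request only when the real user authenticates through IAM. kubectl auth can-i ... --as peter impersonates the username alone. Unless the group is also supplied with kubectl’s --as-group flag (for example --as-group cortico:devs), a RoleBinding whose subject is the group cortico:devs never matches the impersonated request. The Python model below uses hypothetical names and is simplified: it ignores ServiceAccount subjects and the built-in system:authenticated group. It shows how a binding’s subjects are compared against the requester’s identity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Subject:
    kind: str  # "User" or "Group" in this simplified model
    name: str


@dataclass(frozen=True)
class UserInfo:
    username: str
    groups: FrozenSet[str] = frozenset()


def binding_applies(subjects: List[Subject], user: UserInfo) -> bool:
    """True if any subject of a RoleBinding names this user or one of their groups."""
    for s in subjects:
        if s.kind == "User" and s.name == user.username:
            return True
        if s.kind == "Group" and s.name in user.groups:
            return True
    return False


group_binding = [Subject("Group", "cortico:devs")]  # first approach
user_binding = [Subject("User", "peter")]           # second approach

# Peter authenticating for real: aws-auth supplies his groups.
real_peter = UserInfo("peter", frozenset({"cortico:devs"}))
# `--as peter` alone: a username with no cortico:devs group attached.
impersonated = UserInfo("peter")
# `--as peter --as-group cortico:devs`: group supplied explicitly.
impersonated_with_group = UserInfo("peter", frozenset({"cortico:devs"}))

print(binding_applies(group_binding, real_peter))               # True
print(binding_applies(group_binding, impersonated))             # False
print(binding_applies(group_binding, impersonated_with_group))  # True
print(binding_applies(user_binding, impersonated))              # True
```

In this model, binding by User name sidesteps the problem, because the username is exactly what --as supplies.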

In our second approach, instead of assigning RBAC groups directly in the aws-auth-cm.yaml file, we decided to only map IAM users to usernames there and leave it up to our RBAC manifests to group them. In this example, our RBAC role cortico:dev stays the same, but both aws-auth-cm.yaml and our RoleBinding cortico:dev change.

Our aws-auth-cm.yaml no longer assigns a group to Peter. It only maps Peter’s IAM user to an RBAC username peter:

mapUsers: |
  - userarn: arn:aws:iam::xxx:user/allison
    username: allison
    groups:
      - system:masters
  - userarn: arn:aws:iam::xxx:user/peter
    username: peter

And our RBAC RoleBinding manifest takes on the responsibility of binding Peter to the role:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cortico:dev
  namespace: default
subjects:
  - kind: User
    name: peter
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: cortico:dev
  apiGroup: rbac.authorization.k8s.io

And done! It’s a bit wordier, since every developer now has to be listed in the RoleBinding. But Peter can now do everything the cortico:dev role allows, such as getting pods and listing deployments, while things like deleting entire nodes stay off-limits. Even better, I can see what he is and isn’t able to do from the comfort of my own chair 👀

Keeping Peter’s power in check!

IAM Roles for Applications Running in Kubernetes

The second IAM-related task we wanted to handle was assigning an IAM role for a running pod to assume. I thought EKS might have some built-in support for this, but it turns out the EKS team is planning to contribute to the open source kube2iam project to handle that. With kube2iam running, you can specify an annotation on the pod template in your manifest file. For example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
  labels:
    app: example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
      annotations:
        iam.amazonaws.com/role: example-role
    . . .

And then your pod will assume the role example-role when it runs your application. This is super easy to use once it is set up, but setting it up proved a bit difficult. kube2iam needs to know which host network interface your pods’ traffic arrives on (its --host-interface setting). Our cluster was created rather magically via CloudFormation, and the template file gave no clues about the networking it used, so finding the right value took some digging. The right interface is eni+. This is now noted in the kube2iam documentation, but it wasn’t there while we were setting this up. This cutting edge stuff kind of hurts!
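The + in eni+ is iptables’ interface wildcard: an interface name ending in + matches any interface whose name starts with the preceding prefix. kube2iam puts the configured interface into the NAT rule that redirects pod traffic bound for the EC2 metadata address (169.254.169.254) to its own agent. On EKS, the AWS VPC CNI plugin gives each pod’s host-side interface a name beginning with eni, so a single fixed name such as docker0 or cni0 matches nothing. The Python below mimics only iptables’ name-matching rule, and the interface names in it are made up for illustration.

```python
def iface_matches(pattern: str, iface: str) -> bool:
    """Mimic iptables' -i/--in-interface matching: a trailing '+' is a prefix wildcard."""
    if pattern.endswith("+"):
        return iface.startswith(pattern[:-1])
    return iface == pattern


# Hypothetical interface names as they might appear on an EKS worker node.
node_ifaces = ["eth0", "eni1a2b3c4d5e6", "eni9f8e7d6c5b4", "lo"]

for pattern in ("docker0", "cni0", "eni+"):
    hits = [i for i in node_ifaces if iface_matches(pattern, i)]
    print(f"{pattern:8} -> {hits}")
```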

Now that these IAM related tasks were set up, we could focus on actual development.

Development Workflow

Our development workflow looks like this:

  1. Develop locally in a Docker container
  2. When ready to deploy, build the Docker image
  3. Push the Docker image to ECR
  4. Copy the SHA digest of the pushed image into a Kubernetes deployment manifest
  5. Assign an IAM role with the right permissions to the pod (via the kube2iam annotation)
  6. Apply the deployment with kubectl apply -f deployment.yaml
  7. Sit back and watch Kubernetes roll out your deployment!
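Step 4 is easy to script. The Python sketch below assumes the manifest has already been loaded into a dict (for example from JSON, which kubectl apply -f also accepts). It uses a made-up ECR repository URI and a placeholder digest, and pin_image is a hypothetical helper. It rewrites a named container’s image to the immutable repository@sha256:<digest> form, so the deployment runs exactly the image that was pushed.

```python
import re
from typing import Any, Dict

# A sha256 image digest, as reported by `docker push` or ECR: "sha256:" + 64 hex chars.
DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def pin_image(manifest: Dict[str, Any], container: str, repository: str, digest: str) -> Dict[str, Any]:
    """Point `container` in a Deployment manifest at repository@digest (modifies it in place)."""
    if not DIGEST_RE.match(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    for c in manifest["spec"]["template"]["spec"]["containers"]:
        if c["name"] == container:
            c["image"] = f"{repository}@{digest}"
            return manifest
    raise KeyError(f"no container named {container!r}")


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "example"},
    "spec": {"template": {"spec": {"containers": [{"name": "example", "image": "example:latest"}]}}},
}
# Hypothetical account ID, region and repository name.
repo = "123456789012.dkr.ecr.us-east-1.amazonaws.com/example"
pin_image(deployment, "example", repo, "sha256:" + "0" * 64)
print(deployment["spec"]["template"]["spec"]["containers"][0]["image"])
```

Pinning by digest rather than by a mutable tag like latest also makes it obvious in version control which build each deployment is running.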

Overall, the purely Kubernetes parts of setting up the cluster were relatively painless; the Kubernetes documentation is very good. The parts that required integration with AWS-specific things, such as IAM roles, were more painful, but ultimately doable. With more documentation and examples appearing every day, many of our EKS pain points have probably gone away by now.

We’ve been able to get quite a few things going on our Kubernetes EKS cluster over the past few months. Here are some highlights:

  • Using Kubernetes namespaces for separate dev and prod environments. We deploy all of our dev applications to the default namespace and our production applications to a prod namespace.
  • Establishing a VPC peering connection so our EKS cluster’s VPC can talk to our production VPC
  • Routing a public Network Load Balancer (NLB) to our internal nginx routing setup by using the AWS load balancer annotation that Kubernetes supports. We specify it in our Service manifest in the same way we did with our IAM roles:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"

This annotation makes AWS create a public NLB, which the CNAME for our public-facing website points to. Our nginx server is then configured to send traffic for different paths to different Kubernetes services using IP routing.

  • Setting up a StorageClass in Kubernetes so that pods can directly request persistent storage from AWS Elastic Block Store (EBS)
  • Deploying ZooKeeper to coordinate our distributed systems
  • Deploying Prometheus for monitoring using the Prometheus Operator. The Operator is an abstraction for easier setup, but it is still under very active development. We found it pretty nice to use for the most part, but we did have to submit some issues and contribute a (very tiny) pull request.
Twitter ingest monitored by Prometheus
  • Alerting with Prometheus’ Alertmanager, with alerts sent to Slack
Not a great day
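The load balancer annotation is read from a Service of type LoadBalancer, not from a Deployment. The Python sketch below builds such a Service as a dict and prints it as JSON, which kubectl apply -f accepts. nlb_service is a hypothetical helper, and the name nginx, the app: nginx selector, the namespace and the ports are placeholders rather than our actual configuration.

```python
import json
from typing import Any, Dict


def nlb_service(name: str, selector: Dict[str, str], port: int, target_port: int,
                namespace: str = "default") -> Dict[str, Any]:
    """Build a LoadBalancer Service that asks the AWS cloud provider for an NLB."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                # Without this annotation, AWS provisions a Classic Load Balancer instead.
                "service.beta.kubernetes.io/aws-load-balancer-type": "nlb",
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": selector,
            "ports": [{"name": "http", "protocol": "TCP", "port": port, "targetPort": target_port}],
        },
    }


svc = nlb_service("nginx", {"app": "nginx"}, port=80, target_port=80)
print(json.dumps(svc, indent=2))  # e.g. pipe into `kubectl apply -f -`
```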

And here are some places we’d like to take our cluster in the near future:

  • Now that we have cluster autoscaling, we have hooked it up to scale AWS spot instances. The goal is that if we have a resource-intensive batch job, we can schedule it to run once a day. The job will request its resources. The cluster autoscaler will then detect that the cluster doesn’t have enough capacity and request spot instances from AWS, and the job will run. Once the job is done, the autoscaler will detect that the spot instances are no longer needed and relinquish them.
  • Using Spark’s new native Kubernetes support to deploy Spark jobs.
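The scale-up step described in the first bullet boils down to estimating how many extra nodes the pending pods need. The Python below is a toy first-fit estimate, loosely in the spirit of the Cluster Autoscaler’s bin-packing estimator but far simpler. It models CPU and memory only and assumes identical nodes, ignoring affinity, taints and DaemonSet overhead. The request sizes and the node shape are made-up numbers, not measurements from our cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resources:
    cpu_millicores: int
    memory_mib: int

    def fits_in(self, other: "Resources") -> bool:
        return self.cpu_millicores <= other.cpu_millicores and self.memory_mib <= other.memory_mib


def nodes_needed(pending: List[Resources], node_capacity: Resources) -> int:
    """First-fit: put each pending pod on the first new node with room, adding nodes as needed."""
    free: List[Resources] = []  # remaining capacity of each hypothetical new node
    for pod in pending:
        if not pod.fits_in(node_capacity):
            raise ValueError("pod is larger than a single node")
        for slot in free:
            if pod.fits_in(slot):
                slot.cpu_millicores -= pod.cpu_millicores
                slot.memory_mib -= pod.memory_mib
                break
        else:
            free.append(Resources(node_capacity.cpu_millicores - pod.cpu_millicores,
                                  node_capacity.memory_mib - pod.memory_mib))
    return len(free)


# Hypothetical daily batch job: six workers of 1.5 vCPU / 6 GiB each,
# on a made-up spot node shape of 4 vCPU / 16 GiB.
job = [Resources(1500, 6144)] * 6
print(nodes_needed(job, Resources(4000, 16384)))  # 3
```

Two workers fit per node here, since a third would need 4.5 vCPU, so the estimate comes out to three spot nodes. The scale-down half of the story is simply the reverse: nodes whose pods have all finished become candidates for removal.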

In under three months, we were able to get a lot set up in the Kubernetes world and have been actively moving our applications over. We now have the joys of version-controlled deployment files, monitored applications, separate dev and prod environments, a solid workflow, and the ability to spin up our entire cluster again if something goes really wrong. Though some EKS-specific issues do pop up at times, for the most part, once we got through the initial setup, the Kubernetes world welcomed us into its resilient arms and we haven’t looked back.

If you’re interested in also partaking in this joy, good news — we’re hiring and would love to hear from you!

At Cortico, our pets are also avid data engineers!

Happy Kubernetes-ing!