An Adventure into Kubernetes with Amazon EKS
At the beginning of June 2018, Amazon made their Elastic Container Service for Kubernetes (EKS) generally available. Also at the beginning of June, we at Cortico, always eager to put on our learning hats and brave the scars of living on the cutting edge, stood up our first Kubernetes cluster on EKS. We were already doing a lot of stuff with AWS, such as using EC2 spot instances to transcribe talk radio, Athena to sift through Twitter data stored in S3, and ECS to host our Docker containers, so it made sense to try to have AWS provide us with the legendary resiliency of Kubernetes to orchestrate our containers. This post will go through some of our adventures standing up our cluster and highlight some things we have been able to do with it so far.
Quick Glossary of Terms
Here’s a quick list of terms/abbreviations that will be used in this post:
Amazon Web Services related:
- AWS: Amazon Web Services
- CloudFormation: Templating language to create cloud infrastructure items such as nodes, IAM roles, and more.
- EC2: Elastic Compute Cloud. Where our nodes run.
- EKS: Elastic Container Service for Kubernetes. New fun stuff!
- IAM: Identity and Access Management. Lets you specify roles and their permissions, e.g. a role `example-role` that can read from and write to S3.
- S3: Simple Storage Service. Storing files in the cloud.
- VPC: Virtual Private Cloud. An isolated virtual network for resources in AWS.
Kubernetes related:
- kubectl: the command line tool for interacting with the Kubernetes API
- RBAC: Role Based Access Control. Similar to IAM, but controls access to Kubernetes resources (e.g. creating pods) rather than to AWS resources (e.g. writing to S3).
- Pod: the standard unit of Kubernetes deployments — can contain one or more containers. Also a group of whales.
We began by following Amazon’s EKS getting started guide. This walked through how to use CloudFormation to stand up the cluster on the AWS side, as well as steps for downloading `kubectl`, the command line tool for Kubernetes, and setting it up to talk to the cluster. Because Cortico already had a deployment in an existing VPC configuration, we first tried to set up the EKS cluster to live within our production VPC. Doing this by hand was error prone, though (weeks of back and forth with AWS support!), so in the end we gave up and spun up a completely clean setup using the tutorial CloudFormation templates. Once this was done, we were able to see our nodes as EC2 instances in the AWS console, as well as by running `kubectl get nodes`.
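As an aside for anyone following along now: recent versions of the AWS CLI can generate the kubeconfig in one step (the getting started guide at the time had us wire this up by hand with an authenticator binary). The cluster name and region here are hypothetical:

```
aws eks update-kubeconfig --name example-cluster --region us-east-1
kubectl get nodes   # the worker nodes should appear once they have joined
```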
There were two IAM-related things we knew we wanted right off the bat:
- The ability for our team members to use their IAM roles to do stuff in the cluster. By default, only the IAM role of the person who deployed the cluster has authorization in the cluster.
- The ability for our deployed applications to use specific IAM roles for their application-level tasks. For instance, if one application needs to be able to write to S3, give it an IAM role that can do that, but don’t give it access to Athena if it doesn’t need it.
The EKS documentation wasn’t super clear on how to do either of these, so we began to experiment.
IAM Roles for Team Members to use a Kubernetes Cluster
There is an `aws-auth-cm.yaml` manifest file that the getting started tutorial tells you how to create. It turns out that if things are specified properly in this file, EKS will map your IAM roles to Kubernetes groups, which hold the actual permissions. In our first attempt, we made a `cortico:devs` RBAC group and assigned our dev team’s IAM profiles to it.

Snippet from our `aws-auth-cm.yaml`:

    mapUsers: |
      - userarn: arn:aws:iam::xxx:user/allison
        username: allison
        groups:
          - system:masters
      - userarn: arn:aws:iam::xxx:user/peter
        username: peter
        groups:
          - cortico:devs
Our RBAC role for `cortico:dev`:

    rules:
    - apiGroups: [""] # the core API group
      resources: ["services", "endpoints", "pods", "deployments", "ingress"]
      verbs: ["get", "list", "watch"]
Snippet from our RBAC rolebinding, which binds the group `cortico:devs` to the role `cortico:dev`, specified above:

    subjects:
    - kind: Group
      name: cortico:devs
      apiGroup: rbac.authorization.k8s.io
In this configuration, I’m assigned to the existing system:masters group and so have all permissions, while Peter is assigned to the `cortico:devs` RBAC group, which has a separate set of permissions specified in an RBAC role via Kubernetes.
While this seemed great for a while, it quickly showed one limitation. Kubernetes has a useful command line operation, `kubectl auth can-i get pods --as [user]`, which is particularly handy for testing out RBAC permissions. So if, while tweaking RBAC configurations, I wanted to test whether Peter had the ability to get pods, I could check right from my own terminal.
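For example, assuming Peter’s IAM user is mapped to the RBAC username peter (as in our `aws-auth-cm.yaml`), a quick sketch of the check:

```
kubectl auth can-i get pods --as peter        # should print "yes"
kubectl auth can-i delete nodes --as peter    # should print "no"
```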
However, this didn’t seem to work right when I assigned Peter to the group: `kubectl auth` would say Peter could not do things that he, in fact, could do. It seemed like there was some sort of miscommunication between the AWS configmap group and the Kubernetes API. This made testing RBAC a pain for both me and Peter, since if I wanted to try something I would have to bug him every time to see if it worked instead of just being able to tell from my command line. While not the worst thing, if `kubectl auth` wasn’t working properly, we worried that other user-permission behavior might not be working quite right either.
In our second approach, instead of having RBAC groups assigned directly from the `aws-auth-cm.yaml` file, we decided to only define the users there and leave it up to our RBAC manifests to group the IAM roles. In this example, our RBAC role `cortico:dev` stays the same, but both `aws-auth-cm.yaml` and our rolebinding change: `aws-auth-cm.yaml` no longer assigns a group to Peter. It only maps Peter’s IAM user to an RBAC username:

    mapUsers: |
      - userarn: arn:aws:iam::xxx:user/allison
        username: allison
        groups:
          - system:masters
      - userarn: arn:aws:iam::xxx:user/peter
        username: peter
And our RBAC rolebinding manifest takes up the responsibility of binding Peter, as a user, to the role:

    subjects:
    - kind: User
      name: peter
      apiGroup: rbac.authorization.k8s.io
And done! It’s a bit wordier, but Peter can now do everything we have specified the `cortico:dev` role as being able to do, such as getting pods and creating deployments, but he can’t do things like delete entire nodes. Even better, I can see what he is and isn’t able to do from the comfort of my own chair 👀
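Putting the pieces together, the full rolebinding manifest looks roughly like this (a sketch; the binding name is hypothetical, and we show a cluster-wide binding here, though a namespaced RoleBinding works the same way):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cortico-dev-binding   # hypothetical name
subjects:
- kind: User
  name: peter                 # the RBAC username mapped in aws-auth-cm.yaml
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cortico:dev
  apiGroup: rbac.authorization.k8s.io
```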
IAM Roles for Applications Running in Kubernetes
The second IAM-related task we wanted was the ability to assign an IAM role for a running pod to assume. I thought EKS might have some built-in support for this, but it turns out the EKS team is planning on contributing to the open source kube2iam project to handle that. With kube2iam running, you specify an annotation in your manifest file, and your pod will then assume the role `example-role` when it runs your application. Super easy to use once it is set up, but setting it up proved a bit difficult: kube2iam needs to know which host network interfaces your cluster’s pods use. Since our cluster was created rather magically via CloudFormation and there weren’t any clues in the template file, it took some digging to figure out the right value to specify. The right interface pattern is `eni+`; this is now noted in the kube2iam documentation, but wasn’t around while we were setting this up. This cutting edge stuff kind of hurts!
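The kube2iam annotation mentioned above goes on the pod template. A minimal sketch of a deployment using it (the names and image are hypothetical; the annotation key `iam.amazonaws.com/role` is kube2iam’s standard one):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # hypothetical
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        iam.amazonaws.com/role: example-role   # IAM role the pod assumes
    spec:
      containers:
      - name: example-app
        image: example-app:latest              # hypothetical image
```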
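As for the setup pain itself: the interface pattern gets passed to the kube2iam DaemonSet via its `--host-interface` flag. A condensed sketch of the relevant container spec, using the flag names from the kube2iam README (the DaemonSet also runs with host networking, which is why the pod IP doubles as the host IP):

```yaml
containers:
- name: kube2iam
  image: jtblin/kube2iam:latest
  args:
  - "--host-interface=eni+"    # the AWS VPC CNI used by EKS creates eniN interfaces
  - "--iptables=true"          # set up the metadata-redirect iptables rule
  - "--host-ip=$(HOST_IP)"
  env:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
```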
Now that these IAM related tasks were set up, we could focus on actual development.
Our development workflow looks like this:
- Develop locally in a docker container
- When ready to deploy, build the docker container
- Push the docker container to ECR
- Copy the SHA of the pushed container into a Kubernetes deployment manifest
- Assign an IAM role to the pod with the right permissions
- Apply the deployment with `kubectl apply -f deployment.yaml`
- Sit back and watch Kubernetes roll out your deployment!
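In shell terms, the workflow above looks something like this (repository name, region, and account ID are hypothetical; at the time, ECR authentication went through `aws ecr get-login`):

```
$(aws ecr get-login --no-include-email --region us-east-1)  # docker login to ECR
docker build -t example-app .
docker tag example-app xxx.dkr.ecr.us-east-1.amazonaws.com/example-app:latest
docker push xxx.dkr.ecr.us-east-1.amazonaws.com/example-app:latest
# copy the pushed image SHA into deployment.yaml, then:
kubectl apply -f deployment.yaml
kubectl rollout status deployment/example-app  # sit back and watch
```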
Overall, the parts of setting up a Kubernetes cluster that were purely Kubernetes related were relatively painless — the Kubernetes documentation is very good. The parts that required integration with AWS-specific pieces, such as IAM roles, were more painful, but ultimately doable. With more documentation and examples available every day, many of our EKS pain points have probably gone away by now.
We’ve been able to get quite a few things going on our Kubernetes EKS cluster over the past few months. Here are some highlights:
- Using Kubernetes namespaces for separate dev and prod environments. We deploy all of our dev applications to the `default` namespace and our production applications to a dedicated production namespace.
- Establishing a VPC peering connection so our EKS cluster’s VPC can talk to our production VPC
- Routing a public Network Load Balancer (NLB) to our internal nginx routing setup by using AWS’s built-in Elastic Load Balancing Kubernetes annotation, specified in our manifest in the same way we specified our IAM role annotations.
This annotation makes AWS create a public NLB, which we then point our CNAME for our public facing website to. Our nginx server is then configured to point traffic for different paths to different Kubernetes services using IP routing.
- Setting up a StorageClass in Kubernetes so that pods can directly request persistent storage in AWS Elastic Block Storage
- Deploying ZooKeeper to coordinate our distributed systems
- Deploying Prometheus for monitoring using the Prometheus Operator. The Operator is an abstraction for easier set up but is still being very actively worked on. We found it pretty nice to use for the most part but we did have to submit some issues and contribute a (very tiny) pull request.
- Alerting with Prometheus’ AlertManager, with alerts sent to Slack
- Configuring the Kubernetes cluster autoscaler to coordinate with AWS Auto Scaling Groups to request more nodes when the cluster is at capacity
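For reference, the NLB annotation from the highlights above lives on the Service fronting nginx. A sketch with hypothetical names (the annotation key `service.beta.kubernetes.io/aws-load-balancer-type` is the standard one):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-router                  # hypothetical
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: nginx-router
  ports:
  - port: 80
    targetPort: 80
```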
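The EBS-backed StorageClass from the highlights is likewise just a few lines of manifest; pods then request storage through a PersistentVolumeClaim that references it. A sketch using the in-tree EBS provisioner (the class name is hypothetical):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-gp2            # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2                # general purpose SSD
```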
And here are some places we’d like to take our cluster in the near future:
- Now that we have cluster autoscaling, we’ve hooked it up to be able to scale AWS spot instances, and we’d like to use this for scheduled batch work. The goal: if we have a resource-intensive batch job scheduled to run once a day, the job requests its resources, the cluster autoscaler detects it doesn’t have enough and requests spot instances from AWS, the job runs, and once it’s done, the autoscaler detects the spot instances are no longer needed and relinquishes them.
- Using Spark’s new native Kubernetes support to deploy Spark jobs.
In under three months, we were able to get a lot set up in the Kubernetes world and have been actively moving our applications over. We now have the joys of version controlled deployment files, monitored applications, separate dev and prod environments, a solid workflow, and the ability to spin up our entire cluster again if something goes really wrong. Though some EKS-specific issues do pop up at times, for the most part once we got through the initial setup, the Kubernetes world welcomed us into its resilient arms and we haven’t looked back.
If you’re interested in also partaking in this joy, good news — we’re hiring and would love to hear from you!