ManoMano’s journey with EKS (Elastic Kubernetes Service)

Shailendra NARAYEN
Published in ManoMano Tech team
11 min read · Apr 22, 2021
Got kube ?

Some Context

A little more than two years ago the company made the decision to migrate from a classic hosting provider to AWS Cloud before the end of 2019. As you can guess, the short time frame forced us into a classic “Lift and Shift” migration, which was successfully completed by October 2019.

After this migration we started asking ourselves “What’s next ?”:
- we should be more cloud native
- let’s use Docker, hum I mean containers

Our first step was naturally to go for ECS, which seemed like the easier/faster solution at the time. That’s what we did, and over the following year dozens of microservices popped up on our fresh new ECS clusters.

The more we used ECS, the more I (it’s really a personal opinion here) felt the user interface was not friendly/practical for SREs, and even worse for the feature teams. We then thought about using Kubernetes: I had been using it at my previous employer and we had managed to make something quite “simple” and accessible for the feature teams. Obviously the “user experience” argument alone was not going to be enough to pitch the Kubernetes project. So I talked with my fellow SREs about Kubernetes, and as it turns out many of them wanted to go there and had a lot of input regarding its potential benefits:

- It would allow us to be more “cloud agnostic” (quotes because we are using EKS ;))

- It would be easier to handle Feature Teams autonomy (increased rights per namespace, …)

- We love Fashion IT, we want KUuuuuuuuuuuube

- We could aggregate a lot of services onto a single platform:

1. classic EC2 and ECS workload

2. use cronjobs to replace our “rundeck”

3. use OpenFaaS or the like to replace AWS Lambda (function as a service)

So the decision was made: let’s have “fun” with Kubernetes, and logic would dictate: Kubernetes + AWS = EKS.

Let’s dig a little bit further into the pros and cons of EKS

Disclaimer: Everything mentioned below is based on the knowledge we had when we designed/implemented our cluster. Some issues may have been fixed between then and the publication of this article.

The Managed Control Plane

It’s definitely my favorite part and was a key reason to go for EKS: no more worrying about generating certificates or handling the API server, controller manager, kube-scheduler and etcd. Let AWS handle and secure them. It’s really great, but obviously this limits your range of action: you cannot, for instance, collect metrics from the kube-controller-manager.

During our journey we ran into a nice surprise: we found out that AWS regularly offers to upgrade to a new version, with a delay compared to official releases, with a simple click in the UI. Or so it seemed to newbies like us who had not read the upgrade procedure. The control plane was indeed upgraded, but kube-proxy, CoreDNS and the AWS VPC CNI were not, and you had to do it manually with some kubectl patch or kubectl edit commands. It would have been nice not to have these components preinstalled and to be able to install them properly on our own terms, because hardcore Terraform users like us tend not to like having to run a shell command to patch something during our terraform apply.

It’s important to note that even though AWS is doing all the heavy lifting, you should not forget your responsibilities. Make sure you set up proper observability with tools such as Datadog, and don’t forget to back up your configuration in case an overzealous SRE runs a buggy clean-up script. In our case we went for a tool called Velero.
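To give an idea, backing up everything on a schedule with Velero boils down to a single custom resource; here is a minimal sketch, assuming Velero is installed in the velero namespace with a backup storage location already configured (name, schedule and retention below are made up):

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup      # hypothetical name
  namespace: velero
spec:
  schedule: "0 3 * * *"           # every day at 3am
  template:
    includedNamespaces:
      - "*"                       # back up all namespaces
    ttl: 240h0m0s                 # keep backups for 10 days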

PROS:

  • All control plane is handled by AWS
  • Easy upgrade process (of the control plane only)

CONS:

  • CoreDNS, kube-proxy and the AWS VPC CNI come preinstalled and need to be upgraded manually
  • Lack of control, customization

The Managed Node Groups

In order to be able to run your workloads you will now need nodes and this will raise 2 important questions:

  • Should I use a custom AMI or the AWS AMI ?
  • Should I use EKS managed node groups or create my own autoscaling group ?

For the AMI part, to answer that you need to ask yourself: how much do I need to customize my AMI? It can range from “not at all” to “I need a lot of stuff for security and observability”. If you have high requirements you definitely want to go for a custom AMI. If, on the contrary, you have few requirements, you need to ask yourself: is it worth the hassle of building a custom AMI? Because let’s face it, you will need to handle the certificates, Docker, the kubelet and so on. On our side we considered that we had almost no requirements, so we went for the AWS AMI.

Now, regarding the autoscaling groups themselves, we wanted to stick to managed services as much as we could, so we went with managed node groups. They are basically wrappers around autoscaling groups where you can pick the name, the min and max size, the instance type and, wait for it… the capacity type (Spot or On Demand). Whaaaat? You said SPOT?? Yes, and I’m betting your CFO will like it.
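We drive all of this with Terraform, but to picture what a managed node group boils down to, here is a minimal sketch in eksctl-style YAML (cluster name, sizes and instance types are made up; eksctl is used purely for illustration, it is not what we run):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster                # hypothetical cluster name
  region: eu-west-3
managedNodeGroups:
  - name: workers-spot
    instanceTypes: ["m5.xlarge", "m5a.xlarge"]   # instances with the same CPU/memory
    spot: true                    # capacity type: Spot
    minSize: 1
    maxSize: 10
    desiredCapacity: 2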

The big big bonus of managed node groups is the rolling upgrade process: basically it will do all the steps for you and in a safe way:

  1. pop a new node
  2. cordon old node
  3. drain old node
  4. delete old node

It may be a little lengthy but it’s a really nice feature. Bear in mind that it might fail if you have pods with pod disruption budgets blocking the eviction process, unless you manually intervene to kill/move the pods.
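As a reminder of what can block a drain, here is a minimal sketch of such a pod disruption budget (app name and numbers are made up): if the deployment only runs 2 replicas, no eviction will ever be allowed and the rolling upgrade will hang until someone intervenes.

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-app
spec:
  minAvailable: 2                 # with only 2 replicas, evictions are never allowed
  selector:
    matchLabels:
      app: my-app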

You may ask yourselves: how is the autoscaling handled? Enter cluster autoscaler, an external tool that you need to deploy on the cluster. It basically watches for pending pods and determines whether or not to scale based on matching nodes and their autoscaling groups. We’ve tested it and it works for managed node groups and classic autoscaling groups, provided you have the proper tags and rights. The requirement for it to work properly is to limit each group to instances of the same CPU/memory capacity. Cherry on the cake, you can set priorities on the groups and say, for instance: I want to scale the “Spot” group first, and if that fails then scale the “On Demand” group. Upon noticing that feature, greedy as we are, we naturally wanted to be 100% on Spot instances and have “On Demand” groups on standby with a minimum of 0 instances, but here’s the catch: managed node groups do not support a minimum of 0 instances.
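For the priority part, the configuration is a simple ConfigMap read by the cluster autoscaler when it is started with --expander=priority; a minimal sketch, assuming node group names matching these made-up regexes:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    50:
      - .*spot.*                  # higher priority: scale the Spot groups first
    10:
      - .*on-demand.*             # fallback: the On Demand groups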

Now, old school ops that we are, we naturally wanted to segregate our workloads (system/web apps/workers) onto different node groups. Kubernetes lets you do this with taints and tolerations. The bad news is that you cannot set taints as a property on your node group (unlike labels). Luckily, we can use a custom launch template and override the user data in order to set the required taints on the nodes, and that’s what we went for.
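On the workload side, pods then opt into one of those tainted groups with a matching toleration (and usually a node selector); a minimal sketch, with a made-up taint key/value and node label that would have been set through the launch template user data:

apiVersion: v1
kind: Pod
metadata:
  name: worker-example
spec:
  nodeSelector:
    workload: workers             # hypothetical node label
  tolerations:
    - key: dedicated              # matches the taint set on the node group
      operator: Equal
      value: workers
      effect: NoSchedule
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]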

PROS:

  • easy rolling upgrade
  • easy/smart autoscaling with cluster autoscaler
  • supports spot instances
  • supports custom AMIs/Launch templates

CONS:

  • Does not support a minimum of 0 instances
  • Does not support taints natively

Authentication with IAM

For us the native integration with IAM was a key “selling point” for EKS:

  1. Mapping IAM roles to Kubernetes users/groups (official doc)
  2. Allowing pod service accounts to assume IAM roles (official doc)

The first part makes our lives easier: provided we create the proper roles/groups/users in our cluster and the corresponding ones in IAM, we simply need to map them to each other. The drawback is that the whole configuration lives in a single ConfigMap called aws-auth, and you might get in trouble if you mess it up:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |+
    - rolearn: arn:aws:iam::XXXXXX:role/int-infra-eks-cluster-eu-west-3-managed-node
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
    - rolearn: arn:aws:iam::xxxx:role/sre
      username: sre
    - rolearn: arn:aws:iam::xxxx:role/dev
      username: dev

It would have been a nice feature to be able to handle all those mappings individually with custom resource definitions.

In our first draft we went for “one feature team = one namespace” and we mapped each “feature team role” to a Kubernetes one with write rights in the dedicated namespace, which gave us proper segregation and increased feature team autonomy. Unfortunately, reality caught up with us: applications often move between feature teams, forcing us to uninstall/reinstall them in another namespace. So we went back on this and finally settled on one namespace per domain.
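For the record, giving a mapped user write access to a namespace is a plain RBAC RoleBinding; a minimal sketch reusing the dev user from the aws-auth example above and a made-up namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-edit
  namespace: my-domain            # hypothetical domain namespace
subjects:
  - kind: User
    name: dev                     # the username mapped in aws-auth
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                      # built-in "edit" ClusterRole
  apiGroup: rbac.authorization.k8s.io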

Regarding the second point, about pod service accounts being able to assume IAM roles, there is not much to say: it’s a neat feature, pretty much like task roles for ECS. Just be careful though: if your container uses too old a version of the AWS SDK, it will fail to retrieve the credentials it has been assigned and will end up using the node role. As you can imagine, assuming the node role looks like a security hole, and you can find documentation online showing how to use iptables on the node to prevent containers from accessing the node metadata/role. We wanted to set this up on our cluster, but unfortunately we are using Consul and Datadog, which do not support using the pod role and need to access the node metadata, so we pretty much had to roll back that part.
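Wiring a pod to an IAM role mostly boils down to annotating its service account; a minimal sketch with made-up names and role ARN (the OIDC provider and IAM trust policy also need to be in place on the AWS side):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: my-domain
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXX:role/my-app   # hypothetical IAM role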

PROS:

  • Authenticate on cluster with IAM
  • Recently released: possibility to add another OIDC provider for authentication
  • Grant specific IAM roles to service accounts/pods.

CONS:

  • The mappings live in a single ConfigMap rather than in dedicated custom resources

AWS VPC CNI

If you did not understand this title, it probably means I should have put a disclaimer earlier regarding the target audience of this article ;) If you know nothing about CNIs and overlay networks in Kubernetes, this article is a good starting place.

Now that you’ve read this super simple article, let’s get to it. Basically, AWS developed a custom CNI allowing pods to directly have a network interface/IP address within the VPC. Thanks to this you do not need any additional overlay network, and pods on different nodes can directly talk to each other (if the security groups allow it, of course) and even to other components in the VPC (EC2, RDS). There are 2 drawbacks to this that you need to be aware of:

  • It takes up IPs in your VPC, so make sure you have a big enough CIDR block (we do ;)). Some people even like to have a dedicated VPC for their EKS cluster.
  • It will limit the number of pods running on each node (e.g. 58 pods for an m5.xlarge), so be aware of it when sizing; see the quick calculation below.
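
For the curious, the limit comes from the networking capacity of the instance type: max pods = number of ENIs × (IPv4 addresses per ENI − 1) + 2. An m5.xlarge has 4 ENIs with 15 IPv4 addresses each, hence 4 × 14 + 2 = 58 pods.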

In any case, you can choose to remove this CNI and use another CNI/overlay if it does not fit your needs (e.g. Flannel, Calico, …). It’s packaged by default in the cluster but not mandatory. As for us, we chose to keep it for now since the current limitations were not a deal breaker.

PROS:

  • pod IP in VPC (no overlay required)

CONS:

  • Uses VPC IPs
  • Limits number of pods per node

Security Group for Pods (Optional)

As mentioned in the title, you are in no way obligated to use this, and if you are too lazy to read this whole section, the short version is that I’d strongly advise against it. In our previous container iteration (ECS), we had an “open bar” security group allowing any container to access any RDS database, Redis cluster, etc. For this new infrastructure, and since we have a PCI DSS certification pending (a lot of fun, you should do it too ;)), the security team said that was not possible.

Hence we had to limit access to and from our pods to the databases and the like. And then the magic happened: we found out that AWS had security groups for pods. To cut it short, once your security group is created, it allows you to create a Kubernetes custom resource called a security group policy, which assigns a specific security group id (not name) to pods matching a certain set of labels. But as you all know, the universe is a place of balance and every light must come with darkness. First of all, we are limited to 4 security groups per pod, which can be annoying for teams who like to create a whole lot of “common security groups” to be used by whoever needs them. Second point, and that’s the really annoying one: it takes the pod limit per node way down (18 for xlarge instances!!!, 36 for 2xlarge). The complete list of supported instance types and limits is here. Concretely, if you have approximately 6 daemonsets, you’re limited to 12 application pods on an r5.xlarge.
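To give an idea, here is a minimal sketch of such a security group policy, with made-up labels and security group id:

apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: my-app-sg
  namespace: my-domain
spec:
  podSelector:
    matchLabels:
      app: my-app                 # pods with this label get the security group
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0      # referenced by id, not name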

This caused some serious questioning on our part, but the security constraints were there and we had no choice. To optimize things, we went from r5.large to m5.2xlarge in order to at least have 38 pods for 32GB of RAM, which should be OK since we are starting to have a whole lot of new Java apps.

We did not take the time to look deeper into CNIs like Calico, which have some nice security features, but I assumed they would not integrate natively with security groups and the like. Maybe that was a big mistake on my part.

PROS:

  • proper network segmentation that is AWS cloud compatible

CONS:

  • Max of 4 SGs per pod
  • Big limitation on number of pods per node.

Ingress (AWS Load Balancer Controller)

One important topic to keep in mind when deploying a Kubernetes cluster is the ingress: how do I get traffic from the outside world to the pods in my cluster?

I’m setting aside all inter-application traffic (EC2 to EKS, …) since I hope my colleague (Isaac the Machina) will be writing a nice article about what we did on that part. Spoiler alert: we used Consul service mesh, or service mess as he likes to call it.

Now let’s get back to the whole “I want to see my app in my browser” subject. This is where the AWS Load Balancer Controller comes in.

The first version of this controller literally created one ALB per application, which made no sense to us: we already had that pattern on EC2 and ECS, and when you start having 200 microservices you can actually see the footprint of the ALB count on the bill.

The second version, which came out a few weeks/months ago, brought a lot of nice improvements, such as grouping ingress rules on the same load balancer, but it still required the ALB/NLB to be created by the controller itself. Again, the old school ops guys we are weren’t really fond of it: we kind of like being able to create our LBs and DNS entries with our IaC and provide them for use. (Don’t get me started on putting the AWS operator in kube to create the DNS and all.) The middle-ground solution was to use the target group binding feature of this controller, which allows you to attach pods matching certain labels (through their service) to a target group. This of course requires the target group, listener rule and AWS load balancer to be created beforehand. As you can imagine, we cannot precreate all the rules for all the apps, so there is some wildcard magic in place there. I’m voluntarily not putting any schematic here, sorry for that, but I don’t want to spoil too much of what could be a nice article.
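Without spoiling the upcoming article, a target group binding roughly looks like this; a minimal sketch assuming a my-app service and a precreated target group (names and ARN are made up):

apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: my-app
  namespace: my-domain
spec:
  serviceRef:
    name: my-app                  # the service whose pods get registered
    port: 80
  targetGroupARN: arn:aws:elasticloadbalancing:eu-west-3:XXXXXX:targetgroup/my-app/0123456789abcdef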

PROS:

  • Native integration with AWS

CONS:

  • Cannot provide a precreated Load Balancer for use

Conclusion

This article may seem long, but I wanted readers to be able to ask themselves the proper questions before diving into Kubernetes on AWS, and more specifically EKS. If you want a simple answer, from our small-scale perspective it’s a YES. However, when designing your cluster, node groups, security and all, take a step back and think about your needs/constraints and the current limitations before picking a given solution. And also remember that EKS is still young and a work in progress, and the AWS teams are continuously improving it.
