Bolstering Security & Automating Management of Target Australia’s EKS clusters

Gazal Gafoor
12 min read · Jul 13, 2023


To keep pace with Australia’s growing eCommerce market, Target needed to enhance its online offering by developing an engaging, reliable, and cost-effective eCommerce platform for consumers who expect personalised and seamless online shopping experiences. Using AWS, we have modernised Target’s online platform. (Check the story of our cloud transformation journey here.)

As of around 2017, Target’s online platform was composed primarily of the core eCommerce system, a search solution, and a couple of middleware systems that helped integrate with other enterprise software systems, such as the merchandising, supply chain, and finance systems. At about that time we started “strangling” out capabilities into smaller software systems: microservices. We used a serverless pattern, leveraging AWS Lambda, API Gateway, and similar services, to create these new microservices. Then in 2020, we modernised our long running workloads that don’t fit the Function as a Service mould: we containerised those applications and adopted Amazon Elastic Kubernetes Service (EKS) as the container orchestration platform. Since running all of the software systems that constitute the online commerce experience on AWS, we have been able to scale our applications to meet increased demand during sales periods and new product launches. And since our initial adoption of EKS, we have made significant progress in bolstering security and automating management of our clusters, leveraging Bottlerocket, Fargate & Karpenter.

Enhancing the Container Host OS

At Target, when we started using EKS, we used EKS optimised Amazon Linux as the host OS for the worker nodes in our clusters. The base AMIs, provided by AWS, are built on top of Amazon Linux 2 (https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html). It is, of course, a general purpose OS, with plenty of binary dependencies that are unnecessary for the workload compute layer in an EKS cluster.

Here’s an approximation of the various layers in the Amazon Linux based host OS we used.

AL2 Layers

We were not using the provided EKS optimised AL2 AMIs as is; we were baking custom AMIs to ensure they were CIS hardened. We put them through the CIS benchmark for Amazon Linux 2, which is perhaps not appropriate for container orchestrated environments. Some of the CIS hardening rules impacted Kubernetes networking, and we had to disable those rules to keep our clusters functional.

AMI Bake Pipeline

Maintaining the custom AMI pipeline became more and more challenging. Sometime last year we tried to run newer m6i instances, and something related to our customisation prevented those instances from joining our clusters. We never quite figured out what it was, but it was a major hassle.

We wanted to get away from the chore of “baking” our own AMIs. We seemed to just add to the “bloat” of a general purpose OS with our custom “baking”.

Bottlerocket OS


We had our eye on Bottlerocket since around the end of 2020, when it became generally available (or perhaps even earlier). Bottlerocket was purpose built for running containers.

Compared to the Amazon Linux based host OS, Bottlerocket is much leaner, which means a much smaller attack surface.

Bottlerocket Layers

It does not have the bells and whistles (unnecessary in the context of container orchestration) associated with a general purpose OS: no compilers, no package managers, no SSH server, not even interactive shells.

  • Instead of a package manager, Bottlerocket uses an update mechanism that flips between partitions holding the entire filesystem image. (More on that here: https://github.com/bottlerocket-os/bottlerocket#updates.)
  • There is no need for an SSH server. To access hosts, we can use the more secure AWS SSM Session Manager, which is part of a “control container”.
  • Instead of interactive shells like the Bourne shell, bash, or zsh, Bottlerocket’s “control container” can be used to introspect hosts and make API calls to manage them.

Having fewer software components of course means fewer components that contribute to security vulnerabilities. Aside from having a very thin surface of attack, Bottlerocket was designed from the ground up to be security hardened. The following are some of the key security features of Bottlerocket:

  • Automated security updates: a Kubernetes operator is available for in-place updates to Bottlerocket hosts.
  • Immutable root filesystem backed by dm-verity: userspace processes cannot modify the root filesystem, protecting against container escape vulnerabilities.
  • tmpfs, a memory-backed filesystem, for /etc: persistent modification of system config is not supported. OS config changes are made through an API or through containers written to implement CNI or CSI.
  • No shells or interpreters installed: no SSH and no Python, so that’s something else we don’t need to worry about. For troubleshooting, an admin container (disabled by default) can be launched to provide shell access; it has an Amazon Linux 2 base. Logging into an individual Bottlerocket instance is intended to be an infrequent operation for advanced debugging and troubleshooting.
  • Executables built with hardening flags.
  • SELinux enabled in enforcing mode: with no option to disable it.

Rolling out Bottlerocket

The lean nature of Bottlerocket is not without challenges. We had to make the following changes:

1. Used Container Storage Interface (CSI) drivers for any persistent storage used by our application workloads.

2. Updated the configuration of some auxiliary applications to support containerd instead of Docker. (This would eventually have had to happen on any host OS with the dockershim deprecation, but Bottlerocket for Kubernetes defaulted to containerd ahead of that.)

  • Log forwarding configuration of our observability solution had to be tweaked.
  • Switched to the CRI-O/containerd version of the DaemonSet for our Vulnerability Management agent (Cloud Workload Protection / CWP).

3. Switched to an entirely new, more cloud native Endpoint Detection & Response solution. We use their eBPF based agent, which supports Bottlerocket without having to build kernel modules.

4. We had some applications for which we needed to safe-list some networking related sysctls in the kubelet config (https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/). Bottlerocket has a different mechanism for such settings, using TOML syntax: https://github.com/bottlerocket-os/bottlerocket#settings.
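For illustration, here is a rough sketch of safe-listing a sysctl on Bottlerocket worker nodes via eksctl's `bottlerocket.settings` passthrough (not our actual config: the node-group name and the `net.core.somaxconn` sysctl are placeholders, and the exact schema may differ between eksctl versions):

```yaml
# Sketch only: safe-listing an "unsafe" sysctl on Bottlerocket nodes.
# The node-group name and sysctl below are illustrative placeholders.
nodeGroups:
  - name: node-group-1
    amiFamily: Bottlerocket
    bottlerocket:
      settings:
        kubernetes:
          allowed-unsafe-sysctls:
            - "net.core.somaxconn"
```

Pods would then still need to request the sysctl through their own `spec.securityContext.sysctls`.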

Outcomes from switching to Bottlerocket

Security

A key measure of how secure Bottlerocket is compared to our previous host OS is the number of vulnerabilities identified. We compared reports from our vulnerability management solution and found some very favourable results.

Vulnerabilities comparison

With a simple weight multiplier applied to the vulnerabilities (×4 for Critical, ×3 for High, ×2 for Medium & ×1 for Low), our vulnerability score used to be a big fat 804. With Bottlerocket, we brought it down to just 3.

Performance

We did expect improvement in performance, but the results from our performance tests significantly exceeded our expectations. Some rejigging of security tooling may also have contributed to the results, especially the new eBPF based EDR agent.

  • 30.6–41.7% response time reduction in customer facing applications.
  • 33% speedup in cart calculation logic.
  • Increased order placement throughput.
  • 16.2% response time speedup in store employee facing applications.

Cost

  • 40% reduction in compute capacity requirement.

This lowered our AWS infra cost, especially during peak trade.

Stability

We used to observe hosts sporadically becoming unhealthy. We suspected two causes for the instability:

  • Using the incorrect CIS benchmark when baking the Amazon Linux 2 AMIs.
  • Legacy security tooling that did not support container orchestration environments.

We have not observed these issues since rolling out Bottlerocket.

Uplifting Compute Provisioning

For provisioning and scaling compute in EKS clusters, we, like most, were using Cluster Autoscaler (CA): https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler. The provisioning mechanism can best be described as:

  • Watch for any pending, unschedulable Pods.
  • If there are any, interact with one of the AutoScaling Groups (ASG) / node-groups associated with the cluster to increase its size.
  • Kubernetes scheduler (part of the control plane) assigns the pending Pods to the newly provisioned Node (EC2) once it joins the cluster.

How Cluster Autoscaler Provisions Compute

Limitations with Cluster Autoscaler

Cluster Autoscaler works quite well for the most part, but has some limitations.

Cumbersome security updates

The update process for our nodes was tricky because they are part of an ASG. The patching strategy was to bake a new AMI with all the updates, create a new ASG, and delete the old one.

Update strategy with ASGs / node-groups

With Bottlerocket, there is also the option of running the https://github.com/bottlerocket-os/bottlerocket-update-operator, which supports in-place updates. But with in-place updates, any new instances that are spun up are instantiated using the version specified in the ASG before being updated by the operator.

Nodes within a node-group must be homogeneous

Cluster Autoscaler does not support heterogeneous nodes in a node-group: say, nodes with 8 vCPUs and nodes with 4 vCPUs in the same node-group / ASG.

Not Availability Zone aware

We would need node-groups per availability zone (AZ) if we have applications with AZ affinity, e.g., applications that use EBS volumes.

Nondeterministic spread of applications across AZs

The node groups themselves can be spread across availability zones, but without significant additional configuration, applications need not be spread across availability zones. In the example below we have nodes across 3 AZs, but an application (in yellow) sits incidentally in just a single AZ. An AZ outage can disrupt such an application.

Non deterministic spread of application with cluster autoscaler

Instance distribution / capacity type configuration for cost optimisation applies more broadly

Here is an example ClusterConfig (for use with eksctl, of course).

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: test
  version: "1.24"

nodeGroups:
  - name: node-group-1
    privateNetworking: true
    amiFamily: Bottlerocket
    minSize: 1
    maxSize: 10
    instancesDistribution:
      instanceTypes:
        - m6i.2xlarge
        - m5.2xlarge
        - m5a.2xlarge
        - m5ad.2xlarge
        - m5d.2xlarge
      onDemandBaseCapacity: 3
      onDemandPercentageAboveBaseCapacity: 25

The instance distribution applies across workloads. Getting more granular control, like having specific applications use on-demand capacity instead of spot, requires more complex configuration using labels and taints. And to fall back to on-demand capacity when spot capacity is limited or unavailable, we would need separate ASGs for spot and on-demand capacity, as opposed to mixing them.
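As a rough sketch of what that extra configuration looks like (the names here are hypothetical, and the exact taint syntax varies between eksctl versions), a dedicated on-demand node-group would carry a label and a taint that the targeted applications must select and tolerate:

```yaml
# Hypothetical dedicated on-demand node-group. Applications needing
# on-demand capacity must use a matching nodeSelector and toleration.
nodeGroups:
  - name: on-demand-only
    instancesDistribution:
      instanceTypes:
        - m6i.2xlarge
      onDemandPercentageAboveBaseCapacity: 100  # on-demand only, no spot
    labels:
      capacity-type: on-demand
    taints:
      - key: capacity-type
        value: on-demand
        effect: NoSchedule
```

Every such special case multiplies the number of ASGs to manage, which is part of why this approach gets cumbersome.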

Karpenter


When Karpenter became generally available, we found the solution to most of the limitations we were observing with Cluster Autoscaler. The concept of “groupless” worker nodes seemed perfect. Roughly, here is what compute provisioning with Karpenter looks like:

How Karpenter Provisions Compute

Instead of interacting with ASG APIs, Karpenter interacts with EC2 APIs. In practice this has meant faster compute provisioning for Target’s EKS clusters.

How we adopted Karpenter

There’s an excellent document on Karpenter’s official pages for migrating from Cluster Autoscaler to Karpenter: Migrating from Cluster Autoscaler. Here is what we did:

Prerequisite infrastructure

There were some AWS resources we needed to create before actually installing Karpenter on our clusters.

  • Node IAM Role and Instance Profile.
  • SQS queue that subscribes to EC2 EventBridge rules for spot interruption, health, rebalance, and instance state changes.
  • IAM Role for Service Account (IRSA) for the Karpenter controller.
  • Fargate profile for running the Karpenter controller.

For this prerequisite infrastructure, we added a CDK stack. We like how concise and testable our infrastructure as code is with CDK.

Installing Karpenter

  • Add the Node IAM Role we created for Karpenter to the aws-auth ConfigMap. We have a custom pipeline to update this ConfigMap in an idempotent manner. [We are closely following an issue on the AWS Containers Roadmap which should make this a lot easier: https://github.com/aws/containers-roadmap/issues/185]
  • Install the Karpenter controller using helm charts. (There are two charts: a main chart for Karpenter and another for managing the CRDs.)
  • Install Provisioners (+ AWSNodeTemplate). Right before this step is a good time to uninstall Cluster Autoscaler.
  • Delete the node groups. Using eksctl will help ensure the old nodes are cordoned and drained, allowing for a disruption free migration, provided your applications have appropriate Pod Disruption Budgets (https://kubernetes.io/docs/tasks/run-application/configure-pdb/).
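To illustrate the Provisioner installation step, a minimal Provisioner and AWSNodeTemplate pair for the v1alpha5 APIs of that era might look like the sketch below (the CPU limit and the `karpenter.sh/discovery` tag values are placeholders, not our actual config):

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  providerRef:
    name: default            # points at the AWSNodeTemplate below
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  limits:
    resources:
      cpu: "100"             # cap on total provisioned CPU (placeholder)
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  amiFamily: Bottlerocket
  subnetSelector:
    karpenter.sh/discovery: my-cluster       # placeholder discovery tag
  securityGroupSelector:
    karpenter.sh/discovery: my-cluster       # placeholder discovery tag
```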

Fargate Usage

Fargate

Yes, we use Fargate to run the Karpenter controller itself. How the compute for the compute provisioner (the Karpenter controller) is provisioned is something of a chicken and egg problem.

The other option was to create a node-group just for the Karpenter controller. That seemed counterintuitive to us: it would be a very simple fixed size ASG, but we would still have had to use our old update strategy on it. Fargate seemed like an excellent fit because host updates are automated along with provisioning. So, having a Fargate profile for the Karpenter controller (selecting the karpenter namespace) seemed like a good solution to us, at least until, hopefully, Karpenter comes out of the box in the EKS control plane: https://github.com/aws/containers-roadmap/issues/1792.
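Such a Fargate profile is only a few lines in an eksctl ClusterConfig; a minimal sketch (assuming the controller runs in the karpenter namespace, as with a default helm install):

```yaml
# Run everything in the karpenter namespace on Fargate, so the
# Karpenter controller needs no pre-existing worker nodes.
fargateProfiles:
  - name: karpenter
    selectors:
      - namespace: karpenter
```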

Outcomes from adopting Karpenter

Security

We have automated security updates to the host OS using Karpenter. We leverage the ttlSecondsUntilExpired setting provided by Karpenter to recycle nodes. And since we use Bottlerocket rather than a custom AMI, Karpenter uses the latest version of Bottlerocket when new nodes are provisioned. So, new nodes provisioned to scale out and/or to replace expired nodes run a newer host OS with security patches.

Be careful with the ttlSecondsUntilExpired setting though. Make sure your applications have the right Pod Disruption Budgets: https://kubernetes.io/docs/tasks/run-application/configure-pdb/.
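A sketch of how the two fit together (the TTL value and the app label are illustrative): nodes expire after a set period, and the PodDisruptionBudget keeps too many of an application's Pods from being drained at once.

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsUntilExpired: 604800   # recycle nodes after 7 days (illustrative)
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app
spec:
  minAvailable: 2                  # keep at least 2 replicas up during drains
  selector:
    matchLabels:
      app: my-app                  # illustrative label
```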

Cost

With Karpenter, we get a much more reliable mix of spot and on-demand capacity without too much trouble. Before, we were setting a base on-demand capacity to ensure some applications were not disrupted. Now we can use either nodeAffinity or nodeSelector for applications that need on-demand capacity. Here is an example:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        ...
      nodeSelector:
        karpenter.sh/capacity-type: on-demand

So, most of our clusters use just spot instances now.

Karpenter also has a consolidation feature to achieve better bin packing and hence reduce unused compute capacity.

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  consolidation:
    enabled: true
  ...

This feature was demonstrated at AWS re:Invent 2022, using https://github.com/awslabs/eks-node-viewer to measure and showcase the impact of consolidation.

Availability

With Karpenter, we can also easily configure our applications to achieve more deterministic spread across availability zones (AZ). Here is a sample Deployment with the right topologySpreadConstraints:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        ...
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: my-app
          maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway

We also no longer need the added complexity of multiple ASGs / node-groups per AZ to ensure that applications using EBS volumes can be scheduled.
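The key ingredient here is a StorageClass for the EBS CSI driver with WaitForFirstConsumer volume binding, so each volume is created in the AZ where its Pod is actually scheduled. A minimal example (the class name is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3                          # illustrative name
provisioner: ebs.csi.aws.com             # the EBS CSI driver
volumeBindingMode: WaitForFirstConsumer  # bind the volume in the Pod's AZ
parameters:
  type: gp3
```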

Flexibility

Before Karpenter, we used to carefully plan the EC2 instance types used in our clusters to ensure our applications “fit” well in our node groups. Now, we let Karpenter do all the math to find right sized nodes / EC2 instances.

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values:
        - c
        - m
        - r
        - t
    - key: karpenter.k8s.aws/instance-cpu
      operator: Gt
      values:
        - "3"
    - key: karpenter.k8s.aws/instance-cpu
      operator: Lt
      values:
        - "9"
    - key: karpenter.k8s.aws/instance-hypervisor
      operator: In
      values:
        - nitro
  ...

Another benefit is that we no longer need a separate DaemonSet or Deployment for interruption handling. We were using https://github.com/aws/aws-node-termination-handler before, but Karpenter incorporates interruption handling too, if you configure an SQS queue for it: https://karpenter.sh/v0.29/concepts/deprovisioning/#interruption.
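Wiring this up is roughly a matter of pointing the Karpenter helm chart at the queue. A hedged sketch of the values fragment (the values key has moved between Karpenter versions, and the queue name is a placeholder):

```yaml
# Helm values fragment for the Karpenter chart (key location varies by
# Karpenter version; check the docs for the version you run).
settings:
  aws:
    interruptionQueueName: my-cluster-karpenter   # placeholder SQS queue name
```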

Conclusion

Amazon Elastic Kubernetes Service makes it easier for organisations to adopt Kubernetes. And adopting Bottlerocket, Fargate & Karpenter has helped Target Australia bolster our security posture and simplify our usage of Kubernetes. We hope that sharing this account of our experience is useful to the community.

The views and opinions expressed in this document are solely those of the original author. These views and opinions do not necessarily represent those of Target Australia.
