Gloat-ing About Our Multi-Arch EKS Migration: Cutting Costs with Graviton Nodes

Eliezer Yaacov
3 min read · Aug 18, 2024


Hey there, fellow cloud adventurers! 🚀 Ever found yourself knee-deep in Kubernetes configurations, wishing for a magic wand to make things work? Well, grab your coffee (or tea, no judgments here), and let me take you through my recent escapade of setting up a multi-architecture Amazon EKS cluster. Buckle up; it’s going to be a fun ride!

The Quest Begins: EKS Version 1.29 and Node Pools Galore

First, let me introduce you to Gloat.com. At Gloat, we’re putting people and organizations in motion. We connect talent to opportunity, helping companies and employees grow together. With clients all around the globe, we manage EKS clusters in different regions and environments (Dev, Stg, Prod) to ensure data privacy and optimal performance. Naturally, this means our AWS compute costs can skyrocket.

When I started with our EKS cluster running version 1.29, we had Karpenter in place from the get-go to help us manage our diverse and dynamic workloads. Our cluster was equipped with 5–6 different node pools, all of which were amd64 instances. But here’s the twist — I wanted to run both amd64 and arm64 (hello, Graviton!) nodes. Why? Because in the world of cloud computing, diversity is key, and Graviton processors offer cost efficiency and performance gains.

Multi-Arch Magic: Adding Arm64 Nodes

With Karpenter already in place, it was time to bring arm64 nodes into the mix. Here’s how we made it happen:

Defining Node Pools

We created NodePools for both amd64 and arm64 architectures. NodePools are like blueprints for your nodes: they tell Karpenter which architectures, instance types, and other constraints a provisioned node must satisfy.

AMD64 Node Pool

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: amd64-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - { key: kubernetes.io/arch, operator: In, values: ["amd64"] }
        - { key: node.kubernetes.io/instance-type, operator: In, values: ["r5.large", "r5.xlarge"] }

ARM64 (Graviton) Node Pool

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: arm64-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - { key: kubernetes.io/arch, operator: In, values: ["arm64"] }
        - { key: node.kubernetes.io/instance-type, operator: In, values: ["r6g.large", "r6g.xlarge"] }
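Both NodePools reference a shared EC2NodeClass via `nodeClassRef`, which is where the AMI family, subnets, and security groups live. Here’s a minimal sketch, assuming the Amazon Linux 2 AMI family and `karpenter.sh/discovery` tags; the role name and tag values are placeholders for your own setup:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2  # AL2 ships both amd64 and arm64 AMIs, so one NodeClass serves both pools
  role: "KarpenterNodeRole-my-cluster"  # placeholder: the IAM role your Karpenter nodes assume
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster  # placeholder discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster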

Migrating Services: Smooth Sailing to Graviton

Now came the fun part — migrating our services to run on the new Graviton nodes. This is where I felt like a true cloud magician.

Updating Our CI Pipeline

To support multi-architecture images, we had to update our CI pipeline. We switched from the traditional Docker build command to `buildx`, which allows building images for multiple architectures. Here’s how it went down:

1. Set up `buildx`:

docker buildx create --use

2. Build and tag multi-arch images:

docker buildx build --platform linux/amd64,linux/arm64 -t my-app:latest --push .

3. Push to our internal ECR:

aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account_id>.dkr.ecr.<region>.amazonaws.com
docker buildx build --platform linux/amd64,linux/arm64 -t <account_id>.dkr.ecr.<region>.amazonaws.com/my-app:latest --push .
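Before wiring this into deployments, it’s worth confirming that the pushed tag really is a manifest list covering both architectures:

docker buildx imagetools inspect <account_id>.dkr.ecr.<region>.amazonaws.com/my-app:latest
# The output should list one manifest for linux/amd64 and one for linux/arm64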

Rebuilding All Images

We rebuilt all our images under the same tags, so each tag now points to a multi-arch manifest list covering both amd64 and arm64. This meant that deployments without a `nodeSelector` defined (which, admittedly, is bad practice, but it’s our current state) could run on any node in the cluster without issues: whichever architecture a pod lands on, the matching image is pulled automatically.
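On the cluster side, a one-liner shows the architecture mix Karpenter has provisioned; `-L` just adds the `kubernetes.io/arch` label as an extra column:

kubectl get nodes -L kubernetes.io/arch
# The extra column lets you see amd64 and arm64 nodes side by side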

Gradual Migration to Graviton

To ensure a smooth transition, especially since several of our Python apps use AI and modeling libraries (which can be finicky on Arm processors), we decided to migrate apps one at a time. The arm64 node pool defined earlier contains only Graviton nodes, so moving an app was just a matter of adding a `nodeSelector` block to its manifest:

Deployment with Node Selector

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app-container
          image: <account_id>.dkr.ecr.<region>.amazonaws.com/my-app:latest
          ports:
            - containerPort: 80
      nodeSelector:
        kubernetes.io/arch: arm64
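Before flipping one of the finicky services over, we could sanity-check its arm64 image locally. A rough sketch, assuming QEMU emulation is available to Docker (it is out of the box in Docker Desktop, or via the `tonistiigi/binfmt` image on Linux):

# Build only the arm64 variant and load it into the local Docker daemon
docker buildx build --platform linux/arm64 -t my-app:arm64-test --load .

# Run it under emulation and make sure the app boots cleanly
docker run --rm --platform linux/arm64 my-app:arm64-test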

Rolling Out Changes

With manifests updated, we rolled out the changes. Kubernetes, with its usual grace, handled the deployment seamlessly. Our services were now happily running on Graviton nodes, basking in the glory of arm64 efficiency.
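Plain kubectl was enough to watch each cutover and confirm the pods actually landed on Graviton nodes (the `app=my-app` label matches the Deployment above):

# Wait for the rollout to finish
kubectl rollout status deployment/my-app

# Check which nodes the pods were scheduled on
kubectl get pods -l app=my-app -o wide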

Conclusion: The Hero’s Return

Migrating to a multi-arch EKS cluster managed with Karpenter was a journey worth taking. Not only did we cut costs by leveraging Graviton processors, but we also optimized our cluster for diverse workloads, all while carefully managing the transition for our more complex applications.

So, if you’re looking to embark on a similar adventure, take my advice: trust in Karpenter, embrace multi-arch, and remember to have fun along the way. Happy clustering!
