Scaling workloads with the big savings quartet: EKS, Fargate, Karpenter and Keda

Kaio Cunha
16 min read · Feb 10, 2024

Introduction

The guinea pig chosen for this project is Metabase, an open-source business intelligence tool used when corporate folks have big questions to ask about their companies. I had never worked with it, which is exactly why it’s a good choice. It’s written in Clojure and JavaScript, needs a database, and consumes lots of CPU and memory.

This endeavor involves a blend of Terraform, EKS, Fargate and Karpenter, integrated with Istio for service mesh, Prometheus, Grafana, and Keda for monitoring and auto-scaling workloads based on traffic metrics.

All code and documentation can be found here.

This is what we’re aiming for:

These are all the tools used:

  • Terraform: For infrastructure as code.
  • EKS: For managed Kubernetes on AWS.
  • Fargate: For serverless compute on Kubernetes.
  • Karpenter: For autoscaling Kubernetes clusters.
  • Metabase: For business intelligence and analytics.
  • GitHub Actions: For CI/CD.
  • Docker: For local development and testing.
  • Istio: For service mesh and rich traffic metrics.
  • Prometheus and Grafana: For metrics scraping and observability.
  • Keda: For scaling Metabase based on istio_requests_total and memory usage.

Here is the project tree overview:

.
|-- .github
|   |-- workflows
|   |   |-- apply-all.yaml
|   |   |-- apply-workflow.yaml
|   |   |-- destroy-workflow.yaml
|   |   |-- manual-apply.yaml
|   |   |-- plan-workflow.yaml
|   |   |-- stack-workflow.yaml
|   |   `-- uninstall-workflow.yaml
|-- README.md
|-- environments
|   |-- dev
|   `-- lab
|       |-- backend.tf
|       |-- main.tf
|       |-- outputs.tf
|       |-- providers.tf
|       |-- s3-dynamodb
|       |   `-- main.tf
|       `-- variables.tf
|-- infra
|   |-- backend
|   |   |-- main.tf
|   |   |-- outputs.tf
|   |   `-- variables.tf
|   |-- eks-fargate-karpenter
|   |   |-- main.tf
|   |   |-- outputs.tf
|   |   `-- variables.tf
|   |-- rds
|   |   |-- main.tf
|   |   |-- outputs.tf
|   |   `-- variables.tf
|   `-- vpc
|       |-- main.tf
|       |-- outputs.tf
|       `-- variables.tf
|-- stack
|   |-- istio
|   |   |-- istio-ingress.yaml
|   |   |-- istiod-values.yaml
|   |   |-- pod-monitor.yaml
|   |   `-- service-monitor.yaml
|   |-- keda
|   |   `-- values.yaml
|   |-- metabase
|   |   |-- metabase-hpa.yaml
|   |   |-- metabase-scaling-dashboard.yaml
|   |   `-- values.yaml
|   `-- monitoring
|       `-- values.yaml

Step 1: Local environment

We kick off by crafting an environment tailored for the task. This setup includes the essential tools and configurations needed for infrastructure management and development. I recommend WSL if you’re on Windows, but it should also work on macOS.

Just clone this devenv repo and make sure you give it the right permissions:

chmod +x *.sh

Then, run it from the main file:

./main.sh

If you want to install the pre-requisites by yourself, here they are:

  • Terraform CLI
  • AWS CLI
  • kubectl
  • kubectx

Step 2: AWS credentials and Terraform Backend on S3 and GitHub Workflows

Now, you have to run aws configure and provide your AWS access key ID, secret access key, default region, and output format. The same information should be stored on your GitHub repository secrets before we get to the CI/CD bit of this article. To learn how you can create your AWS credentials, follow the instructions here. For GitHub secrets, follow these instructions.

All workflows are configured to use these credentials:

env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: 'us-east-1'

- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v1
  with:
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    aws-region: ${{ env.AWS_DEFAULT_REGION }}

With the local environment and credentials ready, it’s time to cd into (or onto?) the environments/lab/s3-dynamodb folder, where a main.tf is located.

Here, we create an S3 bucket and a DynamoDB table, to serve as a Terraform backend:

module "backend" {
source = "../../../infra/backend"
region = "us-east-1"
bucket_name = "tfstate-somename-lab"
dynamodb_table_name = "tfstate-somename-lab-lock"
}

Run terraform init, then terraform apply, and never look back.

Finally, cd back to the environments/lab folder and run terraform init.

Because of the backend.tf file located in the same dir, you should see something like this:

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "aws" (hashicorp/aws)...

Terraform has been successfully initialized!

That’s it.

You’ll have your infrastructure state stored on AWS S3 from now on.
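For reference, the backend.tf in environments/lab would look roughly like the sketch below. The bucket, table, and key names here are assumptions that simply mirror the module call above, so adjust them to whatever you actually created:

# Sketch of environments/lab/backend.tf (names are illustrative)
terraform {
  backend "s3" {
    bucket         = "tfstate-somename-lab"
    key            = "lab/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tfstate-somename-lab-lock"
    encrypt        = true
  }
}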

GitHub Workflows Configuration

The following workflows were created to automate the CI/CD process:

  • plan-workflow.yaml: Runs terraform plan to review the changes that will be applied to the infrastructure. It triggers on pull requests to the main branch.
  • apply-workflow.yaml: Runs terraform apply to apply the changes to the infrastructure in a specific order using the target parameter. It triggers on pushes to the main branch. To avoid triggering this workflow, like when only documentation changes are made, add [skip ci] to the commit message.
  • destroy-workflow.yaml: Runs terraform destroy to destroy the infrastructure. It can only be triggered manually from the GitHub Actions page.
  • stack-workflow.yaml: Runs helm upgrade --install or helm uninstall on all the extra Kubernetes components as well as Metabase. It can only be triggered manually from the GitHub Actions page on the cluster-stack or main branches. You can also select which addon or workload to upgrade or uninstall by adding their names as inputs to the workflow:
Cluster Stack Workflow

For quick tests, the best option is to push changes to the cluster-stack branch and trigger this workflow manually. This way, you can test the changes without affecting the main branch.

There is also an apply-all.yaml workflow, which applies both the AWS infrastructure and the cluster stack.

All these workflows can be found in the .github/workflows folder.
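As a taste of what they look like, here is a trimmed-down sketch of what plan-workflow.yaml might contain; the real file in the repo has more steps, and this only shows the trigger and the plan step under those assumptions:

name: Terraform Plan

on:
  pull_request:
    branches: [main]

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: hashicorp/setup-terraform@v2
      - name: Terraform plan
        run: |
          cd environments/lab
          terraform init
          terraform plan
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: 'us-east-1'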

Step 3: AWS Core Infrastructure with VPC, EKS, Karpenter and Fargate

After getting the local environment ready, setting up a CI/CD pipeline with GitHub Actions, and configuring an S3 backend, let’s turn our attention to the core AWS infrastructure.

Creating a VPC:

The first order of business was establishing a VPC, which I named lab-vpc, to get an isolated network space for my lab setup. I made sure the VPC spanned three Availability Zones to enhance availability and fault tolerance.

Here’s how I laid out the VPC:

module "lab_vpc" {
source = "../../infra/vpc"

name = local.name
vpc_cidr = local.vpc_cidr
azs = local.azs
private_subnets = ["10.0.0.0/19", "10.0.32.0/19", "10.0.128.0/19"]
public_subnets = ["10.0.64.0/19", "10.0.96.0/19", "10.0.160.0/19"]
intra_subnets = ["10.0.192.0/19", "10.0.224.0/19"]

tags = local.tags
}
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.5.0"

name = var.name
cidr = var.vpc_cidr

azs = var.azs
private_subnets = var.private_subnets
public_subnets = var.public_subnets
intra_subnets = var.intra_subnets

enable_nat_gateway = true
single_nat_gateway = true

public_subnet_tags = {
"kubernetes.io/role/elb" = "1"
"kubernetes.io/cluster/${var.name}" = "shared"
"profile" = "public"
}

private_subnet_tags = {
"kubernetes.io/role/internal-elb" = "1"
"kubernetes.io/cluster/${var.name}" = "shared"
# Tags subnets for Karpenter auto-discovery
"karpenter.sh/discovery" = var.name
"profile" = "private"
}

tags = var.tags
}

EKS Cluster Setup with Fargate and Karpenter:

Next on my list was setting up an EKS cluster named metabaselab. I chose Fargate for its serverless compute capabilities, perfect for running pods that don’t need the full capacity of an EC2 instance. Karpenter was deployed on Fargate nodes to handle autoscaling, since it’s advised not to run Karpenter on the same nodes it manages. Fargate provides fast autoscaling when needed, along with a 99.99% uptime SLA.

Here’s how it was done:

module "eks_fargate_karpenter" {
source = "../../infra/eks-fargate-karpenter"

cluster_name = "metabaselab"
cluster_version = "1.28"
vpc_id = module.lab_vpc.vpc_id
subnet_ids = module.lab_vpc.private_subnets
control_plane_subnet_ids = module.lab_vpc.intra_subnets

fargate_profiles = {
karpenter = {
selectors = [
{ namespace = "karpenter" }
]
}
kube-system = {
selectors = [
{ namespace = "kube-system" }
]
}
}
}

This applies all the resources defined in source = "../../infra/eks-fargate-karpenter", which, among other things, defines a NodePool that guides Karpenter's decisions:

resource "kubectl_manifest" "karpenter_node_pool" {
yaml_body = <<-YAML
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: default
spec:
template:
spec:
nodeClassRef:
name: default
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["on-demand"]
- key: "node.kubernetes.io/instance-type"
operator: In
values: ["t2.micro", "t3.micro", "t3.small"]
# Resource limits constrain the total size of the cluster.
# Limits prevent Karpenter from creating new instances once the limit is exceeded.
limits:
cpu: 56
memory: 56Gi
disruption:
consolidationPolicy: WhenEmpty
consolidateAfter: 30s
kubelet:
maxPods: 100
YAML

depends_on = [
kubectl_manifest.karpenter_node_class
]
}

To summarize, it tells Karpenter to only provision t2.micro, t3.micro and t3.small instance types. In a multi-tenant environment, you could have multiple NodePools for different scaling needs, excellent for internal developer platforms.
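For illustration, a second NodePool for burstier or batch-style workloads might look like the sketch below. The name, capacity type, and instance types here are hypothetical, and it still points at the same default EC2NodeClass:

# Hypothetical extra NodePool for spot-friendly batch workloads
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: batch
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["t3.medium", "t3.large"]
  limits:
    cpu: 32
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s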

In case you want to run another workload on Fargate, you just need to create another Fargate profile:

fargate_profiles = {
  karpenter = {
    selectors = [
      { namespace = "karpenter" }
    ]
  }
  kube-system = {
    selectors = [
      { namespace = "kube-system" }
    ]
  }
}

Demonstrating Karpenter’s Scaling Abilities:

The eks-fargate-karpenter module ships with an example deployment with zero replicas, just to test Karpenter:

resource "kubectl_manifest" "karpenter_example_deployment" {
yaml_body = <<-YAML
apiVersion: apps/v1
kind: Deployment
metadata:
name: inflate
spec:
replicas: 0
selector:
matchLabels:
app: inflate
template:
metadata:
labels:
app: inflate
spec:
terminationGracePeriodSeconds: 0
containers:
- name: inflate
image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
resources:
requests:
cpu: 1
YAML

depends_on = [
helm_release.karpenter
]
}

To put Karpenter to the test, simply run kubectl scale deployment inflate --replicas=10. You should see some nodes start popping up:

$ kubectl get nodes

NAME                                        STATUS   ROLES    AGE     VERSION
ip-192-168-1-100.ec2.internal               Ready    <none>   4m      v1.28.0
ip-192-168-1-101.ec2.internal               Ready    <none>   3m      v1.28.0
fargate-ip-192-168-1-200.fargate.internal   Ready    <none>   2m30s   v1.28.0
fargate-ip-192-168-1-201.fargate.internal   Ready    <none>   2m      v1.28.0
fargate-ip-192-168-1-202.fargate.internal   Ready    <none>   1m30s   v1.28.0
fargate-ip-192-168-1-203.fargate.internal   Ready    <none>   1m      v1.28.0

Workloads run on regular EC2 instances; kube-system and karpenter run on Fargate nodes.
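If you want to watch Karpenter make its decisions, tailing its controller logs (and, on the v1beta1 APIs used here, listing its NodeClaims) is usually enough. The namespace and deployment names below assume the defaults used by the Helm install in this setup:

# Follow Karpenter's provisioning decisions in real time
kubectl logs -f -n karpenter deploy/karpenter

# List the NodeClaims Karpenter created for the new nodes
kubectl get nodeclaims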

Step 4: A decoupled database with RDS for Metabase

So, the next thing I did was get an RDS instance up and running for the Metabase database. This step is pretty key because it’s where the data lives, and the lifecycle of Metabase should be independent of the database it uses (best practices). I used the infra/rds module for this. Here’s how I set it up:

module "lab_rds" {
source = "../../infra/rds"

db_name = local.name
db_username = local.name
db_port = 3306
db_password = var.db_password

vpc_security_group_ids = [module.security_group.security_group_id, module.eks_fargate_karpenter.cluster_primary_security_group_id]
subnet_ids = module.lab_vpc.private_subnets

tags = local.tags
}

I added a security group. This basically controls who can talk to the RDS instance:

module "security_group" {
source = "terraform-aws-modules/security-group/aws"
version = "~> 5.0"

name = local.name
vpc_id = module.lab_vpc.vpc_id

ingress_with_source_security_group_id = [
{
source_security_group_id = module.eks_fargate_karpenter.cluster_primary_security_group_id
from_port = 3306
to_port = 3306
protocol = "tcp"
description = "MySQL access from within VPC"
},
]

tags = local.tags
}

This setup places the RDS instance right in the same VPC as the EKS cluster, tucked away in the private subnets for extra security. The security group I set up ensures that only traffic on port 3306 (the MySQL port) can get through, which is just what we need.

I kept the RDS setup separate from the Metabase deployment. This gives me the flexibility to switch databases for Metabase if needed, just by updating the database connection string in the Metabase setup.
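In practice (as you’ll see in the stack workflow later), that swap boils down to pointing the Metabase chart at a different host and password, roughly like this when run from stack/metabase where the chart values live:

# Point Metabase at a different database by overriding the chart values
helm upgrade --install metabase pmint93/metabase \
  --namespace metabase -f values.yaml \
  --set database.host="<new-db-endpoint>" \
  --set database.password="<new-db-password>"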

Step 5: Rolling Out the AWS Load Balancer Controller for service exposure

Next up was deploying the AWS Load Balancer Controller, which is pretty much the gatekeeper for letting traffic from the outside world chat with services inside the cluster. I wove this into the main Terraform setup like so:

data "http" "iam_policy" {
url = "https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json"
}

resource "aws_iam_policy" "load_balancer_controller" {
name = "AWSLoadBalancerControllerIAMPolicy"
description = "Policy for the AWS Load Balancer Controller"
policy = data.http.iam_policy.body
}

resource "aws_iam_role" "load_balancer_controller_role" {
name = "eks-load-balancer-controller-role"

assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Action = "sts:AssumeRoleWithWebIdentity",
Effect = "Allow",
Principal = {
Federated = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/oidc.eks.${local.region}.amazonaws.com/id/${local.oidc_id}"
},
Condition = {
StringEquals = {
"${module.eks_fargate_karpenter.oidc_provider.oidc_provider}:sub" : "system:serviceaccount:kube-system:aws-load-balancer-controller",
"${module.eks_fargate_karpenter.oidc_provider.oidc_provider}:aud" : "sts.amazonaws.com"
}
}
}
]
})
}

resource "aws_iam_role_policy_attachment" "load_balancer_controller_policy_attach" {
role = aws_iam_role.load_balancer_controller_role.name
policy_arn = aws_iam_policy.load_balancer_controller.arn
}

resource "kubernetes_service_account" "load_balancer_controller" {
metadata {
name = "aws-load-balancer-controller"
namespace = "kube-system"
annotations = {
"eks.amazonaws.com/role-arn" = aws_iam_role.load_balancer_controller_role.arn
}
}
}

resource "helm_release" "aws_load_balancer_controller" {
name = "aws-load-balancer-controller"
repository = "https://aws.github.io/eks-charts"
chart = "aws-load-balancer-controller"
namespace = "kube-system"

set {
name = "clusterName"
value = local.name
}

set {
name = "region"
value = local.region
}

set {
name = "serviceAccount.create"
value = "false"
}

set {
name = "serviceAccount.name"
value = "aws-load-balancer-controller"
}

set {
name = "vpcId"
value = module.lab_vpc.vpc_id
}

timeout = 3600
wait = true

depends_on = [aws_iam_role.load_balancer_controller_role, kubernetes_service_account.load_balancer_controller]
}

With this controller in place, there are a couple of ways to open up the doors for traffic to flow into the cluster:

Option 1: Going with Istio Ingress Gateway. This route lets you tap into Istio’s bag of tricks — think traffic management, bolstered security, and the ability to keep a keen eye on everything. You’d set up a VirtualService and a Gateway for each service you’re looking to expose. Going this route, you’d typically need a domain name. With a domain, you can create more user-friendly URLs for your services. Instead of relying on automatically generated or IP-based addresses, you can have URLs like service.yourdomain.com which are easier to remember and share.

Option 2: Direct Kubernetes Services of Type LoadBalancer. This is more straight-up, where you create Kubernetes services set to LoadBalancer type, and AWS automatically conjures up a load balancer (Classic, ALB, or NLB, depending on your annotations) for each service. Just set the service type to LoadBalancer and tag it with the right annotations to get the AWS Load Balancer to behave exactly how you need.

I chose option 2.

This method is super straightforward, perfect for when you don’t need the fancy routing and management that Istio offers. It’s ideal for simpler setups where services get their own dedicated load balancer for a direct line to the outside world. Keep in mind that this method will expose your service to the internet.
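For completeness, if you went with option 1 instead, exposing Metabase through the Istio Ingress Gateway would look roughly like the sketch below. The hostname and the gateway selector label are assumptions; adjust them to your domain and to the labels on your ingress gateway pods:

# Hypothetical Gateway + VirtualService for option 1
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: metabase-gateway
  namespace: metabase
spec:
  selector:
    istio: ingressgateway   # assumed default label on the ingress gateway pods
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "metabase.yourdomain.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: metabase
  namespace: metabase
spec:
  hosts:
    - "metabase.yourdomain.com"
  gateways:
    - metabase-gateway
  http:
    - route:
        - destination:
            host: metabase.metabase.svc.cluster.local
            port:
              number: 80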

Step 6: Configuring Metabase Deployment

Here’s the setup I used to deploy Metabase:

database:
  type: mysql
  port: 3306
  dbname: metabaselab
  username: metabaselab

service:
  name: metabase
  type: LoadBalancer
  externalPort: 80
  internalPort: 3000
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"

resources:
  requests:
    cpu: 1000m
    memory: 800Mi
  limits:
    cpu: 1250m
    memory: 1Gi

monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
    port: 9191

strategy:
  type: Recreate

Database & Service Specs:

  • Database Configuration: Hooks up Metabase to a MySQL database, with details like host, port, database name, and username.
  • Service & Resources: Sets up a LoadBalancer service for public access, alongside resource allocations for smooth operation within Kubernetes based on Metabase’s guidelines for memory and scaling.

Scaling with KEDA

For scaling Metabase dynamically, I leaned on KEDA. The ScaledObject configuration looked something like this:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: metabase
  namespace: metabase
spec:
  scaleTargetRef:
    kind: Deployment
    name: metabase
  minReplicaCount: 1
  maxReplicaCount: 10
  cooldownPeriod: 30
  pollingInterval: 1
  fallback:
    failureThreshold: 1
    replicas: 1
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring:9090
        metricName: requests_per_second
        # rate() gives the number of requests per second over a 2-minute window; sum() adds the data from all pods.
        query: |
          sum(rate(istio_requests_total{destination_workload="metabase"}[2m]))
        threshold: "100"
    - type: memory
      metricType: Utilization
      metadata:
        value: "110"

This setup allows Metabase to flex its muscles based on the crowd it’s handling (i.e., requests per second) and how heavy the workload is (memory usage). It’s pretty smart about it, too, with a setup that scales from 1 to 10 replicas as needed, checks in every second, and cools down for 30 seconds after scaling actions.

The target memory utilization is set to 110% (the value in the trigger above). This matters because HPAs use resource requests, not limits, when calculating percentage-based scaling. So, my Metabase deployment will scale out in two scenarios (you can verify the request-rate side yourself with the query shown right after this list):

  1. The combined istio_requests_total rate across all Metabase pods reaches 100 requests per second.
  2. Memory utilization reaches 110% of the 800Mi request, i.e. 880Mi.
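For example, you can hit the Prometheus HTTP API from inside the cluster and see the same number KEDA sees; the server address matches the serverAddress in the ScaledObject above, and the throwaway curl pod is just one convenient way to reach it:

# Run the same PromQL query KEDA evaluates (from a temporary pod inside the cluster)
kubectl run promql-check --rm -it --image=curlimages/curl --restart=Never -- \
  curl -sG http://prometheus-operated.monitoring:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(istio_requests_total{destination_workload="metabase"}[2m]))'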

After deploying the ScaledObject (next step), you can check the HPA created by Keda:

sre@99326d08570e:~$ kubectl get hpa -n metabase
NAME                REFERENCE             TARGETS                       MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-metabase   Deployment/metabase   107143m/100 (avg), 75%/110%   1         10        7          88m

When the workload allows it, you can set Keda to scale in to zero replicas for even bigger savings.
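A minimal sketch of that change on the ScaledObject above, assuming Metabase can tolerate a cold start after idle periods; note that the memory trigger alone can’t take a workload to zero, it’s the Prometheus trigger dropping to zero traffic that makes this possible:

spec:
  minReplicaCount: 0   # let KEDA scale Metabase all the way down when traffic drops to zero
  maxReplicaCount: 10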

Step 7: Rolling Out the Cluster Stack: The Essentials + Metabase

Alright, this step is where things really start to take shape for making Metabase smart enough to scale based on how busy it gets and how much it’s thinking (a.k.a. requests per second and memory usage). Not to mention, this is also where we dive deep into the monitoring and observability pool. I laid it all out using a stack workflow that installs the whole stack in the correct dependency order:

name: Cluster Stack Management

env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: 'us-east-1'
  EKS_CLUSTER_NAME: 'metabaselab'
  RDS_PASSWORD: ${{ secrets.RDS_PASSWORD }}

on:
  workflow_dispatch:
    inputs:
      components:
        description: 'Comma-separated list of components to apply (e.g., istio,metabase)'
        required: true
        default: 'keda,metrics-server,monitoring,metabase,istio'

jobs:
  helm:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v2

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_DEFAULT_REGION }}

      - name: Update kube config
        run: aws eks update-kubeconfig --name $EKS_CLUSTER_NAME --region $AWS_DEFAULT_REGION

      - name: Set up Helm
        uses: azure/setup-helm@v1
        with:
          version: 'v3.13.3'

      - name: Install metrics-server
        if: contains(github.event.inputs.components, 'metrics-server')
        run: |
          kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability-1.21+.yaml

      - name: Helm install kube-prometheus-stack (monitoring)
        if: contains(github.event.inputs.components, 'monitoring')
        run: |
          helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
          helm repo update
          cd stack/monitoring
          helm upgrade --install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring -f values.yaml --create-namespace

      - name: Helm install Istio
        if: contains(github.event.inputs.components, 'istio')
        run: |
          helm repo add istio https://istio-release.storage.googleapis.com/charts
          helm repo update
          cd stack/istio
          helm upgrade --install istio-base istio/base -n istio-system --create-namespace --set defaultRevision=default
          helm upgrade --install istiod istio/istiod -n istio-system -f istiod-values.yaml --wait
          kubectl apply -f pod-monitor.yaml && kubectl apply -f service-monitor.yaml
          helm upgrade --install istio-ingressgateway istio/gateway -n istio-system -f istio-ingress.yaml --wait

      - name: Helm install KEDA
        if: contains(github.event.inputs.components, 'keda')
        run: |
          helm repo add kedacore https://kedacore.github.io/charts
          helm repo update
          cd stack/keda
          helm upgrade --install keda kedacore/keda --namespace keda -f values.yaml --create-namespace

      - name: Fetch RDS Endpoint for Metabase
        if: contains(github.event.inputs.components, 'metabase')
        run: |
          RDS_ENDPOINT=$(aws rds describe-db-instances --db-instance-identifier metabaselab --query 'DBInstances[0].Endpoint.Address' --output text)
          echo "RDS_ENDPOINT=$RDS_ENDPOINT" >> $GITHUB_ENV

      - name: Helm install Metabase
        if: contains(github.event.inputs.components, 'metabase')
        run: |
          if ! kubectl get namespace metabase; then
            kubectl create namespace metabase
            kubectl label namespace metabase istio-injection=enabled
          fi

          helm repo add pmint93 https://pmint93.github.io/helm-charts
          helm repo update
          cd stack/metabase
          helm upgrade --install metabase pmint93/metabase --namespace metabase -f values.yaml --create-namespace \
            --set database.host="$RDS_ENDPOINT" \
            --set database.password="${{ secrets.RDS_PASSWORD }}"
          kubectl apply -f metabase-hpa.yaml && kubectl apply -f metabase-scaling-dashboard.yaml
This workflow is only triggered manually. The best way to work with this workflow is to keep a separate branch just to update and run it.
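If you prefer the terminal over the Actions page, the GitHub CLI can fire the same workflow_dispatch event; the file name matches the one in the project tree, and the components value below is just an example (gh must be installed and authenticated):

# Trigger the stack workflow from the cluster-stack branch, installing only KEDA and Metabase
gh workflow run stack-workflow.yaml --ref cluster-stack -f components='keda,metabase'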

All components installed in this step serve a different purpose:

  • Metrics Server: This step automates bringing the Metrics Server online within Kubernetes. It’s a critical component that makes autoscaling (via HPA) and tracking resource consumption possible.
  • kube-prometheus-stack: This part ensures the kube-prometheus-stack is deployed seamlessly, weaving Prometheus into our monitoring fabric and Grafana for insightful dashboards. Prometheus metrics will be used by Keda HPA to scale based on requests per second and by Grafana to feed the Keda dashboard we are going to use to monitor Metabase scaling.
  • Istio: We roll out Istio across the cluster. The setup includes Istio’s foundation, its control plane (Istiod), and an Ingress Gateway for managing incoming traffic. Istio is like a Swiss Army Knife for traffic management, but we’re going to use it mostly to provide rich traffic metrics to Prometheus.
  • Istio Ingress Gateway: Positioned as the gatekeeper for external service access, it’s on standby for whenever you decide to route traffic through it. Remember option 1 discussed earlier?
  • Istio PodMonitor and ServiceMonitor: These components keep an eye on Istio’s own components and any service running with Istio sidecars, like Metabase. They’re vital for pulling traffic metrics, which in turn enable dynamic scaling based on the load.
  • KEDA Installation: This step brings KEDA into the fold, setting the stage for event-driven scaling within our Kubernetes landscape.
  • KEDA Dashboard: A dedicated dashboard for monitoring KEDA’s ScaledObjects and HPAs, making it easier to keep tabs on how Metabase scales in response to events.
  • Metabase Deployment: Here, we deploy Metabase onto our EKS cluster, using Helm for a smooth setup. The Metabase chart is tweaked to connect with our RDS database and to snugly fit within Istio’s service mesh using istio injection.
  • Metabase HPA: We introduce a Horizontal Pod Autoscaler (HPA) specifically for Metabase, which is created based on Keda’s ScaledObject. This enables our setup to automatically adjust Metabase’s resource allocation based on real-time traffic and memory demand.

Step 8: Accessing Metabase, Grafana, and Prometheus

When it comes to checking out the user interfaces for Metabase, Grafana, and Prometheus, here’s how you can get connected:

  1. Get each service’s external address:

kubectl get svc -A

NAMESPACE    NAME                                    TYPE           CLUSTER-IP       EXTERNAL-IP                         PORT(S)                         AGE
metabase     metabase                                LoadBalancer   172.20.64.5      xxxxx.elb.us-east-1.amazonaws.com   80:30961/TCP,9191:30516/TCP     89m
monitoring   monitoring-grafana                      LoadBalancer   172.20.138.142   xxxxx.elb.us-east-1.amazonaws.com   80:31815/TCP                    90m
monitoring   monitoring-kube-prometheus-prometheus   LoadBalancer   172.20.160.99    xxxxx.elb.us-east-1.amazonaws.com   9090:32740/TCP,8080:30352/TCP   90m

2. You can access the services by appending the appropriate port or path to the external address in your browser (or by port-forwarding them locally, as shown below). For example:

  • Metabase: xxxxx.elb.us-east-1.amazonaws.com
  • Grafana: xxxxx.elb.us-east-1.amazonaws.com
  • Prometheus: xxxxx.elb.us-east-1.amazonaws.com:9090/graph
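If you’d rather not expose Grafana or Prometheus publicly at all, port-forwarding works just as well; the service names below are the ones from the kubectl get svc output above:

# Access Grafana and Prometheus locally without a public load balancer
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090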

As soon as users start accessing Metabase, you can see it scale out on the Keda dashboard:

Keda Dashboard Panel

Wrapping It Up

To wrap up the exploration into scaling workloads with the big savings quartet of EKS, Fargate, Karpenter, and Keda, it’s crucial to emphasize that achieving significant cloud savings requires a harmonious alignment of these technologies. The project, which focused on leveraging Metabase as a test case, has illustrated the dynamic and flexible scaling capabilities possible when integrating these AWS and Kubernetes services effectively.

The journey began with a setup that included EKS for managing Kubernetes clusters on AWS, Fargate for serverless compute power, Karpenter for efficient and responsive autoscaling, and Keda for event-driven scaling based on actual workload demands. Each component plays a vital role in the ecosystem, contributing to both performance efficiency and cost optimization.

For organizations looking to optimize their cloud expenses while maintaining or enhancing service performance, the combination of these tools offers a compelling solution. By leveraging EKS and Fargate, you can manage your Kubernetes workloads with less overhead, allowing for a more streamlined operation. Karpenter further enhances this by intelligently managing resource allocation, ensuring that you’re only using what you need, when you need it. Finally, Keda closes the loop by enabling application scaling based on real-world events and metrics, ensuring that resources are precisely aligned with demand.

However, the key to unlocking these savings lies not just in the adoption of these technologies but in their strategic integration and alignment. It requires a thoughtful approach to architecture, a deep understanding of workload patterns, and a commitment to continuous optimization. When EKS, Fargate, Karpenter, and Keda are used in concert, with each part of the quartet playing to its strengths, organizations can achieve a scalable, efficient, and cost-effective cloud environment.

Cost Optimization — The Sweet Spot

When it comes down to saving bucks in the cloud, the real game-changer is right-sizing. It’s all about getting the fit just right — making sure you have exactly what you need, no more, no less, to keep things running smoothly without overspending.
