Welcome to Fiverr’s Infrastructure Renaissance: First Up — the Karpenter transformation

Nadav Buchman
Published in Fiverr Tech
5 min read · Jul 3, 2024

In this article, I cover:

  • How we migrated our Kubernetes compute nodes to Karpenter.
  • What we gained from it.
  • The public Helm template and Grafana dashboard we’re sharing.

At Fiverr, we’re always looking for ways to improve our infrastructure and use our resources more efficiently. Recently, we undertook a significant migration effort: transitioning our Kubernetes compute nodes to Karpenter.

This article details our migration process, the benefits of the journey, and the tools we developed to streamline our operations.

First, allow me to explain why we needed a change

Before adopting Karpenter, we had always relied on a third-party solution to manage our compute nodes. This approach had several downsides, including:

  • Manual updates: Compute nodes were managed using Infrastructure as Code (IaC), requiring frequent “terraform-applies” to ensure the latest OS image was used. This process was cumbersome before we automated our IaC.
  • Explicit specifications: We had to specify instance types and scopes, which was inefficient for multi-architecture setups and automatic adoption of next-generation instance types.
  • Manual node recycling: This was very time-consuming and required a closely monitored scripted process.
  • Resource wastage: Unused resources just can’t be tolerated in this economy, and with so many businesses like ours needing resources to keep innovating, this was a big source of frustration.
  • Spot interruptions: These happened far too often, causing system noise due to rapid node replacements.

And so, we made the decision to use Karpenter

This Kubernetes-native controller offers just-in-time capacity for workloads by dynamically scaling compute nodes based on actual resource requirements. It integrates with the Kubernetes scheduler to nominate nodes for pending pods.

Among its other key selling points (well, what sold us!) were:

  • Dynamic resource allocation: You specify the number of cores, minimum generation, architecture, and processor manufacturer to reduce node group customization overhead (see the NodePool sketch after this list).
  • Consolidation: No more resource wastage since workloads are consolidated.
  • Compatibility: Ensures compatibility with Kubernetes OS image versions and architectures.
  • Capacity management: Falls back to on-demand or different instance types when capacity is unavailable.
  • Drift management: Automatically recycles nodes during patches or Kubernetes upgrades.
  • Maintenance windows: Allows budget and schedule settings to control node replacement frequency.
  • Priority weighting: Enables prioritization of different instance types to reduce system noise and spot interruptions.
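
To make some of these points concrete, here is a minimal NodePool sketch against the karpenter.sh/v1beta1 API (the values are illustrative, not our production settings; adjust field names to your Karpenter version):

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: nodes-default
spec:
  weight: 2                              # prefer this pool when several can satisfy a pod
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]     # multi-architecture without separate node groups
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]                  # generation 5 or newer
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # spot preferred, on-demand as fallback
      nodeClassRef:
        name: default
  disruption:
    consolidationPolicy: WhenUnderutilized
    budgets:
      - nodes: "10%"                     # replace at most 10% of nodes at a time
      - schedule: "0 9 * * 1-5"          # example maintenance window: freeze disruptions
        duration: 8h                     # during working hours
        nodes: "0"
  limits:
    cpu: "1000"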

But it wasn’t all rainbows and butterflies.

We encountered some challenges during our Karpenter adoption. I bring these up because you need the full story to make the best decision. First, we had to ensure equal node distribution across availability zones, which we solved by using topology-spread on the topology.kubernetes.io/zone key. We also had to address aggressive bin-packing to prevent overstressed nodes (which we did by implementing hostname-based topology-spread).
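
For reference, the workload-side constraints we mean look roughly like this (the app label is a placeholder):

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # spread replicas evenly across availability zones
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname        # avoid packing too many replicas onto one node
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-app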

Managing spot and on-demand workload percentages with capacity-spread and topology-spread features was also a challenge, as was simplifying YAML management (but we handled that by creating a Helm template for node group specifications).
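
The capacity-spread pattern, roughly: each NodePool stamps a numbered label onto its nodes (for example, spot pools carry the values 1 through 5 and the on-demand pool carries 6, which is what the capacitySpread start/end values further down express), and workloads spread across that label so about one sixth of the replicas land on on-demand capacity. A sketch of the workload side (the label name and selector are placeholders):

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: capacity-spread      # node label set by the NodePools: "1".."5" = spot, "6" = on-demand
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app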

A bird’s-eye view of our migration flow

  1. We first deployed the Karpenter Controller in a dedicated Auto Scaling Group (ASG).
  2. Using our Helm chart, we specified our Kubernetes node groups.
  3. We gradually reduced each node group’s minimum/maximum node count on the third-party solution. As nodes were removed, Karpenter took over until it managed the entire node group.
  4. Migration was done one node group at a time, starting with the least critical.
  5. For certain use cases, we disabled consolidation and specified instance idle TTL to prevent job interruptions.
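
For those job-style NodePools, the relevant knobs look roughly like this (a v1beta1 disruption block; the durations are illustrative):

disruption:                        # fragment of a NodePool spec
  consolidationPolicy: WhenEmpty   # never consolidate a node that still runs a job
  consolidateAfter: 300s           # idle TTL: reclaim the node only after it sits empty
  expireAfter: Never               # don't force-recycle these nodes on a timer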

Monitoring

Karpenter exposes very useful metrics for Prometheus.
You can find an importable Grafana dashboard HERE.

In general, we monitor:

  • Interruption rates per NodeGroup (per NodePool)
  • Capacity usage out of limit (if defined)
  • Potential costs
  • Capacity types
  • Spread across availability zones
  • Consolidations, drifts, and provisioning rates
  • Karpenter’s overall performance
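
If you scrape with the Prometheus Operator and don’t use the chart’s built-in ServiceMonitor option, a minimal scrape definition could look like this (the namespace, selector labels, and port name are assumptions to verify against your Karpenter chart version):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: karpenter
  namespace: karpenter                      # wherever the controller is installed
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: karpenter     # match the labels on the Karpenter Service
  endpoints:
    - port: http-metrics                    # metrics port name exposed by the chart
      interval: 30s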

Simplifying NodeGroup Management with Helm

Before Karpenter, managing NodeGroups with IaC required repetitive specifications. Our Helm template now allows us to define global and local specifications per NodeGroup, automatically add labels, tags, and taints, and easily create lists of NodeGroups and specify overrides.

The template is built to allow you to simply create a list of objects representing your NodeGroups and specify overrides where needed. For example, you might want to specify a different AMIFamily or IAM Role/Instance Profile for a specific NodeGroup (NodePool).
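
Under the hood, the template’s core idea is just to range over that map and render one NodePool per entry with sensible defaults. A heavily simplified sketch of the approach (not our actual template; the defaults shown are placeholders):

{{- range $name, $ng := .Values.nodeGroups }}
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: {{ $name }}
spec:
  weight: {{ $ng.weight | default 1 }}
  template:
    metadata:
      labels:
        nodegroup: {{ $ng.nodeGroupLabel | default $name }}
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: {{ dig "instances" "categories" (list "m" "r") $ng | toJson }}
{{- end }}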

You are welcome to check out our implementation example HERE.

For those using ArgoCD, we recommend excluding NodeClaims via resource.exclusions in the argocd-cm ConfigMap to avoid browser performance issues:

resource.exclusions: |
  - apiGroups:
      - "karpenter.sh"
    kinds:
      - "NodeClaim"

The bottom line

Migrating to Karpenter has transformed our Kubernetes operations at Fiverr, making our system more efficient, resilient, and easier to manage.

We use the following values files for the template to make it easily readable and re-usable.

common.yaml — this is where we specify the most common specifications for all node groups (IAM, AMIFamily, EBS, Subnets, Availability Zones, Security Groups, etc).
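
As a rough illustration of what lives there (the key names below are hypothetical placeholders; the real schema is in the repo example):

common:                                          # hypothetical keys, for illustration only
  amiFamily: Bottlerocket
  instanceProfile: karpenter-node-profile        # placeholder IAM instance profile
  availabilityZones:
    - us-east-1a
    - us-east-1b
    - us-east-1c
  subnetSelectorTags:
    karpenter.sh/discovery: my-cluster           # placeholder discovery tag
  securityGroupSelectorTags:
    karpenter.sh/discovery: my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true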

userdata.yaml — we can specify templated user data for all node groups.

nodegroups.yaml — this has a list of all the NodeGroups (NodePools) we want to generate.

While you can check all the above in the example in the GitHub repo, here is a glimpse:

nodeGroups:
  nodes-default:
    autoTaint: "false"
    weight: 2
    instances:
      categories:
        - m
        - r
    capacitySpread:
      start: 1
      end: 5
  nodes-default-od:
    autoTaint: "false"
    nodeGroupLabel: nodes-default
    capacitySpread:
      start: 6
      end: 6
    instances:
      minGeneration: 5
      categories:
        - m
        - r
      capacityType:
        - on-demand
    nodeClassRef:
      name: nodes-default-amd64
  nodes-workers:
    weight: 2
    instances:
      categories:
        - m
        - r
    capacitySpread:
      start: 1
      end: 5
  nodes-workers-c:
    nodeGroupLabel: nodes-workers
    capacitySpread:
      start: 1
      end: 5
    instances:
      categories:
        - c
    nodeClassRef:
      name: nodes-workers-amd64
  nodes-canary:
    instances: {}
    capacitySpread:
      start: 1
      end: 5
  nodes-jobs:
    expireAfter: "Never"
    instances:
      capacityType:
        - on-demand
      cores:
        - "8"
        - "16"
    consolidationPolicy: "WhenEmpty"
    blockDeviceMappings:
      - deviceName: /dev/xvda
        ebs:
          deleteOnTermination: true
          encrypted: true
          iops: 9000
          throughput: 125
          volumeSize: 500Gi
          volumeType: gp3
  nodes-ingress:
    registryCache: "false"
    expireAfter: "Never"
    instances:
      architecture: "multiarch"
      capacityType:
        - on-demand
      minGeneration: 7
      cores:
        - "8"

Want to work on projects like these at Fiverr? Check out our open positions:

Learn more about us: https://www.fiverr.com/jobs/teams
