Migrating CI/CD from Kubernetes to Compute Engine: a journey of cost efficiency and reliability

Robin Shin
Untienots

--

When many think of CI/CD (Continuous Integration and Continuous Delivery) pipelines, Kubernetes often stands out as a popular choice. Its robust orchestration capabilities have become synonymous with container management, and the appeal is undeniable. However, as our team experienced, sometimes a simpler approach can be more effective. In our case, transitioning from Kubernetes with Docker-in-Docker to Compute Engine with Instance Groups (and autoscaling) led to significant savings and enhanced reliability. Let’s delve into our journey.

At Untie Nots, we pride ourselves on being at the forefront of innovation in retail loyalty, providing a cutting-edge, AI-based platform for personalized promotions that truly engage customers. Founded in 2015 and recently acquired by Eagle Eye, our journey has been marked by a commitment to revolutionize the retail experience. Just as we leverage behavioral data to offer gamified AI-based loyalty challenges and rewards tailored to shopper preferences, we apply the same precision and customization to our technical operations. Our migration from Kubernetes to a serverless stack on Google Cloud Platform (GCP) embodies this ethos. It reflects our relentless pursuit of efficiency, not just in the solutions we offer to retailers but in the robust, scalable, and cost-effective infrastructure that underpins our B2C pages, B2B interfaces and data stack. As we evolve our deployment strategies to harness the full potential of GCP, we ensure that our technology stack remains as dynamic and responsive as the personalized promotional campaigns we craft for our clients.

GitLab CI/CD

Before diving into our initial setup, it’s essential to lay down some groundwork regarding GitLab CI and our choice of employing self-hosted runners. GitLab CI/CD is a powerful tool that facilitates continuous integration and continuous delivery within the GitLab ecosystem. While GitLab provides shared runners, we opted for self-hosted runners to have better control over the environment and resources, and to ensure the security of our sensitive data.

A distinctive feature of GitLab CI is its mechanism of job execution. Unlike some other systems where jobs are pushed to the runners, in GitLab CI, the runners pull the jobs from a centralized queue. This design has a couple of notable advantages. Firstly, it reduces the dependency on a central orchestrator, which could become a bottleneck or a single point of failure in a large-scale setup. Secondly, it allows for more straightforward scaling and improved reliability. Runners can be added or removed from the system seamlessly, and they fetch jobs whenever they have the capacity. This pull mechanism aligns well with the principles of distributed computing, paving the way for a robust and scalable CI/CD setup.

The Initial Setup: Kubernetes with Docker-in-Docker

Like many organizations, we initially turned to Kubernetes for its scalability and dynamic management capabilities. We specifically used Docker-in-Docker for our CI/CD setup. While this worked reasonably well, we began to observe some limitations:

  1. Cost overhead: Kubernetes on GKE was costlier due to add-ons for services like monitoring and logging, which increased resource use and maintenance expenses. Additionally, GKE’s default pricing for the Kubernetes control plane, irrespective of node pool presence, further added to costs compared to the streamlined Compute Engine instances. The abstraction of pods and nodes also complicated resource scaling, often leading to underutilization and unnecessary expenditure
  2. Complexity: Docker-in-Docker introduced an added layer of complexity, which occasionally led to difficult-to-troubleshoot issues
  3. Resource constraints: our CI/CD jobs often ran into resource constraints. Since Docker containers in the CI process shared resources with the Kubernetes node, some jobs were throttled or delayed
  4. Unpredictable build times: due to the overhead of spinning up containers inside containers, and the added weight of the Kubernetes orchestration layer, build times were often unpredictable

Navigating Towards a Solution: The Switch to Compute Engine

Given the challenges we were facing, we began to explore alternatives that could provide a simpler setup, lower costs, and more reliable build times. Our search led us to Google Cloud’s bare Compute Engine. The core idea was to use Compute Engine’s Instance Template and Instance Group features to autoscale our GitLab runners based on the load from incoming jobs.

Setting the stage: Instance Templates and Instance Groups

The first step in our migration was to set up an Instance Template in Compute Engine. This template defines the virtual machine (VM) configuration, such as machine type, image, and disk type, ensuring uniform configuration for all instances within the group.

Managed Instance Group in action, utilizing Instance Templates to dynamically create and scale Compute Engine instances in response to fluctuating CI/CD workloads

Next, we created an Instance Group, which utilizes the previously defined Instance Template to manage a collection of VM instances. The beauty of Instance Groups is that they can automatically adjust the number of VM instances based on the load, ensuring optimal resource utilization and cost efficiency.

Autoscaling GitLab Runners: a configurable approach

With the infrastructure in place, we proceeded to configure our GitLab runners on the Compute Engine instances. Our objective was to have a setup where the number of runners would scale automatically based on the incoming job load.

Here’s a simplified outline of the steps we followed:

Create a Compute Engine image: we built a Compute Engine Image, a pre-configured virtual machine snapshot. This image serves as a template for every new VM instance, ensuring consistency and reliability across our infrastructure. It comes pre-loaded with the following (a sketch of how such an image can be baked follows the list):

  • Docker, to containerize our applications
  • GitLab Runner binary, to orchestrate our CI/CD processes (installation tutorial)
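For illustration, here is a minimal sketch of how such an image can be baked, assuming a temporary Debian/Ubuntu builder VM named runner-image-builder in europe-west1-b and an image named runner-image (all placeholder names); the gitlab-runner install follows GitLab’s official package repository script:

# On the builder VM: install Docker and the GitLab Runner package
curl -fsSL https://get.docker.com | sh
curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash
sudo apt-get install -y gitlab-runner

# From a workstation: stop the builder VM and turn its boot disk into a reusable image
gcloud compute instances stop runner-image-builder --zone=europe-west1-b
gcloud compute images create runner-image \
  --source-disk=runner-image-builder \
  --source-disk-zone=europe-west1-b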

Create an Instance Template: we developed an Instance Template, which outlines the specifications for creating VM instances. It dictates the machine type, provisioning model, and additional configuration, utilizing the Compute Engine Image as its base, and includes a startup script that registers the GitLab runner (a CLI sketch of creating this template follows the remarks below).

  • Machine type: e2-standard-4
  • VM provisioning model: spot, selected for being around 60% less costly than standard VMs, providing a cost-efficient solution for the variable and interruptible nature of CI/CD jobs.
  • Startup script:
#!/bin/bash
# GCE startup script: runs as root at instance boot

# Write the runner configuration template (GCS-backed shared cache, Docker executor)
mkdir -p /etc/gitlab-runner
cat << EOF > /etc/gitlab-runner/config.template.toml
[[runners]]
  [runners.cache]
    Type = "gcs"
    Path = "runner"
    Shared = true
    [runners.cache.gcs]
      BucketName = "gitlab_cache" # Specify here your GCS cache bucket if configured
  [runners.docker]
    privileged = true
    pull_policy = ["if-not-present"]
    volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
EOF

# Fix the 'No such file or directory' error, see https://gitlab.com/gitlab-org/gitlab-runner/-/issues/1379
rm /home/gitlab-runner/.bash_logout || true

# Enable and start the gitlab-runner service
systemctl enable gitlab-runner
gitlab-runner start

# Register the runner against our GitLab instance using the template above
gitlab-runner register --non-interactive \
  --template-config /etc/gitlab-runner/config.template.toml \
  --url "https://<YOUR_GITLAB_REPOSITORY_URL>" \
  --registration-token "<YOUR_GITLAB_REGISTRATION_TOKEN>" \
  --executor "docker" \
  --tag-list "vm-instance" \
  --run-untagged="true" \
  --docker-image="docker:20.10.16"

# Workaround to set the runner's global concurrency to 2
# (the register command does not expose this setting)
sed -i "s/concurrent.*/concurrent = 2/" /etc/gitlab-runner/config.toml

# Restart the gitlab-runner service to apply the new concurrency
systemctl restart gitlab-runner

Additional remarks about the Instance Template:

  • Spot instances: these can be up to 90% cheaper than standard instances, providing significant cost savings, especially for the CI part of operations
  • Job tagging: allows directing jobs to specific machines based on tags, optimizing resource utilization
  • Concurrency tuning: this concurrency level refers to the number of CI/CD jobs that the GitLab runner can execute in parallel on a single machine. With e2-standard-4 machine type, a concurrency level of 2 was found to be optimal for running big jobs efficiently by allowing two jobs to run simultaneously without resource contention
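For reference, a CLI sketch of creating such a template, assuming the image above is called runner-image and the startup script is saved locally as startup.sh (both placeholder names):

# Create the instance template for the runner group:
# e2-standard-4, Spot provisioning, the pre-baked runner image, and the startup script above
gcloud compute instance-templates create gitlab-runner-template \
  --machine-type=e2-standard-4 \
  --image=runner-image \
  --provisioning-model=SPOT \
  --metadata-from-file=startup-script=startup.sh

Jobs opt into these machines by listing the vm-instance tag in their .gitlab-ci.yml, matching the --tag-list value used at registration, which is how the job-tagging optimization above works in practice.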

Setting up a Stateless Instance Group: utilizing the Instance Template, we established a stateless Instance Group to orchestrate our runners, with an autoscaler configured to dynamically adjust the number of instances in response to the workload. Here’s a breakdown of our configuration (a CLI sketch of the equivalent commands follows the list):

  • Instance template: choose the previously created instance template
  • Autoscaling mode: set to ‘On’ to allow the system to automatically adjust the number of instances based on the load
  • Minimum number of instances: having a number greater than 0 reduces the “cold start” time. For example, setting it to 4 ensures that there are instances always ready to handle 4 * concurrency jobs, reducing wait times for CI/CD steps
  • Maximum number of instances: with concurrency set to 2, having n instances allows for n*2 jobs to run in parallel
  • Autoscaling signals: set to CPU utilization to trigger scaling actions based on the CPU usage
  • Target CPU utilization: this is the CPU utilization percentage that the autoscaler should aim to maintain across all instances. Through testing different configurations, we found a target CPU utilization of 50% to be ideal for our setup
  • Predictive autoscaling: this feature anticipates scaling needs based on historical load data, preparing the system for expected load increases
  • Initialization period: this is the time duration the autoscaler takes post-launch to observe the instances’ performance before making scaling decisions
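Expressed as CLI commands, the group and autoscaler configuration looks roughly like the sketch below; the group name, zone, maximum size, and cool-down period are illustrative placeholders rather than our exact production values:

# Create a managed instance group of runners from the template
gcloud compute instance-groups managed create gitlab-runner-group \
  --zone=europe-west1-b \
  --template=gitlab-runner-template \
  --size=4

# Attach an autoscaler: keep at least 4 instances warm, scale on 50% CPU utilization,
# enable predictive autoscaling, and allow new instances time to initialize
gcloud compute instance-groups managed set-autoscaling gitlab-runner-group \
  --zone=europe-west1-b \
  --mode=on \
  --min-num-replicas=4 \
  --max-num-replicas=20 \
  --target-cpu-utilization=0.5 \
  --cpu-utilization-predictive-method=optimize-availability \
  --cool-down-period=120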

Monitoring and insights: leveraging Cloud Monitoring for Instance Groups

This graph depicts the VM count and CPU utilization for CI/CD operations at Untie Nots, showcasing peaks during standard work hours, Monday through Friday, illustrating the adaptive scaling of resources in alignment with active development periods.

GCP’s Cloud Monitoring offers inherent integration with Instance Groups, providing us with automated insights and analytics. This built-in feature grants us a clear view of our resources in action. Through Cloud Monitoring, we’ve observed a pattern of increased CI/CD activity during standard work hours, with distinct peaks from Monday to Friday. These insights confirm the responsiveness of our autoscaling setup to the active development cycle, ensuring resources are allocated efficiently and in sync with our operational demands.
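Beyond the console graphs, the same signals can be spot-checked from the CLI (group name and zone are placeholders):

# Instances currently in the runner group, i.e. the autoscaler's latest decision
gcloud compute instance-groups managed list-instances gitlab-runner-group \
  --zone=europe-west1-b

# The group's current target size, which the autoscaler adjusts over time
gcloud compute instance-groups managed describe gitlab-runner-group \
  --zone=europe-west1-b --format="value(targetSize)"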

Reaping the benefits

Transitioning our CI/CD pipelines from Kubernetes to Compute Engine has been a voyage of discovery, yielding significant benefits in terms of cost-efficiency, reliability, and operational simplicity. This migration illuminates the importance of continuously evaluating and adapting our infrastructure to meet evolving needs and challenges. While Kubernetes is a powerful tool for container orchestration, our experience underscores that simpler setups like Compute Engine can sometimes be more straightforward and cost-effective.

Our journey demonstrates the potential of Compute Engine’s Instance Groups in creating a scalable, reliable, and cost-effective CI/CD pipeline. The autoscaling feature, in particular, showcased a remarkable capability to adjust resources dynamically based on load, ensuring optimal utilization and cost management. Moreover, the ease of setting up and managing GitLab runners on Compute Engine has significantly reduced the operational overhead, allowing our team to focus more on coding and less on managing infrastructure.

As we continue to optimize our CI/CD setup, the learnings from this migration will undoubtedly serve as a solid foundation for future infrastructure improvements. We hope our experience serves as a valuable reference for other organizations looking to optimize their CI/CD pipelines for better performance and cost-efficiency.
