A Tale Of A Migration

Derda Karakis
Insider Engineering
6 min read · Apr 25, 2022

At Insider, we have many repositories across all of our products. Unfortunately, having lots of repositories brings its own problems. These write-ups look at those problems and how we solve them while improving the quality of our code and infrastructure.

The Need

Since Insider releases 5 to 6 major versions every day across 12 product teams, QA Engineers need to test every task independently in a fresh environment. Each newly created environment must also include the base projects or base containers that the new containers need to communicate with. Furthermore, developers should be able to spin up whatever they need on a per-container basis.

Legacy Infrastructure

The old stack depended heavily on Jenkins to provision spot EC2 instances and deploy our stack onto them. A Groovy file did several things, including extracting the environment parameters for the projects to be deployed and processing the SQL queries the task needed. Jenkins would then SSH into the provisioned EC2 instance and transfer configuration files so that every environment was set up properly.

Problems With Legacy Infrastructure

To address Jenkins’ problems, we first need to understand the workload that Jenkins had to deal with.

There were 4 main pipelines and 7 improvement pipelines in Jenkins. Together they executed more than 250 new jobs every day, creating 100 spot EC2 instances. With this workload, we faced problems such as:

  • Under-provisioned EC2 instances
  • Jenkins Overload
  • Jenkins Configuration Management
  • Disk Management Issues
  • Long-Running Jobs inside EC2 instances
  • Operational Costs

With these problems identified, we needed to find a new home that would let us avoid them and keep our agility.

New Home

After lots of research and POCs, we decided to use Kubernetes with Helm and Terraform. In addition, we developed a new Slack application that collects user input when someone requests a new environment.

Let’s talk about the components:

Lambda a.k.a Vision Bot

A user requests a new environment through the Slack interface. The Slack bot dispatches the request to our Lambda. The Lambda parses the form data and verifies that the task and settings are valid. After the verification phase, the Lambda goes through our repositories to find the feature branches to build. Once it finds them, the Lambda creates additional AWS CodeBuild and AWS CodePipeline projects via Terraform by pushing stubs to the pipeline.
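The verification phase can be pictured with a minimal sketch like the one below. The form field names (`task_id`, `projects`, `duration_hours`) and the task id pattern are assumptions made for illustration; the article does not describe the real payload.

```python
import re
from dataclasses import dataclass

# Hypothetical task id format (e.g. "ABC-1234"); the real format is not given.
TASK_ID_PATTERN = re.compile(r"^[A-Z]+-\d+$")


@dataclass
class EnvironmentRequest:
    task_id: str
    projects: list
    duration_hours: int


def validate_request(form: dict) -> tuple:
    """Return (request, errors); request is None when validation fails."""
    errors = []

    task_id = str(form.get("task_id", "")).strip()
    if not TASK_ID_PATTERN.match(task_id):
        errors.append("task_id must look like ABC-1234")

    # Drop empty entries so a blank form field does not count as a project.
    projects = [p.strip() for p in form.get("projects", []) if p.strip()]
    if not projects:
        errors.append("at least one project is required")

    try:
        duration = int(form.get("duration_hours", 0))
    except (TypeError, ValueError):
        duration = 0
    if duration <= 0:
        errors.append("duration_hours must be a positive integer")

    if errors:
        return None, errors
    return EnvironmentRequest(task_id, projects, duration), []
```

Collecting every error before returning lets the bot report all problems to the user in a single Slack message instead of one at a time.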

The Lord Of The Pipelines

The Lord Of The Pipelines is an AWS CodePipeline project that listens to our Terraform repository. Once it receives the relevant commit for a feature branch build, the pipeline applies the changes to our QA environment to create a Docker build pipeline. That pipeline consists of only two stages: source and Docker build.

To avoid exposing credentials when Terraform applies the source stage, we decided to use an AWS CodeStar connection.

Docker Builds

As you can see, everything started like a fairytale, but then it turned into a nightmare that gave me sleepless nights. Some of our flagship projects use Laravel with frontend capabilities. Others use several languages and frameworks, including but not limited to Go, Node.js, Lumen and Ruby. Since PHP does not ship with a production-grade web server, we need Nginx to serve our assets. We also need to leverage PHP-FPM.

We chose multistage Docker builds for better layer caching, and we had to rewrite our existing Dockerfiles for AWS EKS. We had already switched to a golden image for our PHP builds. At this stage, we went further by adopting multistage builds and embedding Nginx into our containers so that the application can be served through an external ingress load balancer.

Also, since Nginx forwards PHP requests to the PHP-FPM process, we need an entry point script that looks after both the Nginx and PHP-FPM processes.

We can skip adding Nginx to other containers if they do not serve static files and only pass traffic on to our cluster.
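A minimal sketch of such an entry point is shown below. It is written in Python purely for illustration (the article does not say what the real script looks like), and the `php-fpm` and `nginx` invocations are assumed foreground commands. The idea is to start both processes and stop the container as soon as either one exits, so that Kubernetes can restart the pod.

```python
import signal
import subprocess
import sys
import time

# Assumed foreground invocations; adjust to the binaries present in the image.
COMMANDS = [
    ["php-fpm", "--nodaemonize"],
    ["nginx", "-g", "daemon off;"],
]


def supervise(commands, poll_interval=0.5):
    """Start every command and stop all of them as soon as one exits.

    Returns the exit code of the first process that exited.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Ask the surviving processes to stop, then reap them.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()


def main():
    # Turn SIGTERM into SystemExit so the cleanup in supervise() runs
    # when Kubernetes stops the pod.
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(143))
    return supervise(COMMANDS)

# In the image, the script would end with: sys.exit(main())
```

Exiting when either process dies is a deliberate choice in this sketch: a container in which Nginx is running but PHP-FPM is not would otherwise keep accepting requests it cannot serve.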

Helm

Along with the Docker builds, we need Helm charts for our distributions so that we can alter variables per namespace. That’s why we created Helm charts tailored to each application’s needs. They give us the ability to handle dependencies based on what each project requires. In the legacy infrastructure, we could only run a single data container (MySQL, Redis, Nginx, NSQ, Clickhouse) to serve all containers, which caused problems because it was not a production-like distribution.

CodeBuild

Insider needs to run preprod tests across all environments by design before each deployment. Once the development branch is ready, we need to deploy newly created versions onto our Kubernetes cluster.

After altering CodeBuild to build Docker images and push them to ECR, we need to deploy to EKS via Helm. To deploy the application on AWS EKS, the CodeBuild instances must assume a role defined in the cluster’s aws-auth ConfigMap. Once connectivity is verified, CodeBuild can run the Helm deploy in the selected namespace.
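As a rough illustration of this step, the sketch below assembles the two commands a build job might run: `aws eks update-kubeconfig` with the assumed role, followed by `helm upgrade --install` into the target namespace. The release, chart path, cluster name and the `image.tag` value key are hypothetical; the actual buildspec is not shown in the article.

```python
def kubeconfig_command(cluster_name, region, role_arn):
    """Command that writes a kubeconfig entry using the role mapped in aws-auth."""
    return [
        "aws", "eks", "update-kubeconfig",
        "--name", cluster_name,
        "--region", region,
        "--role-arn", role_arn,
    ]


def helm_deploy_command(release, chart_path, namespace, image_tag):
    """Command that installs or upgrades a release in the given namespace."""
    return [
        "helm", "upgrade", "--install", release, chart_path,
        "--namespace", namespace,
        "--create-namespace",                 # first deploy of a feature branch
        "--set", f"image.tag={image_tag}",    # assumes the chart exposes image.tag
        "--wait",                             # block until resources are ready
    ]
```

Building the commands as argument lists (rather than shell strings) means they can be passed to `subprocess.run(cmd, check=True)` without any quoting concerns around branch names.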

Namespace Design In EKS

We’ve mentioned preprod and feature branch tests before, which brings us to our next topic: namespace design in EKS.

Since our feature branches originate from the develop branch, the default namespace carries all projects at their develop branch state. Once checks and tests pass, a feature branch is merged into develop. Merging into develop triggers CodePipeline to build a Docker image and deploy it to the default namespace in EKS.

The build phase remains the same for feature branches. The only difference is that we deploy into a new namespace named after the feature branch’s id.
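Kubernetes namespace names must be valid DNS labels (lowercase alphanumerics and `-`, at most 63 characters, starting and ending with an alphanumeric), so a branch id usually needs normalizing first. The helper below is a sketch of one way to do that; the function name and the example branch ids are hypothetical.

```python
import re


def namespace_for_branch(branch_id: str) -> str:
    """Turn a feature branch id into a valid Kubernetes namespace name."""
    # Lowercase, then collapse every run of disallowed characters into "-".
    name = re.sub(r"[^a-z0-9-]+", "-", branch_id.lower())
    # Enforce the 63-character limit, then trim dashes left at either end.
    name = name[:63].strip("-")
    if not name:
        raise ValueError(f"cannot derive a namespace name from {branch_id!r}")
    return name
```

For example, `namespace_for_branch("feature/ABC-1234")` yields `feature-abc-1234`.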

Cross Namespace Resource Sharing

After all deployments are done, pods will determine which assets should be used for various calls. Environment variables in Helm files let us reroute specific calls across the current namespace or the default namespace. This way, we can reduce project-specific dependencies if the dependency does not include any code changes. Less dependency means less compute power, fewer pods and less cost.
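The rerouting idea can be sketched as follows: a dependency that was redeployed for the task resolves inside the feature namespace, and everything else falls back to the shared copy in the default namespace. The function and service names are hypothetical, and the `svc.cluster.local` suffix assumes the cluster uses the default Kubernetes DNS domain.

```python
def service_host(service, feature_namespace, deployed_in_feature,
                 default_namespace="default"):
    """Return the in-cluster DNS name a pod should use for a dependency.

    deployed_in_feature is the set of services that carry code changes and
    were therefore deployed into the feature namespace.
    """
    if service in deployed_in_feature:
        namespace = feature_namespace
    else:
        namespace = default_namespace
    return f"{service}.{namespace}.svc.cluster.local"


def helm_env_values(services, feature_namespace, deployed_in_feature):
    """Build environment variable values such as API_HOST for a Helm release."""
    return {
        f"{name.upper().replace('-', '_')}_HOST":
            service_host(name, feature_namespace, deployed_in_feature)
        for name in services
    }
```

With `deployed_in_feature={"api"}`, a call for `["api", "redis"]` in namespace `abc-1234` points `API_HOST` at `api.abc-1234.svc.cluster.local` and `REDIS_HOST` at `redis.default.svc.cluster.local`.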

Ingress Controller

Since there are customer portals and web projects, QA Engineers need to access newly created application versions across all namespaces. For the default namespace, the ingress endpoints remain the same, while for feature branches we need to rewrite the endpoints so that they route to the proper pods in the selected namespace. The Nginx ingress controller lets us rewrite and reroute traffic for specific pods and namespaces.
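One possible shape for such a rule is sketched below as a Python dict that can be serialized to JSON or YAML and applied to the cluster. It uses the ingress-nginx `rewrite-target` and `use-regex` annotations with a path prefix per namespace. Whether the real setup routes by path or by host is not stated in the article, so the path scheme, host and resource names here are assumptions.

```python
def feature_ingress(namespace, service, port, host):
    """Ingress that exposes a feature namespace under /<namespace>/... on host."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": f"{service}-{namespace}",
            "namespace": namespace,
            "annotations": {
                "nginx.ingress.kubernetes.io/use-regex": "true",
                # Forward only the part captured after the namespace prefix.
                "nginx.ingress.kubernetes.io/rewrite-target": "/$2",
            },
        },
        "spec": {
            "ingressClassName": "nginx",
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": f"/{namespace}(/|$)(.*)",
                    "pathType": "ImplementationSpecific",
                    "backend": {
                        "service": {"name": service, "port": {"number": port}},
                    },
                }]},
            }],
        },
    }
```

Under these assumptions, a request to `/abc-1234/login` on the shared host reaches the service in namespace `abc-1234` as `/login`.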

Cluster Role-Based Access Control

Great power comes with great responsibility.

With an active cluster in place, we need to set ground rules for our developers and QA engineers. The Vision Bot provides an AWS user with CLI access enabled for a short period.

Newly created users are only allowed to perform a limited set of actions in the selected namespace.
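In Kubernetes terms, this kind of restriction is typically expressed as a namespaced Role plus a RoleBinding. The sketch below builds both manifests as Python dicts. The specific permissions (reading pods, services and logs, and exec into pods), the role name and the username are assumptions, as is the mapping of the temporary AWS user to that Kubernetes username through the aws-auth ConfigMap.

```python
RBAC_API_GROUP = "rbac.authorization.k8s.io"


def namespace_role(namespace, name="qa-tester"):
    """Role that only grants access inside the given namespace."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {   # Read-only access to core resources (core API group is "").
                "apiGroups": [""],
                "resources": ["pods", "pods/log", "services"],
                "verbs": ["get", "list", "watch"],
            },
            {   # kubectl exec is a "create" on the pods/exec subresource.
                "apiGroups": [""],
                "resources": ["pods/exec"],
                "verbs": ["create"],
            },
        ],
    }


def namespace_role_binding(namespace, username, role_name="qa-tester"):
    """Bind the role to the Kubernetes user the temporary AWS user maps to."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-{username}", "namespace": namespace},
        "subjects": [
            {"kind": "User", "name": username, "apiGroup": RBAC_API_GROUP},
        ],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_API_GROUP},
    }
```

Because both objects are namespaced, deleting the namespace at the end of the task also removes the permissions.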

Namespace Lifecycle

In the Slack prompt, the user is asked to enter a duration for the testing. If the requested time exceeds the default testing time, the Vision Slack Bot forwards the request to Senior QA Engineers for review. Once the request is authorized, the bot creates the namespace with the given time frame, along with a temporary AWS user.

A scheduler looks for active namespaces and checks their remaining duration. If there’s one hour left, the bot notifies the task owner. If the task owner needs additional time, they send an action response to the bot asking for more.

Once the task reaches its end of life, the bot destroys the namespace and the temporary user created for it.
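The lifecycle rules above boil down to two small decisions, sketched here. The function names, the `already_notified` flag and the way timestamps are stored are assumptions; only the one-hour warning and the review-above-default rule come from the description.

```python
from datetime import datetime, timedelta, timezone


def needs_review(requested_hours, default_hours):
    """Requests longer than the default testing time go to Senior QA."""
    return requested_hours > default_hours


def lifecycle_action(expires_at, now, already_notified=False,
                     notify_before=timedelta(hours=1)):
    """Decide what the scheduler should do with one namespace.

    Returns "destroy", "notify" or "keep".
    """
    if now >= expires_at:
        return "destroy"
    if expires_at - now <= notify_before and not already_notified:
        return "notify"
    return "keep"


def extend(expires_at, extra_hours):
    """New expiry after the task owner asks for more time."""
    return expires_at + timedelta(hours=extra_hours)
```

Passing `now` in as an argument, instead of reading the clock inside the function, keeps the scheduler logic easy to test with fixed timestamps.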

Final Words

Final Workflow Diagram

In the legacy infrastructure, preparing a testing machine took at least two hours (~120 minutes). With the new system, we’ve managed to reduce that to ~20 minutes. Using fewer dependency pods lets us cut our costs as well. In addition, granularly defined access controls give us a more secure testing environment. Finally, ditching Jenkins operations helped us to have flawless pipelines.

CodeBuild Phase Durations

We’ll be looking at cost management in the next article.

To be continued…
