5x Faster Staging Environment Structure with Kubernetes

Nedim YILMAZ
Insider Engineering
7 min read · Feb 20, 2023

As developers, we are constantly looking for ways to improve our workflows and make our processes more efficient. One area that we have identified for improvement is our staging environment, which is used for testing and deploying features before they go live.

Current Tech Stack and Structure

Our current staging environment setup relies on GitHub, Jenkins, AWS, and JIRA. A developer creates a branch for a JIRA task and opens a pull request on GitHub, and Jenkins pipelines then create a machine on AWS and deploy all the necessary applications for testing onto that machine. The environment takes almost 90 minutes to be ready.

However, we have identified several issues with this current setup. One problem is that our staging environment relies on a single machine, which can lead to resource contention and slow testing and deployment times. Additionally, managing and deploying all of our applications on a single machine can be complex and time-consuming.

Proposed Tech Stack and Structure

To improve our staging environment, we have proposed a new tech stack built around Kubernetes, a platform for orchestrating, scaling, and managing containerized applications. With Kubernetes, we can create a more flexible and scalable staging environment by deploying our applications across multiple machines, allowing us to better utilize our resources and speed up testing and deployment times.

In addition to Kubernetes, we will also be using a range of tools to streamline our staging process and automate tasks:

- AWS CodeBuild: a fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy.
- AWS Lambda: a serverless computing platform that runs code in response to events and automatically manages the underlying infrastructure.
- Terraform: a tool for building, changing, and versioning infrastructure safely and efficiently.
- Helm Charts: a package manager for Kubernetes that makes it easy to deploy, upgrade, and manage applications on Kubernetes.
- AWS EKS: a managed Kubernetes service that makes it easy to deploy and run applications on Kubernetes.
- AWS ECR: a fully managed container registry that makes it easy to store, manage, and deploy Docker container images.
- Slack: a communication platform for teams.
- Requestly: a tool that allows users to modify HTTP requests and responses to test, debug, and demo applications.
- Mitmproxy: an open-source interactive HTTPS proxy that helps modify requests from the client.
- Traefik Ingress: a Kubernetes ingress controller that routes traffic to the correct service within a cluster.

To manage deployments for feature branch tests, we will also be using an in-house tool called the Vision Bot. This tool is triggered via Slack and allows developers to request a staging environment by providing their JIRA task ID and the desired environment duration. The Vision Bot retrieves the list of necessary projects from JIRA and checks for the existence of feature branches in the corresponding repositories on GitHub. If a feature branch exists, the Vision Bot generates Terraform files to create an AWS CodeBuild project that builds the application from the feature branch. If no feature branch exists, the bot deploys the project to Kubernetes directly from the develop branch of the repository.

To allow for the concurrent testing and deployment of multiple features, we are using namespaces on Kubernetes to separate the different tasks. Projects built from the develop branch are deployed to the default namespace, while projects built from feature branches are deployed to their own namespaces named with the JIRA task ID. This allows us to test and deploy multiple features at the same time without interference.
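As a rough illustration, here is what such a per-task namespace could look like as a plain manifest. This is a minimal sketch: the label shown is a hypothetical addition for illustration, not part of our actual tooling.

```yaml
# Hypothetical sketch: each JIRA task gets its own namespace,
# named after the task ID (e.g. jira-12345).
apiVersion: v1
kind: Namespace
metadata:
  name: jira-12345
  labels:
    # hypothetical label; handy for cleaning up expired environments
    staging/jira-task: jira-12345
```

Projects built from feature branches are deployed into this namespace, while everything built from the develop branch stays in the default namespace.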

To facilitate internal requests between projects within the same namespace, we are using DNS configurations that allow projects to look for dependencies within their own namespace before reaching out to the default namespace.

To facilitate external requests to the staging environment, we are using the Traefik Ingress to route requests to the correct namespace. The developer can use the Requestly tool to add a special namespace header to their requests, which will be matched by the Traefik Ingress and routed to the namespaced version of the desired project.

Overall, this new setup allows us to more efficiently and effectively test and deploy features, as we can build and deploy only the necessary projects rather than all projects on a single machine. It also allows us to test multiple features concurrently without interference, thanks to the use of namespaces and DNS configurations.

Example

JIRA Task and GitHub Projects’ Dependencies

To illustrate how this new setup works in practice, let’s consider an example with five projects: project_1, project_2, project_3, project_4, and project_5, where project_1 depends on project_2 and project_3, and project_3 depends on project_4 and project_5.

Imagine that you are a developer working on a JIRA task with the task ID jira-12345. You have been assigned to work on project_1 and project_2, which both depend on project_3. You create feature branches for project_1 and project_2, open pull requests on GitHub, and add these projects to the project list on JIRA.

The Vision Bot receiving requests from Slack and looking up data on JIRA and GitHub

To test your changes on the staging environment, you use the Vision Bot to request a staging environment via Slack. The bot retrieves the list of necessary projects from JIRA and checks for the existence of feature branches on GitHub. It finds that feature branches exist for project_1 and project_2, but no feature branch exists for project_3.

The Vision Bot generates Terraform files to create AWS CodeBuild projects for project_1 and project_2 and sends these files to the codebuild-files repository. Another in-house tool picks up these files and creates the CodeBuild projects, which are triggered by the feature branches of project_1 and project_2. These CodeBuild pipelines build the projects from the feature branches in parallel and deploy them to the jira-12345 namespace on Kubernetes, so the total build time is only as long as the longest single build; at Insider, that is at most 20 minutes. project_3 is also deployed to the jira-12345 namespace, but it is deployed as an image from AWS ECR rather than being built by CodeBuild.
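The build definitions themselves are generated by the Vision Bot, but conceptually each of these CodeBuild projects runs a buildspec along the following lines. This is a minimal sketch under assumed names: the environment variables, ECR registry, cluster name staging-cluster, and chart path are illustrative placeholders rather than our actual configuration.

```yaml
version: 0.2

phases:
  pre_build:
    commands:
      # log in to ECR so the built image can be pushed (registry URL is a placeholder)
      - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REGISTRY
  build:
    commands:
      # build the image from the feature branch and tag it with the JIRA task ID
      - docker build -t $ECR_REGISTRY/project_1:$JIRA_TASK_ID .
      - docker push $ECR_REGISTRY/project_1:$JIRA_TASK_ID
  post_build:
    commands:
      # deploy the freshly built image into the task's namespace via Helm
      - aws eks update-kubeconfig --region $AWS_REGION --name staging-cluster
      - helm upgrade --install project-1 ./chart --namespace $JIRA_TASK_ID --create-namespace --set image.tag=$JIRA_TASK_ID
```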

To test your changes, you can use the Requestly tool to add a special namespace header to your requests, which the Traefik Ingress matches and routes to the jira-12345 namespace. This allows you to access project_1 and project_2 in the jira-12345 namespace while still using the same domains, project_1.example.com and project_2.example.com, that you would use to reach the default namespace.

To route requests to the desired namespace, the Traefik IngressRoutes include header conditions that select the requested namespace, as shown in the sketch below.

Traefik Ingress example that routes requests by header
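A minimal sketch of such an IngressRoute is shown below; the header name namespace, the entry point, and the apiVersion are assumptions used for illustration.

```yaml
# Hypothetical sketch of a namespaced IngressRoute for project_1.
# Requests to project_1.example.com that carry the header
# "namespace: jira-12345" are routed to project_1 in the jira-12345 namespace.
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: project-1
  namespace: jira-12345
spec:
  entryPoints:
    - web
  routes:
    - kind: Rule
      match: Host(`project_1.example.com`) && Headers(`namespace`, `jira-12345`)
      services:
        - name: project-1
          port: 80
```

A request without the header falls through to the default routing rule (one without a Headers condition) and is served from the default namespace; Requestly simply makes it convenient to attach the header from the browser.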

To ensure that project_3, running in the jira-12345 namespace, can still reach its dependencies (project_4 and project_5), we have added a dnsConfig entry of default.svc.cluster.local to all projects. With it, every project first looks for a dependency within its own namespace for internal requests; if the needed service is not found there, the request is sent to the same service in the default namespace via the dnsConfig search path.

Deployment example with dnsConfig that falls back to services in the default namespace
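A minimal sketch of such a Deployment is shown below; the image URL, port, and labels are placeholders for illustration.

```yaml
# Hypothetical sketch: project_3 deployed into the jira-12345 namespace
# straight from its ECR image, with a dnsConfig fallback to the default namespace.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: project-3
  namespace: jira-12345
spec:
  replicas: 1
  selector:
    matchLabels:
      app: project-3
  template:
    metadata:
      labels:
        app: project-3
    spec:
      containers:
        - name: project-3
          # placeholder registry and account; the image comes from AWS ECR rather than a CodeBuild build
          image: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/project_3:develop
          ports:
            - containerPort: 80
      dnsConfig:
        searches:
          # appended after the pod's own namespace in the DNS search path,
          # so a short name like "project_4" resolves locally first and falls back to default
          - default.svc.cluster.local
```

With this search path, project_3 first tries to resolve project_4 and project_5 inside jira-12345; since they only run in the default namespace in this example, DNS resolution falls back there automatically.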

This new staging environment setup allows us to test and deploy features efficiently, as we build and deploy only the necessary projects rather than all projects on a single machine, reducing the build time by almost five times. It also allows us to test multiple features concurrently without interference, thanks to the use of namespaces and DNS configurations.

Conclusion

In summary, we have identified issues with our current staging environment setup, which relies on a single machine and can be complex to manage. To address these issues, we are using a new tech stack and in-house tools, including Kubernetes and the Vision Bot, that will allow us to create a more flexible and scalable staging environment. This new setup will allow us to deploy and test multiple features concurrently, while also streamlining our testing and deployment processes. We believe that this will lead to faster and more efficient feature development and deployment, ultimately improving the stability and reliability of our systems for our users.

Contact

If you have any questions or comments about this story, please don’t hesitate to reach out to me. You can contact me through the following channels:

LinkedIn: nedimyilmaz

I am always open to hearing feedback and discussing ways to improve our processes and systems. Thank you for reading!
