How to make your development environment more reliable

By: Shlomi Ratsabbi and David Gang

Lightricks Tech Blog · 7 min read · Apr 27, 2023

Challenges of releasing software to production

The process of releasing software to production is always a daunting task. The main challenges of pushing changes from a local environment to production are:

  • Configuration: The configuration in local environments often does not match the production configuration.
  • Data: The local environment does not contain data similar to production.
  • Permissions: In local environments, service-account permissions are rarely an issue, since locally hosted or emulated services are used; missing permissions then surface and break things in production.
  • Interaction with other microservices: Local development is often just one part of a larger software system consisting of multiple microservices, and sometimes several microservices need to change together. This makes it hard to test on a local machine.

As David Farley puts it in Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation: “Releasing software is too often an art; it should be an engineering discipline.”

Development environment

The naive solution to this problem is to create a single development environment shared by all developers. While this solves part of the problem, it creates several new ones:

  • Testing multiple developments in parallel: Because the environment is shared, several changes are often tested at the same time, which makes it hard to trace why a test failed.
  • Changes in infrastructure may conflict with the work of others: For example, a change in a database schema could break the changes of other people.
  • Divergence: It is easy to roll back code, but it is not as easy to roll back changes in third-party providers. Changes to cloud resources, like database schemas and service-account permissions, that never land in production lead to a divergence between the development environment and production.

The unknowns inside a persistent environment

It’s clear that persistent environments don’t address many of the above issues. Deploying and testing in a development environment is challenging, yet even when it is accomplished, there is still uncertainty about production due to the inevitable differences between the environments. All of this makes it difficult to fully test every single feature, bug fix, or security enhancement.

Moreover, these persistent environments are configured and updated incrementally, piling up redundant resources and unwieldy configuration, even when infrastructure-as-code and GitOps mechanisms are supposedly in place. These mechanisms aim to keep cloud infrastructure and application configuration in a version control system, but when using them to build the whole infrastructure and/or applications from scratch, one is often confronted with compatibility issues, security limitations, missing documentation, and so on.

What do branch environments bring to the table?

The correct solution is for every development effort to get its own environment.

At a high level, every application consists of the software developed by the team plus some third-party software hosted by cloud providers, for example PostgreSQL or message brokers.

To get an efficient and reliable environment for development and some tests, the application should be installed in an isolated environment, dedicated to the feature being developed in a branch of some version control system. In our case the VCS is GitHub, and the deployment of such an environment is initiated when a pull request is created from a branch whose name matches a pattern or convention. An infrastructure-as-code tool can then configure the cloud resources, ending up with a customized Argo CD application, named according to the convention that distinguishes branch environments from each other.
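
For illustration, such a naming convention might derive a DNS- and Kubernetes-safe prefix from the Git branch name. This is a minimal sketch; the variable name and sanitization rule are assumptions, not our exact convention:

locals {
  # Hypothetical convention: "feature/JIRA-123_new-api" becomes
  # "feature-jira-123-new-api", safe for namespaces, hostnames, and image tags
  branch_prefix = lower(replace(var.branch_name, "/[^0-9A-Za-z]+/", "-"))
}

In the manifests below, this prefix is passed around as var.branch_prefix and names the namespace, the hostname, and the image tags of the branch environment.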

Once it is deployed by Argo CD, infrastructural applications in the cluster (like external-dns and cert-manager) detect the new Kubernetes resources and finish the job, providing the developer with an isolated application based on their GitHub branch. This application gets a unique URL with a TLS certificate, against which tests are launched, and developers can gain confidence in their new code independently of the others.
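
To sketch how those pieces connect, the application's Ingress could look like the following, expressed as a Terraform manifest for consistency with the example below (in practice the Helm chart would render this object, and the hostname, issuer, and service names here are assumptions). external-dns creates the DNS record for the host, and cert-manager issues the certificate referenced by the TLS section:

resource "kubernetes_manifest" "branch_ingress" {
  manifest = {
    "apiVersion" = "networking.k8s.io/v1"
    "kind"       = "Ingress"
    "metadata" = {
      "name"      = "app"
      "namespace" = var.branch_prefix
      "annotations" = {
        # cert-manager issues a certificate for the hosts listed under "tls"
        "cert-manager.io/cluster-issuer" = "letsencrypt"
      }
    }
    "spec" = {
      "rules" = [{
        # external-dns detects this host and creates the DNS record
        "host" = "${var.branch_prefix}.dev.example.com"
        "http" = {
          "paths" = [{
            "path"     = "/"
            "pathType" = "Prefix"
            "backend"  = { "service" = { "name" = "app", "port" = { "number" = 80 } } }
          }]
        }
      }]
      "tls" = [{
        "hosts"      = ["${var.branch_prefix}.dev.example.com"]
        "secretName" = "${var.branch_prefix}-tls"
      }]
    }
  }
}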

These branch environments are ephemeral. Once a pull request is closed, or after a predetermined period of time, the dedicated environment is destroyed. On the other hand, some of the resources are persistent and shared across branch environments, for cost efficiency and quick creation or destruction, as long as each environment stays isolated. The goal is to have a solid, fast, and cheap way to independently test your own work, from a small fix or feature, to a large and complex upgrade.

A great side effect of this approach is that it cannot be achieved unless you are familiar enough with the application and all of its dependencies to document them in a version control system!

A schematic diagram illustrating branch environments:

Below is the relevant part of an Argo CD Application manifest, written in HCL as a Terraform kubernetes_manifest resource. The Helm chart being deployed is an umbrella chart, and the image tags of its dependencies are overridden, along with cloud resource names and other values:

resource "kubernetes_manifest" "github-branch-app" {
manifest = {
"apiVersion" = "argoproj.io/v1alpha1"
"kind" = "Application"
"spec" = {
"destination" = {
"namespace" = var.branch_prefix
"server" = "https://kubernetes.default.svc"
}
"source" = {
"helm" = {
"parameters" = [for k, v in merge(
{
"configmap.topic-foo" = module.pubsub_topic["foo"].name
"configmap.database-bar" = module.psql_db["bar"].name
...
},
{
for k in toset(concat(
var.microservices_list, [var.microservice_name]
)) : "${k}.image.tag" => var.branch_prefix
},
var.parameters_override,
) : { "name"=k, "value"=v }]

Environments do differ from each other

The most obvious difference between the environments is the load that production systems are expected to carry, which typically results in a more distributed design than in development environments. Branch environments are very compact by design and shouldn't mimic production loads. Therefore, actions like stress tests should be performed in a persistent staging environment, where all the recent updates are tested before being pushed to production.

Another difference between the environments is secrets, for which there are many solutions. Whatever the choice, the Helm chart or Kustomize configuration should include the manifests that extract sensitive data from its secure storage.
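
For example, with the External Secrets Operator (just one of the many solutions; the store, key, and resource names below are assumptions), the chart can include a manifest like this sketch, again written in HCL for consistency:

resource "kubernetes_manifest" "branch_external_secret" {
  manifest = {
    "apiVersion" = "external-secrets.io/v1beta1"
    "kind"       = "ExternalSecret"
    "metadata" = {
      "name"      = "app-secrets"
      "namespace" = var.branch_prefix
    }
    "spec" = {
      # The ClusterSecretStore points at the cloud secret manager
      "secretStoreRef" = { "name" = "gcp-secret-manager", "kind" = "ClusterSecretStore" }
      # The Kubernetes Secret to create in the branch namespace
      "target" = { "name" = "app-secrets" }
      "data" = [{
        "secretKey" = "db-password"
        "remoteRef" = { "key" = "db-password" }
      }]
    }
  }
}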

But when it comes to data and permissions, the branch environment shines. Whether it's a single- or multi-cloud ecosystem, everything is created from scratch using IaC and GitOps, which in turn invoke other tools, like database migrations or Elastic index templates. The very same tools apply to the persistent environments, ensuring identical cloud configuration and data structure across environments.

Implementation without keys

Applications in a branch environment use Cloud IAM service accounts through GKE Workload Identity, the recommended way to authenticate Kubernetes service accounts to Google Cloud. They are given the most granular permissions possible, according to security best practices. This also catches updates that introduce new cloud resources, enforcing permission adjustments right at the branch-environment stage. The same configuration is then copied to the persistent environments, avoiding unpleasant surprises.
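
A minimal sketch of such a keyless binding, assuming a Google service account resource named google_service_account.app and a Kubernetes service account called app in the branch namespace:

resource "google_service_account_iam_member" "workload_identity" {
  # Allow the Kubernetes service account in the branch namespace to
  # impersonate the application's Google service account, with no keys involved
  service_account_id = google_service_account.app.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[${var.branch_prefix}/app]"
}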

Just as Kubernetes service accounts can impersonate IAM service accounts without keys, GitHub Actions, which we use for CI workflows, authenticates to Google Cloud through a workload identity provider, which can be configured with the GitHub OIDC module.
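
A sketch of that configuration, using the gh-oidc module from the terraform-google-modules collection; the pool, provider, repository, and service-account names are placeholders:

module "gh_oidc" {
  source      = "terraform-google-modules/github-actions-runners/google//modules/gh-oidc"
  project_id  = var.project_id
  pool_id     = "github-pool"
  provider_id = "github-provider"
  sa_mapping = {
    "ci" = {
      # The CI service account may only be impersonated from this repository
      sa_name   = google_service_account.ci.name
      attribute = "attribute.repository/my-org/my-repo"
    }
  }
}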

Terraform, one of the most powerful infrastructure-as-code tools, keeps track of its managed resources in a state that it stores in some “backend”, which cannot be local when teamwork is desired, not to mention the temporary machines CI systems use. With the permissions granted to the Google service account linked to the GitHub identity provider, Terraform is able to fetch its state from the backend in GCP and manage all the configured resources. All this without any password or key.
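
A minimal backend configuration might look like this; the bucket name is a placeholder, and no credentials appear because the CI job already holds a federated identity:

terraform {
  backend "gcs" {
    bucket = "my-terraform-state" # placeholder bucket name
    prefix = "branch-envs"        # state objects for branch environments live under this prefix
  }
}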

Once an Argo CD Application manifest is created, control is transferred to Argo CD, which in turn installs a Helm chart. The sub-charts (Helm packages defined as dependencies) require access to a Helm repository, a private Google Artifact Registry in our case. GKE nodes are responsible for pulling the packages, along with application images, and they are permitted to do so by the artifactregistry.reader role their GSA has been granted. Again, no password or key to store, rotate, or risk compromising.
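
That grant could be expressed as follows; the node service-account resource name is an assumption:

resource "google_project_iam_member" "gke_nodes_artifact_reader" {
  # Lets GKE nodes pull Helm packages and container images from Artifact Registry
  project = var.project_id
  role    = "roles/artifactregistry.reader"
  member  = "serviceAccount:${google_service_account.gke_nodes.email}"
}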

When CI pipelines need to make API calls to the Argo CD server, we want to avoid the user/password or interactive login that this API server requires. Since the pipeline is already authenticated and authorized inside GKE, the --core option comes into play:

# setup gcloud
# download and cache argocd cli
gcloud container clusters get-credentials <cluster> --region=<region>
kubectl config set-context --current --namespace=<argocd-namespace>
./argocd <command> [parameters] --core

Conclusion

In this post we've seen the serious drawbacks of developing and testing new features in shared environments, and how to design a much better workflow that creates branch environments on demand.

Three powerful tools (Terraform, Argo CD, and Helm) are combined to produce a flexible configuration for these branch environments, with everything needed for testing on the one hand, and full isolation without a cost penalty on the other.

Designing such a workflow requires familiarity with every component and dependency of the application, and having the up-to-date configuration documented in a version control system. Isn't this the exact objective of IaC and GitOps?

Create magic with us
We’re always on the lookout for promising new talent. If you’re excited about developing groundbreaking new tools for creators, we want to hear from you. From writing code to researching new features, you’ll be surrounded by a supportive team who lives and breathes technology.
Sounds like you? Apply here.
