Upstream Infrastructure Changes Methodology

Published in

Israeli Tech Radar

Feb 8, 2023

How are infrastructure and infrastructure-related application changes upstreamed all the way to production?
Upstream infrastructure changes methodology is a systematic approach to making changes to the underlying infrastructure of an organization. This methodology is designed to minimize the risk of failure and ensure that the changes are made in a controlled and predictable manner.

In this post, you will learn how infrastructure changes are managed across multiple environments. Examples include upgrading or deploying the external-secrets, external-DNS, and load-balancer controllers.
They also include infrastructure changes such as EKS/AKS/GKE upgrades, adding node groups, setting up load balancers, or creating database resources …

Please note that I am not going to cover the processes for application changes here, such as the CI/CD pipelines you may already know.

The following is a high-level overview of the steps involved in this methodology.

Planning: The first step in this methodology is to plan the changes that need to be made to the infrastructure.
Assessment: The current state of the infrastructure is assessed.
Design: A design for the changes is created, producing a detailed implementation plan (a.k.a. low-level design, LLD).
Implementation: The changes are made to the underlying infrastructure.
Testing: The infrastructure is tested, using automated tests or manually.
Deployment: The application is upgraded gradually or all at once, depending on the risk.
Monitoring: Systems are monitored to quickly detect any issues that arise.
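These phases run in a fixed order, and a failure in one phase should stop the change from moving on. The following Python sketch illustrates that gating under a simple assumption: each phase is backed by a hypothetical handler that returns True on success. The phase names come from the list above; the runner itself is illustrative, not a specific tool.

```python
from typing import Callable, Dict, List

# The phases, in the order listed above.
PHASES: List[str] = [
    "planning", "assessment", "design", "implementation",
    "testing", "deployment", "monitoring",
]


def run_change(handlers: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each phase in order and stop at the first one that fails.

    `handlers` maps a phase name to a hypothetical callable returning
    True on success. Returns the list of phases that completed.
    """
    completed: List[str] = []
    for phase in PHASES:
        if not handlers[phase]():
            # A failed phase blocks every later phase
            # (e.g. no deployment after failed tests).
            break
        completed.append(phase)
    return completed


# Example: a change whose tests fail never reaches deployment.
handlers = {phase: (lambda: True) for phase in PHASES}
handlers["testing"] = lambda: False
print(run_change(handlers))
# ['planning', 'assessment', 'design', 'implementation']
```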

By following this methodology, organizations can ensure that changes to their infrastructure are controlled and predictable, minimizing the risk of failure and ensuring that their infrastructure remains reliable and functional.

The opposite of forgetting is writing

To fully control a change, we must manage our infrastructure according to the Infrastructure as Code (IaC) practice. The idea is that once all our resources are documented as code, we can manage them.

You may know the tools that dominate this area, such as Terraform, Pulumi, AWS CloudFormation, Azure ARM Templates, Google Deployment Manager, CDKTF, and AWS CDK.

When an infrastructure modification is required, we usually encounter one of two cases:

Minor changes: can be made with minimal risk to the environment and to service availability. These changes are usually made serially, one change at a time.
A quick rollback option must be available at all times. This case includes, for example, adding a new resource with no impact, or simply scaling the system up.
When minor application changes are needed, you may use a progressive delivery tool, for example Argo Rollouts or Flagger. Another practice, if a GitOps system is deployed, is to manipulate versions in Git.
Major changes: include breaking changes that may cause service unavailability or a broken environment. These changes are usually made as a whole in a sibling environment. Testing and validating the sibling environment's functionality is a must. From then until the next major change, the newly upgraded environment is the one in use.
The Blue-Green Deployment method is used in this case to minimize the risk.
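The split between minor and major changes can be written down as a small decision rule. This Python sketch assumes two hypothetical attributes that you would determine during the assessment phase: whether the change is breaking, and whether a quick rollback exists. Treating a change without a quick rollback as major is an assumption of the sketch, based on the requirement above that minor changes always keep one.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical description of an infrastructure change."""
    description: str
    breaking: bool        # may cause unavailability or a broken environment
    quick_rollback: bool  # can the change be reverted immediately?


def choose_strategy(change: Change) -> str:
    """Return the upgrade strategy for a change."""
    if change.breaking or not change.quick_rollback:
        # Major change: build and validate a sibling (Blue-Green) environment.
        return "blue-green"
    # Minor change: apply in place, one change at a time.
    return "in-place"


print(choose_strategy(Change("add a node group", breaking=False, quick_rollback=True)))
# in-place
print(choose_strategy(Change("breaking controller upgrade", breaking=True, quick_rollback=False)))
# blue-green
```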

Blue-Green Provisioning

Blue-Green Deployment is a software deployment strategy that reduces downtime and risk by running two identical production environments, “code-named” Blue and Green.

Only one environment, Blue or Green, is active at a given time, while the other is idle. During a deployment, the new version is installed and validated in the idle Green environment, and only then is traffic shifted from Blue to Green, making Green the active one.

Production environments are critical environments, so we provision them according to the Blue-Green deployment methodology. First, an upgraded sibling environment is created next to the old one. Checks and validations are completed. Traffic is routed to the new environment. Final infrastructure and application tests are run. Finally, the old environment is deleted.
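The provisioning sequence can be sketched as a single orchestration function. Each step is a hypothetical callable that you would back with your IaC tool and test suite; none of these names come from a real library. Discarding Green when pre-switch validation fails, and keeping Blue until the final tests pass, are assumptions of this sketch that follow from Blue being the rollback option.

```python
from typing import Callable, List


def blue_green_provision(
    create_green: Callable[[], None],
    validate_green: Callable[[], bool],
    route_to_green: Callable[[], None],
    final_tests: Callable[[], bool],
    route_to_blue: Callable[[], None],
    delete_env: Callable[[str], None],
) -> List[str]:
    """Sketch of the Blue-Green provisioning sequence; returns a step log."""
    log: List[str] = []
    create_green()
    log.append("green created")
    if not validate_green():
        # Blue never stopped serving traffic, so Green is simply discarded.
        delete_env("green")
        log.append("green discarded")
        return log
    route_to_green()
    log.append("traffic on green")
    if not final_tests():
        # Blue still exists, so rolling back is only a traffic switch.
        route_to_blue()
        log.append("rolled back to blue")
        return log
    # Only after the final tests pass is the old environment removed.
    delete_env("blue")
    log.append("blue deleted")
    return log


def noop(*_args: object) -> None:
    """Placeholder step that does nothing."""


print(blue_green_provision(noop, lambda: True, noop, lambda: True, noop, noop))
# ['green created', 'traffic on green', 'blue deleted']
```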

Benefits:

Improved reliability: Blue-Green Deployments minimize the risk of downtime or service disruption by ensuring that there is always a live environment to serve traffic.
Zero downtime (when done well): The upgrade is prepared beforehand and traffic is switched at the end.
Quick rollback: If a problem is detected with the Green environment after deployment, traffic can be quickly redirected back to the Blue environment.
Easy to test/reproduce: The Green environment can be used for testing before deployment, reducing the risk of issues being discovered after deployment.
A tranquil way of working.

To implement Blue-Green Deployments, you need two separate environments with identical configurations, and traffic management to route between them. This can be implemented with a single load balancer that reroutes traffic to the new environment, or by updating DNS to point to the load balancer of the new environment.

When a new release is ready to be deployed, it is first installed on the Green environment. Initial validation and tests are done. Traffic is redirected to the Green environment.

The Blue environment is now the rollback option.

If there are any issues with the Green environment, traffic can be redirected back to the Blue environment, allowing you to resolve the issue before it impacts your users significantly.
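The switch and the rollback both reduce to a single pointer that always names exactly one active environment, whether that pointer is a load-balancer target or a DNS record. A minimal Python sketch follows; `TrafficRouter` is a hypothetical in-memory stand-in, and a real setup would update the load balancer or the DNS record instead of a field.

```python
from typing import Optional


class TrafficRouter:
    """Hypothetical traffic pointer: exactly one environment is active."""

    def __init__(self, active: str = "blue") -> None:
        self.active = active
        self.previous: Optional[str] = None

    def switch_to(self, env: str) -> None:
        # The environment losing traffic becomes the rollback option.
        if env != self.active:
            self.previous, self.active = self.active, env

    def rollback(self) -> None:
        # Redirect traffic back to the previously active environment.
        if self.previous is None:
            raise RuntimeError("no previous environment to roll back to")
        self.active, self.previous = self.previous, self.active


router = TrafficRouter()
router.switch_to("green")
print(router.active)  # green
router.rollback()
print(router.active)  # blue
```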

In conclusion, the Blue-Green Deployment method is a reliable and efficient way to deploy software updates, reducing downtime and risk. By running two identical production environments and carefully managing traffic between them, you can ensure a seamless deployment process for your users.

Multi Environment Change Management

Sooner or later, every organization needs to manage its infrastructure, and some basic questions arise:

How do we manage the changes in all environments?
Can we save a state for each environment?
Is there a common code for all environments?
Is it enough just to save the configuration/state for each environment?
What is the single source of truth?

To simplify the discussion, let's assume a change always includes a change to both the configuration and the common code.

To understand these questions thoroughly, let's imagine a case where we manage 3 environments: dev, staging, and prod. Although the configuration of each environment is kept in a separate file, there is common code for all environments. A change is made to the dev environment, and testing and validation are done. However, because the code is shared, the change affects all environments at once. This is not good. We must push changes gradually.
Pull/merge requests are the tools that enable reviewing and pushing code changes from one branch to another.
If the code is managed in a single Git branch, there is no way to open pull requests and review these changes. We must manage the code in at least 2 separate Git branches.
A more detailed example uses separate branches for dev, staging, and prod. We provision the dev environment from the dev branch. After provisioning, testing, and reviewing are completed, we upstream the changes to the staging branch. The same process is then repeated for the staging and prod environments.
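The shared-code problem can be made concrete. In this Python sketch, the per-environment configuration files are represented as dictionaries and the common code as one function; all names and values are hypothetical. Because every environment is rendered by the same function, editing it on a single branch changes the desired state of dev, staging, and prod together.

```python
from typing import Dict

# Hypothetical per-environment configuration (one file per environment).
CONFIGS: Dict[str, Dict[str, int]] = {
    "dev": {"node_count": 1},
    "staging": {"node_count": 2},
    "prod": {"node_count": 5},
}

# Part of the common code: one value shared by every environment.
CONTROLLER_VERSION = "1.0"


def render(env: str) -> Dict[str, object]:
    """Common code: build the desired state for one environment."""
    return {
        "env": env,
        "node_count": CONFIGS[env]["node_count"],
        "controller_version": CONTROLLER_VERSION,
    }


# With a single branch, bumping CONTROLLER_VERSION for dev also bumps
# staging and prod the next time they are applied.
for env in CONFIGS:
    print(render(env))
```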

For example, the staging environment is described in the staging Git branch, and the production environment is described in the main Git branch.

dev (dev branch) → staging (staging branch) → prod (main branch)

The source of truth for each environment is its corresponding branch. Our responsibility is to keep upstreaming changes consistently.
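The promotion chain can be modeled as an ordered list of branches in which a change moves only one step at a time. In this minimal Python sketch, the branch names follow the example above, while the `promote` helper and its in-memory bookkeeping are hypothetical stand-ins for merging a reviewed pull request; the provisioning and testing done in each environment are left out.

```python
from typing import Dict, List, Set

# Promotion order: dev branch -> staging branch -> main branch (prod).
PROMOTION_ORDER: List[str] = ["dev", "staging", "main"]


def promote(change: str, target: str, branches: Dict[str, Set[str]]) -> None:
    """Merge `change` into `target` only if it is already in the branch
    immediately upstream of it (hypothetical stand-in for a PR merge)."""
    idx = PROMOTION_ORDER.index(target)
    if idx == 0:
        # New changes always start in the dev branch.
        branches[target].add(change)
        return
    source = PROMOTION_ORDER[idx - 1]
    if change not in branches[source]:
        raise ValueError(f"{change!r} must be in {source!r} before {target!r}")
    branches[target].add(change)


branches: Dict[str, Set[str]] = {b: set() for b in PROMOTION_ORDER}
promote("upgrade-external-dns", "dev", branches)
promote("upgrade-external-dns", "staging", branches)
promote("upgrade-external-dns", "main", branches)
```

Skipping a step, for example merging a change from dev straight into main, raises an error, which is the property the branching model is meant to enforce.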

To summarize:

Code your infrastructure using the Infrastructure as Code method.
Use the Blue-Green provisioning method for major changes.
Manage your environments using code branching.

These practices will help you upstream your infrastructure changes.

Written by Amir Misgav