In its simplest form, a deployment is a sequence of steps, performed in order, that enables application software to run on a specialised computer system and fulfil its purpose of serving user requests.
These steps may involve installing dependency software, applying configuration, setting up the logging and monitoring tools, performing performance optimizations, setting up required resources like databases and caches, and so on. Performing these processes manually can take a lot of your time.
In the traditional pattern of deploying software, deployments or releases were special events that happened only occasionally. Those were the moments when months of hard work by large groups of developers and testers were combined into an application package. This package was then deployed manually on computer systems to serve user traffic. These events needed extensive planning, impeccable skills, a great deal of time and, sometimes, a bit of luck to go through without a glitch.
To put this in perspective, think about an administrator logging into every system that exists and executing one command after another to install new software, upload new code and test for correctness and stability.
With agile development practices gaining widespread usage and development teams adopting the 12 Principles behind the Agile Manifesto, developers wanted to ship software incrementally and iteratively. This meant deployments had to happen faster and the process had to be automated.
Interlude: Continuous Integration and Deployment
Continuous Integration is the process of continuously merging/integrating code written by different developer groups and validating the correctness by running automated tests on the entire code base. These integrations can happen multiple times a day — usually when developers push their changes to version control repositories.
A logical extension to this process is to automatically deploy that code to production systems. This can be done right after the build system goes green, indicating that all tests have passed.
When we say “CD”, it can be read as either Continuous Delivery or Continuous Deployment. However, there are prominent differences between the two processes, and people often confuse the terms. Let us disambiguate.
Continuous Delivery is a process that, when followed, proves that every change is deployable at any moment. The actual deployment, however, is in most cases triggered manually.
Extending this, Continuous Deployment automates the final step of continuous delivery: once a change is deployable, it gets deployed automatically, without any manual intervention.
For an interesting discussion on this topic, have a look at this article on Continuous Delivery vs Continuous Deployment.
Postman’s Approach to Continuous Deployment
Over the past year, the engineering team at Postman has been charting a comprehensive strategy to adopt continuous deployment practices in our development process. Through the rest of this article, I’ll talk about how we approach continuous deployment and how we use some of the features of the Postman platform itself to automate these processes.
Adopting git-flow organization-wide
Postman as an organization adopted Git for version control for all its code across all its public and private code bases from the company’s initial days. This gave us visibility into code changes over time and a framework for straightforward collaboration among multiple contributors.
We chose to use the git-flow branching model to make this process better. The feature branch workflow that git-flow brings to the table gives us pull request based merging, isolated experimentation and greater stability during integration.
At any point in the development cycle, the master branch always contains the code that has to be released to the users. All the latest working, unreleased code is present in the develop branch. All other work happens in branches called feature branches, which are eventually merged into develop when their tests pass and reviewers are happy with the correctness and quality of the code.
Tests running on Bitbucket pipelines
This git-flow based workflow formed the basis of a continuous integration pipeline in which the develop branch can be treated as production-ready code. To enforce the criterion that all tests must pass, we rely heavily on Bitbucket’s integrated CI/CD pipelines.
This pipeline helps in defining clear and distinct steps for:
- checking vulnerabilities in dependencies,
- running system, integration and unit tests,
- providing a platform for packaging the application for deployment.
Packaging here involves creating a zip archive containing the source code and a Dockerfile containing the steps to build an image.
AWS Elastic Beanstalk to manage deployment environments
At Postman, we rely extensively on AWS Elastic Beanstalk to deploy our code in production. This service provides the right balance between ease of use for regular usage and hackability for advanced usage. We can deploy code to Beanstalk in a few ways, such as uploading the source code directly as a zip file, or uploading a Dockerfile along with the source code. In the second case, Beanstalk builds an image from the Dockerfile and then deploys the code as a Docker container with the right entry point to the code base.
The main problem with this approach is the manual work and the time each deployment takes. The entire source code has to be archived properly and uploaded manually in the Beanstalk console. Once the deployment starts, Beanstalk builds the Docker image from scratch, which takes additional time. As the service starts to scale, this image building process is repeated for every new instance, which also increases the time required to scale up to handle the excess load.
This method also depends on external services like npm or Docker Hub during the build process. If any of those external services are down when the images have to be built, it can cause micro downtimes that degrade service health.
ECR and prebuilt docker images to the rescue
We can treat every deployment as an amalgam of a docker image and its associated immutable configuration.
The other method Beanstalk provides to deploy code uses a prebuilt Docker image that Beanstalk can run directly without building anything. This results in lightning-fast deployments and scale-ups during peak loads. The only catch is that we need to build and store the images somewhere else. That is where we use Bitbucket pipelines.
We use the Docker-as-a-service feature provided by pipelines to create Docker images from the Dockerfile. We push these images to the Elastic Container Registry (ECR), another AWS service that can be used as a private container image registry. The benefit of ECR is that, with just a few IAM policies, Beanstalk environments can pull the images directly from the registry without exposing any data to the outside world.
Concept of a deployment (code + config)
But this doesn’t completely encompass the notion of continuous deployment. We still need to create a small package containing just the Dockerrun.aws.json file and upload it to Beanstalk to create a version and trigger a deployment. The Dockerrun.aws.json contains the image directive that instructs Beanstalk to pull images from ECR and run a container for them. Beanstalk is also capable of reacting to configuration changes. Every configuration change can trigger a fresh deployment.
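As a rough illustration, the content of such a package could be generated as follows. This is a minimal sketch that assumes the single-container Dockerrun.aws.json format (version 1); buildDockerrun is a hypothetical helper, and the image URI and port are placeholder values rather than anything from our actual setup.

```javascript
// Hypothetical helper: builds the content of a single-container
// Dockerrun.aws.json (version 1) that points Beanstalk at a prebuilt image.
function buildDockerrun(imageUri, containerPort) {
  return {
    AWSEBDockerrunVersion: "1",
    Image: {
      Name: imageUri, // full registry URI, including the tag
      Update: "true", // ask Beanstalk to pull the image on deploy
    },
    Ports: [{ ContainerPort: containerPort }],
  };
}

// Placeholder values, for illustration only.
const dockerrun = buildDockerrun(
  "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-service:master-abc1234",
  8080
);

// This JSON is what would be written to Dockerrun.aws.json and zipped.
console.log(JSON.stringify(dockerrun, null, 2));
```

Because the file only references the image, the package stays tiny and Beanstalk has nothing to build at deploy time.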
Inspired by this behaviour, we can treat a deployment as an amalgam of a Docker image and its configuration. The combination of these two elements can be considered an immutable artefact that can be deployed independently and rolled back with ease in case of any failure.
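One way to picture this immutable artefact is as a frozen object whose identity is derived from both the image and the configuration. The sketch below is an assumption made for illustration: createDeploymentArtefact, the flat key-value config and the 12-character identifier are hypothetical, not a description of Beanstalk's own versioning.

```javascript
const crypto = require("crypto");

// Serialise with sorted keys so the same configuration always hashes the same way.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(value[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// Hypothetical artefact: one image reference plus one frozen configuration.
// The freeze is shallow, so the sketch assumes a flat key-value config.
function createDeploymentArtefact(imageUri, config) {
  const frozenConfig = Object.freeze({ ...config });
  const digest = crypto
    .createHash("sha256")
    .update(imageUri + "\n" + stableStringify(frozenConfig))
    .digest("hex");
  return Object.freeze({ id: digest.slice(0, 12), imageUri, config: frozenConfig });
}

const image = "example-service:master-abc1234"; // placeholder image reference
const first = createDeploymentArtefact(image, { LOG_LEVEL: "info", NODE_ENV: "production" });
const sameAgain = createDeploymentArtefact(image, { NODE_ENV: "production", LOG_LEVEL: "info" });
const reconfigured = createDeploymentArtefact(image, { LOG_LEVEL: "debug", NODE_ENV: "production" });

console.log(first.id === sameAgain.id);    // same image and config: same artefact
console.log(first.id === reconfigured.id); // changed config: a different artefact
```

Rolling back then amounts to redeploying an earlier artefact by its identifier, since neither its image nor its configuration can have changed in the meantime.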
Deployment as a closed-loop control system
Now that we have established what a deployment is and that the master branch should contain the deployed code, we can look at how to do this automatically.
As you can see in the diagram above, once changes get merged to master, we run unit tests and integration tests one last time before release. Because we use git-flow, we can also generate changelogs automatically from the features that were added or updated, using the Pull Request data. Once this step is complete, we use the pipelines to build an image from the source code using the Dockerfile present in every repository.
Before we go ahead from here, we need to understand the concept of a Service Catalog. A service catalog is a listing of repositories, their associated ECR repositories, Beanstalk Environments, Route53 Mapping, contributor lists, etc. It is a service that acts as a source of truth for all the mapping between different components and resources across the infrastructure.
After we create the Docker image, we query the service catalog to find the associated registry on ECR. Then, we tag the image and push it to the registry. The tag here is a unique phrase that identifies every image and denotes the branch it was created from.
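The tagging step can be sketched roughly as below. The tag scheme (branch name plus short commit id), the in-memory serviceCatalog map and the registry URI are all assumptions made for illustration; the real service catalog is a separate service with its own API.

```javascript
// Hypothetical in-memory stand-in for the service catalog lookup.
const serviceCatalog = new Map([
  ["example-service", { ecrRepository: "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-service" }],
]);

// Builds a tag such as "master-abc1234" from the branch name and commit id.
function buildImageTag(branch, commitId) {
  const safeBranch = branch
    .toLowerCase()
    .replace(/[^a-z0-9_.-]+/g, "-") // Docker tags allow only these characters
    .replace(/^[.-]+/, "");         // a tag may not start with "." or "-"
  const prefix = safeBranch || "branch"; // fallback if nothing usable is left
  return (prefix + "-" + commitId.slice(0, 7)).slice(0, 128); // tags are capped at 128 characters
}

// Resolves the full image reference to push for a given repository.
function resolveImageReference(repositoryName, branch, commitId) {
  const entry = serviceCatalog.get(repositoryName);
  if (!entry) {
    throw new Error("No service catalog entry for " + repositoryName);
  }
  return entry.ecrRepository + ":" + buildImageTag(branch, commitId);
}

console.log(resolveImageReference("example-service", "feature/Add-Login", "abc1234def5678"));
```

Encoding the branch in the tag makes it possible to tell release candidates built from master apart from images built for experimentation.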
Following this, we deploy the image to a staging environment to run contract and security tests against this release candidate.
Once this is through, we move the release candidate image to production, to be able to achieve continuous deployment in the truest sense. This is where control systems come into play with the implementation of a release gateway. A release gateway is a series of requests backed by tests that decide if the image is eligible for production. Think of it as an API-based checklist on steroids. If we look at SRE concepts like Error Budgets, we can easily expose them with an API and consume them during the deployment to compute the eligibility.
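To make the Error Budget idea concrete, here is a minimal sketch of how a gateway check could turn an availability objective into a yes-or-no answer. The function names, the request counts and the 50% threshold are assumptions chosen for illustration, not the rules we actually apply.

```javascript
// Fraction of the error budget still unspent in the current window.
// sloTarget is the availability objective, e.g. 0.99 for 99% of requests succeeding.
function remainingErrorBudget(sloTarget, totalRequests, failedRequests) {
  const allowedFailures = (1 - sloTarget) * totalRequests;
  if (allowedFailures === 0) {
    return failedRequests === 0 ? 1 : 0;
  }
  return Math.max(0, 1 - failedRequests / allowedFailures);
}

// Hypothetical gate: allow a release only while enough budget is left.
function isEligibleForRelease(service, minimumBudget) {
  const remaining = remainingErrorBudget(
    service.sloTarget,
    service.totalRequests,
    service.failedRequests
  );
  return { eligible: remaining >= minimumBudget, remaining };
}

// Roughly 1,000 failures are allowed in this window; 250 have been used.
const healthy = isEligibleForRelease({ sloTarget: 0.99, totalRequests: 100000, failedRequests: 250 }, 0.5);
// Here 900 of the roughly 1,000 allowed failures are already spent.
const exhausted = isEligibleForRelease({ sloTarget: 0.99, totalRequests: 100000, failedRequests: 900 }, 0.5);

console.log(healthy.eligible, exhausted.eligible);
```

Exposed behind an API, a check like this becomes one more request in the gateway collection, with its test script asserting on the eligible flag.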
To keep this closed loop control system going automatically, Postman Collections and Postman Monitors play a crucial role.
Automation with Collections in the API First World
Everything is just an API call away
In the API-First world, every component in a system exposes an API of some sort and is intended to be used that way. If we imagine every data source to be an API, then all we need is a flow that collects data from the different sources and transforms it along the way using some glue code. This way of thinking makes a lot of sense in the context of Postman and gives us the ability to perform continuous deployment using Collections, which are a series of API request definitions with pre-request and post-request scripts. They can be treated as a primitive block of compute, taking inspiration from Lambda Calculus.
Collections can help define workflows with the help of pre-request and post-request scripts.
Usually, a collection has a list of requests and their associated tests for validating the correctness of each request. If we change that perspective and start considering every component in a request (the pre-request script, the request itself and the test script) as a single unit of compute or a lambda, then we can do a lot of automation with the collection runner.
To help us define branching and control flow across requests, the Postman sandbox exposes an interesting API named postman.setNextRequest(). This lets us conditionally change the order of execution of requests, which makes operations like looping and terminating possible.
Further, a group of API requests that perform distinct operations can be grouped together in folders and assigned a relevant state with variables. Environment variables can also be used as temporary memory to store data that needs to be shared across different requests.
What we see here are basic components of a primitive computer:
- request: Unit of Logic
- setNextRequest: Branching directive
- environment: Memory
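To see how these three pieces combine into a small computer, the sketch below simulates a collection run outside Postman. runCollection is a hypothetical stand-in for the collection runner, and each request is reduced to a plain function; only the setNextRequest semantics (jump to a named request, or stop when passed null) mirror the sandbox API.

```javascript
// Minimal stand-in for a collection runner. Each "request" is a function that
// receives the shared environment and a postman-like object with setNextRequest.
function runCollection(requests, environment, maxSteps = 100) {
  const order = requests.map((r) => r.name);
  const executed = [];
  let index = 0;
  for (let step = 0; step < maxSteps && index >= 0 && index < requests.length; step++) {
    let next; // undefined means "continue with the following request"
    const postman = { setNextRequest: (name) => { next = name; } };
    requests[index].run(environment, postman);
    executed.push(requests[index].name);
    if (next === undefined) index += 1;
    else if (next === null) break;    // null stops the run
    else index = order.indexOf(next); // jump to the named request
  }
  return executed;
}

const environment = { attempts: 0, status: "Updating" }; // shared memory
const requests = [
  { name: "Create deployment", run: (env) => { env.status = "Updating"; } },
  {
    name: "Check status",
    run: (env, postman) => {
      env.attempts += 1;
      if (env.attempts >= 3) env.status = "Ready"; // pretend the third poll succeeds
      if (env.status !== "Ready") postman.setNextRequest("Check status"); // loop
    },
  },
  { name: "Report result", run: (env, postman) => { env.reported = true; postman.setNextRequest(null); } },
  { name: "Never reached", run: () => {} },
];

const executed = runCollection(requests, environment);
console.log(executed.join(" -> "));
```

The polling loop and the early stop are exactly the two branching operations mentioned above, and the environment object carries state from one request to the next.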
We can go ahead with our primitive computer and define the flow of statements that would take the Docker image and zip artefact created in the previous stage and automatically deploy them to Beanstalk using AWS APIs.
Let’s take a deeper dive into the actual implementation we used at Postman.
AWS Lambda + Newman
Continuing from the previous stage of building a Docker image and deployment artefact in Bitbucket pipelines, we trigger a run of our deployment collection.
The important question here is where do we run this collection and how do we trigger the run?
Here we use yet another AWS service, Lambda, to run Newman. Newman is the command-line collection runner from Postman.
We configure the Lambda Function to either run when triggered by API Gateway or run on a schedule. In this case, we send a request to the API Gateway from Bitbucket pipeline to trigger a collection run through Lambda, essentially converting our collection runs to an API.
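A Lambda handler for this could look roughly like the sketch below. It relies on Newman's library interface, newman.run(options, callback), with the collection, envVar and reporters options. The payload fields (repository, branch, commitId) and the helper names are assumptions, and the newman module is injected so that the sketch does not depend on it being installed.

```javascript
// Hypothetical helper: turns an API Gateway proxy event into Newman run options.
// The payload fields are assumed for illustration.
function buildRunOptions(event, collection) {
  const payload = JSON.parse(event.body || "{}");
  return {
    collection, // collection JSON, a file path or a URL
    envVar: [
      { key: "repository", value: String(payload.repository || "") },
      { key: "branch", value: String(payload.branch || "") },
      { key: "commitId", value: String(payload.commitId || "") },
    ],
    reporters: "cli",
  };
}

// Handler factory. In a real function the module would come from
// require("newman"); here it is passed in as a parameter.
function createHandler(newman, collection) {
  return (event) =>
    new Promise((resolve, reject) => {
      newman.run(buildRunOptions(event, collection), (err, summary) => {
        if (err) return reject(err);
        const failed = summary.run.failures.length; // failed assertions and errors
        resolve({ statusCode: failed === 0 ? 200 : 500, body: JSON.stringify({ failed }) });
      });
    });
}

// Example event, shaped like what the pipeline might POST to API Gateway.
const sampleEvent = {
  body: JSON.stringify({ repository: "example-service", branch: "master", commitId: "abc1234" }),
};
console.log(buildRunOptions(sampleEvent, "deployment.postman_collection.json").envVar);
```

Passing the pipeline's details in as environment variables keeps the collection itself generic, so the same collection can serve every repository.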
In the collection we use, the first step is to get project details. To do that, we call the Bitbucket API to fetch information about the origin of this deployment, which includes details like branch name, commit id, contributor name, etc. We also use this data to generate a changelog for the release.
Following that, we call Service Catalog APIs to get details about the Beanstalk application and Route53 entries associated with the service that triggered the deployment. This gives us the basic mapping of code to infrastructure. This allows us to call specific AWS APIs to figure out the deployment target.
We use the application name from the previous step to query the Beanstalk API for all the environments associated with the application. This is the list of candidate environments that can accept the deployment. Then, to select the exact environment, we call the Route53 API to find out which environments are healthy and serving traffic.
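The matching of environments against DNS records can be sketched as follows. The input objects follow the field names returned by the Beanstalk DescribeEnvironments and Route53 ListResourceRecordSets APIs; the matching rule itself (comparing an environment's CNAME with the record targets) and all sample values are assumptions for illustration.

```javascript
// Route53 returns DNS names with a trailing dot; normalise before comparing.
function normaliseDns(name) {
  return name.toLowerCase().replace(/\.$/, "");
}

// environments: items with EnvironmentName, CNAME, Status and Health.
// recordSets: items with AliasTarget.DNSName (alias records) or
// ResourceRecords[].Value (plain records such as CNAMEs).
function classifyEnvironments(environments, recordSets) {
  const liveTargets = new Set();
  for (const record of recordSets) {
    if (record.AliasTarget) liveTargets.add(normaliseDns(record.AliasTarget.DNSName));
    for (const rr of record.ResourceRecords || []) liveTargets.add(normaliseDns(rr.Value));
  }
  const healthy = environments.filter((e) => e.Status === "Ready" && e.Health === "Green");
  return {
    serving: healthy.filter((e) => liveTargets.has(normaliseDns(e.CNAME))),
    idle: healthy.filter((e) => !liveTargets.has(normaliseDns(e.CNAME))),
  };
}

// Sample data, for illustration only.
const candidates = classifyEnvironments(
  [
    { EnvironmentName: "example-blue", CNAME: "example-blue.us-east-1.elasticbeanstalk.com", Status: "Ready", Health: "Green" },
    { EnvironmentName: "example-green", CNAME: "example-green.us-east-1.elasticbeanstalk.com", Status: "Ready", Health: "Green" },
    { EnvironmentName: "example-old", CNAME: "example-old.us-east-1.elasticbeanstalk.com", Status: "Ready", Health: "Red" },
  ],
  [{ Name: "api.example.com.", Type: "A", AliasTarget: { DNSName: "Example-Blue.us-east-1.elasticbeanstalk.com." } }]
);

console.log(candidates.serving.map((e) => e.EnvironmentName)); // environments behind DNS
console.log(candidates.idle.map((e) => e.EnvironmentName));    // healthy but not behind DNS
```

Unhealthy environments are dropped from both lists, so a later step never picks one as a deployment target by accident.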
After getting this set of data, we run the pre-deployment checks to determine whether the release candidate qualifies for deployment. This step also acts as a release gateway when running on a schedule, which involves calling other internal APIs to establish whether the deployment is allowed to go through.
Finally, after all the checks are done, we select the target staging environment from this list of candidate environments and create a deployment using the Beanstalk API.
Promotion to production
Once the image reaches staging, we can push it to production as soon as it clears the checks set by the release gateway. These include:
- Security: The moment deployment to staging succeeds, the security sanity check collections created by Postman’s security team are triggered to validate for one last time that no security vulnerabilities exist. When these collections run, they generate security metrics that are piped into our analytics system. The internal task management system is also queried to check no issues are pending completion. Once all these checks are complete, security marks a green on the gateway checklist.
- Infrastructure: In parallel to that, all infrastructure configurations are verified according to the platform specification. This ensures that all platform components set up to monitor the production systems operate within nominal limits post-deployment.
- Contract: In conjunction with these, there are different contract collections defined by the consumers of the application. All these collections are executed to ensure that the upcoming release doesn’t break any interface contract that it has established with its consumers.
- Maturity Model Check: At Postman, every service or application has a maturity model that defines its characteristic performance. This maturity model is computed by processing historical data obtained from Application Process Monitors (APMs), logs, Postman Monitors, and User Journey Mapping tools. The maturity model computes a score for every service, and from this score, the platform decides factors such as deployment frequency. So, the analytics systems are queried to obtain the score from the maturity model and, if it is within the approved range, the release gets another green on the gateway checklist.
- Load Test: Finally, the platform load-testers are automatically fired up and the service is tested for factors like sustained loads, burst traffic, etc. which completes the last item in the checklist.
Once all the blocking items in the checklist are green, we can push the image to production. To promote a staging deployment to production, we run a Postman Monitor using the same collection. This runs on a regular cadence to evaluate release gateway conditions. Once everything is okay, we go ahead with the deployment to production.
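The decision that each monitor run makes can be reduced to a small function over the checklist. The item shape (name, status, blocking) and the sample statuses below are hypothetical; they only illustrate how blocking and non-blocking items might be told apart.

```javascript
// Hypothetical checklist evaluation: promote only when every blocking item is green.
function evaluateGateway(checklist) {
  const blockers = checklist
    .filter((item) => item.blocking && item.status !== "green")
    .map((item) => item.name);
  return { promote: blockers.length === 0, blockers };
}

// Sample checklist state as one monitor run might see it.
const checklist = [
  { name: "Security", status: "green", blocking: true },
  { name: "Infrastructure", status: "green", blocking: true },
  { name: "Contract", status: "green", blocking: true },
  { name: "Maturity Model Check", status: "pending", blocking: true },
  { name: "Load Test", status: "red", blocking: false },
];

const decision = evaluateGateway(checklist);
console.log(decision.promote, decision.blockers);
```

Because the monitor runs on a cadence, a release that is held back by a pending item is simply re-evaluated on the next run, with no manual follow-up.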
In the end, once the deployment goes through, we perform availability, performance and reliability checks continuously using Postman Monitors, AWS health checks and CloudWatch metrics. All this data is piped back to the maturity model to form the closed-loop control system of deployment that we have been talking about.
One of the core philosophies at Postman is to use Postman to build Postman as much as possible. With this workflow in place, we will focus on making our deployment systems more stable and robust in the future by enhancing all Postman product components.