Deliver It!

Part 2 - Patterns of Effective CI/CD

Zach Morgan
Aug 3, 2024

The first step to mastering any tool or technology is to understand the problem it is designed to solve. In Part 1 of this series, we examined several problems that frequently arise when scaling a software-oriented business. If left unaddressed, these issues impede the value that a business produces and may even lead to harm or loss for end users. One only needs to look at the recent CrowdStrike disaster to find evidence of this. Continuous Integration and Continuous Delivery are intended to prevent such problems before they reach production, while also increasing the speed and efficiency of the development process.

Part 2 of this series examines the structure of a CI/CD pipeline and walks through an example to understand the responsibilities of each stage. Taken together, the entire pipeline provides an automated, efficient, and reliable means of delivering high-quality software.

Contents:

Part 1 — Business scalability and CI/CD

Part 2 — Patterns of effective CI/CD

  1. The whole is the sum of its parts
    - Fail fast, release confidently
  2. How CI/CD is implemented
    - Commit Stage
    - Acceptance Stage
    - Release Stage

The whole is the sum of its parts

While Continuous Integration (CI) and Continuous Delivery (CD) are designed to solve different problems, they provide the greatest value when used together. A CI/CD pipeline is a system that progresses the software through both processes in a highly automated fashion.

The job of CI is to merge parallel changes to the codebase to create a valid, authoritative version of the source code. Modern version control systems such as Git help with this by exposing the most obvious integration issues. For example, Git will halt a merge with a conflict if two developers edit the same line of code in different branches, or if a file is deleted in one branch while being edited in another.

Although version control is great at catching these kinds of issues, it isn’t sufficient by itself. Even if the code is successfully merged into main, the code may still fail to compile or have bugs that are only detectable at runtime. CI surfaces these problems by building the source code and then testing it against a fast-running suite of regression tests. In this way, CI provides substantial feedback within a few minutes of changes being integrated.

After CI determines that the source code works in isolation, CD verifies that it works with the rest of the system. First, the updated component is deployed together with other parts of the system, and the entire assembly is promoted through increasingly rigorous tests. These tests try to prove that the entire system satisfies its business-facing requirements (which are known as acceptance criteria). Once the assembly is believed to be valid, CD ensures that the new version is rolled out consistently and reliably. It also provides mechanisms to revert to the previous version if anything goes wrong.

Implementations of CI/CD break out into three distinct stages. The objectives of CI are realized during the first stage — the commit stage — named thus because it occurs upon every commit to main. The objectives of CD are met by two stages often referred to as acceptance and release; names that aptly represent their purpose.

Fail fast, release confidently

The sequence of the stages along with the partitioning of responsibilities among them is significant. We said in Part 1 that CI/CD aims to bring pain forward in the delivery process where it is cheap to deal with. Let’s take a moment to expand on this.

A mature software delivery process includes many different forms of analysis and testing. Some of these tests are small in scope. This makes them fast to execute and simple to maintain, yet they also provide less confidence that the app works correctly as a whole. Other, more broadly-scoped tests yield greater confidence at the cost of being more complex, slow, and expensive. CI/CD is typically organized such that fast, cheap, narrowly-scoped tests are conducted at the beginning, and longer, more expensive tests occur later on.

This pattern has two benefits. First, it increases the odds that problems will be discovered quickly. The commit stage may be completed within 3–5 minutes of changes being checked in, enabling developers to become aware of a problem, fix it, and try again as quickly as possible. The second benefit is that the business saves money by avoiding expensive tests when they are unnecessary. When a cheap test reveals a problem, there is no need to waste time and resources running more expensive tests. The CI/CD pipeline only promotes the application to the next stage if it successfully passes the previous one. In this way, our confidence in the software’s quality gradually increases as it progresses through the pipeline. At the same time, feedback is provided as quickly as possible whenever a defect is discovered.
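
The fail-fast ordering described above can be sketched as a simple gate: each stage runs only if the previous one passed, with the cheapest checks first. The stage names, timings, and `run` callables here are illustrative placeholders, not a real pipeline definition.

```python
# Illustrative sketch of fail-fast stage ordering: cheap checks run first,
# and a failure stops the pipeline before expensive stages are started.
# Stage names and the `run` callables are hypothetical placeholders.

def run_pipeline(stages):
    """Run stages in order; stop and report on the first failure."""
    for name, run in stages:
        if not run():
            return f"FAILED at {name}"  # later (expensive) stages never run
    return "PASSED"

# Ordered from fastest/cheapest to slowest/most expensive.
stages = [
    ("lint", lambda: True),               # seconds
    ("unit tests", lambda: True),         # a few minutes
    ("acceptance tests", lambda: False),  # tens of minutes
    ("NFR tests", lambda: True),          # potentially hours
]

print(run_pipeline(stages))  # the acceptance failure prevents NFR tests from running
```

The key property is that a failure in a cheap stage means the expensive stages are never started, which is exactly how the money- and time-saving benefit above plays out in practice.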

We’ll zoom in on each stage in the following sections.

Implementations of CI/CD are as diverse as the applications they deliver. The point of this section is to develop a mental model of the high-level patterns in a typical pipeline. This example aims to be agnostic to both tech-stack and architecture, in favor of describing the general building blocks that frequently occur in CI/CD systems. After reading this section, you will have a framework in which to organize solutions and technologies as you encounter them in real-world systems.

A quick aside: As we walk through the example pipeline, you may find yourself wondering where each of these steps occurs. Simple CI/CD systems can be hosted pretty much anywhere, including on a developer’s laptop or an on-premises server. In practice, modern CI/CD pipelines are usually managed by a remote, ephemeral “CI” server whose sole responsibility is orchestrating CI/CD. Lightweight verifications, especially those that occur in the commit stage, may be executed directly on the CI server itself. Later in the pipeline, the server acts more as a control plane — orchestrating the provisioning and configuration of deployment environments and test systems. Alternatively, some pipelines employ a separate “CD” service that runs continually as part of the live application. This service does not handle the provisioning of infrastructure, but instead focuses on the deployment and rollback of services on top of the existing infrastructure.

The Commit Stage

The commit stage implements the CI portion of the pipeline, and it begins as soon as a developer checks in to main. The primary input to this stage is the latest version of the source code in the main branch of the repository, and it outputs a deployable artifact along with a report detailing the commit stage test results.

When the commit stage begins, the CI server checks out the latest version of the source code and prepares the application for execution. Depending on the technologies being used, this may require a dedicated build step. Interpreted languages (Ruby and Python, for example) don’t typically need to be built, whereas other languages (such as Go and Rust) must be compiled down to an executable form. JavaScript is an interesting case — though it doesn’t inherently require a build step, CI will often “transpile” the source code to make it more compact or compatible with older environments. Then, of course, there are supersets of JavaScript such as TypeScript that require a kind of compilation before being executable. A build step is beneficial in its own right: the build will fail if a dependency is missing, and some errors can be discovered through static analysis.

If the source code builds successfully, the CI server kicks off a comprehensive suite of unit tests. Unit tests are the heart of the commit stage because they run very quickly and provide excellent isolation of regressions. It is critical to keep the commit stage speedy — ideally less than 5 minutes. This encourages developers to wait for tests to complete before moving on to the next task, thereby avoiding the switching cost of jumping back and forth to fix bugs that the commit stage reveals. In addition, it’s a good idea to keep main green at all times: if the commit stage fails, fixing it becomes the top priority, and other developers should hold off on checking in new changes until the build is passing again.

Along with unit tests, other checks that I will call “repo alerts” are often included in the commit stage. These analyze attributes such as code coverage (the percentage of code covered by the test suite), linting, copy-paste detection, and conformance to a style guide. These kinds of checks can be configured to fail the commit stage if a team so chooses.
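
As a sketch, a "repo alert" like a coverage gate is just a numeric threshold compared against the report produced by the test run. The report structure and the 80% threshold below are hypothetical; real tools such as coverage.py or JaCoCo expose this as configuration rather than hand-written code.

```python
# Hypothetical "repo alert": fail the commit stage if code coverage drops
# below a configured threshold. The report structure is made up for the sketch.

def coverage_gate(report, threshold=80.0):
    """Return (passed, message) for a coverage check."""
    covered = report["covered_lines"]
    total = report["total_lines"]
    pct = 100.0 * covered / total
    if pct < threshold:
        return False, f"coverage {pct:.1f}% is below the {threshold}% threshold"
    return True, f"coverage {pct:.1f}% meets the {threshold}% threshold"

report = {"covered_lines": 412, "total_lines": 500}  # 82.4% covered
passed, message = coverage_gate(report)
print(message)
```

Whether a failed gate actually fails the commit stage, or merely posts a warning, is the team-level choice mentioned above.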

Finally, after all of the commit stage verifications are successful, the code is packaged into a deployable artifact. The deployable artifact is the realization of the current state of the source code in an immutable, executable form. It may be a binary executable (in the case of compiled languages), a .zip file, a container image (e.g. Docker), or a virtual machine image (e.g. an AMI). This artifact is uploaded to an artifact repository and tagged with metadata that identifies the version of the source code it originated from. This allows developers to inspect the source to understand the behavior of the artifact, and it also makes the artifact reproducible in case it is ever lost.
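
A minimal sketch of that tagging step might look like the following. The field names and tag format are invented for illustration; container registries achieve the same traceability with image tags and labels.

```python
# Sketch: tag a deployable artifact with metadata that ties it back to the
# exact source version it was built from. Field names are illustrative.
import hashlib

def artifact_metadata(commit_sha, branch, build_number):
    """Build the metadata record stored alongside the artifact."""
    tag = f"{branch}-{commit_sha[:7]}-b{build_number}"
    return {
        "tag": tag,                # e.g. used as the container image tag
        "commit_sha": commit_sha,  # full SHA for traceability back to source
        "branch": branch,
        "build_number": build_number,
    }

def content_digest(artifact_bytes):
    """Digest of the artifact itself, so consumers can verify its integrity."""
    return hashlib.sha256(artifact_bytes).hexdigest()

meta = artifact_metadata("4f2a9c81d3e07b6a5c1f0e2d9b8a7c6d5e4f3a2b", "main", 128)
print(meta["tag"])  # main-4f2a9c8-b128
```

Storing a content digest alongside the metadata is what makes the artifact verifiably immutable: the same bytes must be deployed in every environment.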

Distributing software as a single, reusable, deployable artifact rather than re-building from version control is a powerful idea. It ensures that the same configuration of the source code and its dependencies is deployed in every environment. Every time the artifact is deployed in a pre-production environment we gain confidence that it will also be deployable in production.

Successful completion of the commit stage is a major milestone for a piece of software. All of the changes to the source code have been combined, and the application has been shown to build and pass its full suite of unit tests. Finally, CI has created an artifact that encapsulates the code’s dependencies and can be used to deploy the software going forward.

The Acceptance Stage

With a deployable artifact in hand, we now have a candidate for release to end users. This is the transition point from CI to CD. The role of the CD phase is to verify that the release candidate satisfies business requirements and is delivered reliably. CD also represents a categorical shift in how we view the artifact. Our priorities in CI were biased towards identifying the most likely problems and failing as quickly as possible. Now in CD, we are biased towards taking the time needed to gain as much confidence as possible that the release candidate is ready for production.

This change in priorities is reflected in the acceptance stage architecture. Rather than running the software in isolation and faking its dependencies, the objective is to test the software under conditions that mirror production as closely as possible. This requires deploying the artifact to one or more pre-production environments.

When the acceptance stage begins, the deployable artifact is retrieved from the artifact repository and deployed into a test environment. This environment includes instances of the application’s external dependencies — file servers, databases, caches, queues, and web services. The entire environment should be provisioned and configured automatically by the CD pipeline. Doing so means that the entire deployment process is regularly tested. This builds confidence that the production deployment will go smoothly as well. Such automation is implemented through deployment scripts, infrastructure-as-code tooling, configuration management, and infrastructure orchestration systems. Everything required to deploy the application should be version-controlled so that any changes made to the deployment process are recorded and made explicit. The only thing that should change between deployment environments (e.g. from test to prod) is the configuration details of the deployment process (which should themselves be version-controlled). In this way, if a deployment succeeds in one environment and fails in another, we can isolate the failure to a misconfiguration.
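
The "same artifact, different config" idea can be sketched like this. The environment names and settings are hypothetical; in practice this data would live in version-controlled config files consumed by the deployment tooling.

```python
# Sketch: the same immutable artifact is deployed everywhere; only
# version-controlled configuration differs per environment.
# Environment names and settings are hypothetical.

ENVIRONMENTS = {
    "test":    {"db_host": "db.test.internal",    "replicas": 1, "log_level": "debug"},
    "staging": {"db_host": "db.staging.internal", "replicas": 2, "log_level": "info"},
    "prod":    {"db_host": "db.prod.internal",    "replicas": 6, "log_level": "info"},
}

def deployment_plan(artifact_tag, environment):
    """Combine the immutable artifact with environment-specific config."""
    config = ENVIRONMENTS[environment]  # unknown environment fails loudly
    return {"artifact": artifact_tag, "environment": environment, **config}

plan = deployment_plan("main-4f2a9c8-b128", "staging")
print(plan["db_host"])  # db.staging.internal
```

Because the artifact reference is identical in every plan, a deployment that works in staging but fails in prod can only differ in the config half of this record.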

Once the application is fully deployed, configured, and initialized, the next step is to conduct a small set of smoke tests. These tests will catch any obvious configuration issues right away — things such as an invalid database connection string, or a web service that failed to start up. These issues are caught by a few simple test cases that touch every component in the application and fail if any component is unresponsive. It’s important to catch such problems right away before we spend unnecessary resources on the more expensive tests that follow.
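
A smoke-test pass can be as simple as one cheap health check per component. In this sketch the endpoint URLs are hypothetical, and the HTTP call is injected as a callable so the logic is easy to exercise; a real implementation would use an HTTP client against each service's health route.

```python
# Sketch of a smoke-test pass: hit one cheap health check per component and
# report anything unresponsive. Endpoints and the `fetch_status` callable
# are hypothetical stand-ins for real HTTP calls.

def run_smoke_tests(endpoints, fetch_status):
    """Return the list of components that failed their health check."""
    failures = []
    for name, url in endpoints.items():
        try:
            status = fetch_status(url)  # injected so the check is testable
        except Exception:
            status = None               # an unreachable service counts as a failure
        if status != 200:
            failures.append(name)
    return failures

endpoints = {
    "web":    "https://app.test.internal/healthz",
    "api":    "https://api.test.internal/healthz",
    "worker": "https://worker.test.internal/healthz",
}

# A fake fetcher standing in for a real HTTP call:
healthy = lambda url: 200
print(run_smoke_tests(endpoints, healthy))  # [] -> every component responded
```

An empty failure list is the signal to proceed to the expensive acceptance tests; any entry in it fails the stage immediately.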

The core of the acceptance stage is made up of end-to-end (E2E) acceptance tests. E2E tests probe the application through its business-facing interfaces (Web GUI, API, etc.) and exercise user workflows that depend on the entire application working correctly as a whole. These tests provide a high degree of confidence that the application is in working order, but they also come with major tradeoffs. Let’s take a look at two of these tradeoffs in particular.

First, E2E tests are much slower than most other tests. Testing a single workflow may involve many sequential interactions, and since we aren’t faking any part of the system, each interaction may result in a cascade of network calls that take time to complete. When you account for the time it takes to set up and tear down state before and after a test, a single test case may take anywhere from a few seconds to several minutes to run. In some cases, E2E tests can be run in parallel to speed things up, but this requires careful design and its feasibility depends on the details of the system.

“I have seen [E2E tests] take up to a day to run, if not longer, and on one project I worked on, a full regression suite took six weeks!”¹
— Sam Newman

The second tradeoff with E2E tests is that they can be difficult to maintain. Since they exercise the entire system at once, the cause of a failing test is often hard to diagnose. In addition, the large scope of the tests makes it more likely that changes to the system will require updating them. Furthermore, if a test touches systems developed by more than one team, it may be unclear which team should be responsible for maintaining it.

For these reasons, companies differ in how much they rely on E2E testing. Some companies use E2E tests extensively as the explicit acceptance criteria for all of the application’s functionality. When a user story is being developed, the first step is to create an E2E test that validates the story’s behavior. The story is only considered to be complete when this test is made to pass. The test then serves as a regression test for that story going forward.

On the other end of the spectrum, some companies forgo end-to-end testing altogether to avoid the inefficiency of running and maintaining them. These companies place a greater emphasis on testing in production, where robust monitoring and rollback capabilities allow features to be released to users with less pre-production testing. If there is any increase in failure metrics (5xx status codes, frontend errors, increased latency), the changes are automatically rolled back before they have a significant impact on the business. Testing in production is aided by modern release techniques such as feature flags and canary deployments. These make it possible to expose only a small slice of users to the new version of the application so that the blast radius is small if an issue is discovered. The new feature is only rolled out to the rest of the user base after it is shown to be reliable.

A third option, and probably the most common strategy, is to use E2E testing selectively to test business-critical functionality. This means only testing features that would be catastrophic for the business if they failed. This middle way strikes a balance of keeping the E2E test suite efficient and maintainable, while also gaining necessary confidence in the correctness of the application before users are exposed to it.

E2E tests verify that the application meets its functional acceptance criteria (it exhibits the behavior that users expect). These tests are often accompanied by tests that interrogate non-functional requirements (NFRs), also known as “operational requirements”. NFR tests verify attributes such as performance (latency per request), throughput (requests per second or QPS), and capacity (maximum throughput while maintaining acceptable performance). They often consist of things like load tests, soak tests, and profiling. Depending on the application, these tests might not be strictly required to pass for a release candidate to proceed. Sometimes it is acceptable for performance to take a small hit in the short term to get a new functional behavior released quickly. In any case, NFR tests are important because, in the long run, operational characteristics can become functional problems if they are left unattended. It doesn’t matter whether your app meets its functional criteria if it crashes every day during peak business hours.
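
As a concrete illustration of an NFR check, the sketch below evaluates latency and throughput samples from a hypothetical load test. The thresholds (p95 latency under 300 ms at 500 or more requests per second) are invented acceptance numbers, and the percentile uses a simple nearest-rank method.

```python
# Sketch: evaluate an NFR from load-test samples. The thresholds are
# hypothetical acceptance numbers, not recommendations.

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def nfr_check(latencies_ms, duration_s, p95_limit_ms=300, min_qps=500):
    """Check performance (p95 latency) and throughput (QPS) together."""
    throughput = len(latencies_ms) / duration_s
    p95 = percentile(latencies_ms, 95)
    return {"p95_ms": p95, "qps": throughput,
            "passed": p95 <= p95_limit_ms and throughput >= min_qps}

# 10,000 requests over 10 seconds -> 1,000 QPS; the slowest 5% take 400 ms,
# so the 95th-percentile latency still lands on the 50 ms bucket.
latencies = [50] * 9500 + [400] * 500
result = nfr_check(latencies, duration_s=10)
print(result["passed"])  # True
```

Note that the check couples performance and throughput, mirroring the capacity definition above: a release that is fast but cannot sustain the required request rate still fails.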

From Acceptance to Release

After the acceptance stage, the pipeline may either proceed automatically to the release stage or pause to wait for approval from a reviewer. This decision depends on several factors, including the maturity of the CI/CD pipeline, the risk tolerance of the business, the compliance regulations in play, and the weight that the business places on delivery speed.

Many companies trust their CI/CD pipeline to reliably catch regressions and rollback releases whenever issues slip into production. This level of maturity is powerful because it allows delivery to be completely self-serve for developers. Any developer can check in their changes, and within a few minutes, those changes may be seen by end users without any additional action on the part of the developer.

Other businesses prefer to have manual approval before new versions are released. After the acceptance stage, the pipeline generates a report detailing the results of the automated tests. If the reviewer is satisfied with the results, they can manually promote the software into the release stage with the press of a button.

To gain even more confidence, the reviewer may decide to test the changes manually in a sandbox environment. Again, with the push of a button, they can trigger the provisioning of a staging environment and the deployment of the app to that environment, allowing them to inspect the application and verify its fitness for release. This optional stage is referred to as user acceptance testing (UAT). Once the reviewer is satisfied, they can de-provision the staging environment and trigger a production release.

The Release Stage

The release stage of CD is oriented around updating the software in production in a way that is safe and reliable. Releases should not require application downtime for users, and it should be possible to reverse the update if the new version doesn’t behave as expected.

When the release stage is triggered, the first step is to download the same deployable artifact from the artifact repository. As discussed previously, this artifact has already been deployed several times, so we have a lot of confidence that it will deploy successfully in the production environment.

For the simplest of CI/CD pipelines, deploying into the production environment may be all that is needed to release the software to end users. In this case, the old version of the application is spun down before the new version of the application is spun up. In more mature pipelines, the new version is deployed alongside the old version without the new version becoming immediately discoverable to production traffic. This strategy is termed a “dark launch”, and it is powerful because it allows the pipeline to act on any issues that occur with installation, configuration, and initialization before any users are affected.

Once the new version is deployed in the production environment, the pipeline can run a few smoke tests to confirm that the app has initialized and that all of its dependencies are present and discoverable. An advanced technique called “traffic mirroring” can be employed as well, where requests from real users are copied in real-time and executed against the new version of the application. The results of these mirrored requests are not visible to users, but they allow the internal CD system to verify that the new version of the application behaves correctly against real production traffic. This is a powerful way to further increase confidence that the release will be successful, but it also adds significant complexity to the release process.

There are a variety of strategies for safely releasing the new version to end users. Confusingly, many of them have the term “deployment” in the name (see deployment vs. release in Part 1). The two most common strategies are blue-green deployment and canary deployment. In a blue-green deployment, production traffic is re-routed to the new version of the application all at once. The old version is kept running as a stand-by so that traffic can be switched back at any time should the new version experience any problems. Once the pipeline is confident that the new version is handling production traffic without issues, it de-provisions the old version of the application. The benefit of blue-green deployment is its simplicity compared to other zero-downtime release strategies.

Canary deployment takes a more incremental approach to release. Once the new version of the app is deployed in the production environment, only a small slice of production traffic is routed to it. This makes it possible to test how the new version of the app behaves while limiting the number of users who are exposed to potential problems. If everything looks good with the small slice of user traffic, the percentage of traffic routed to the canary is increased. This incremental check-and-increase cycle repeats until eventually the new version is serving 100% of production traffic. While this strategy is more complex, it significantly decreases the impact on customers if a problem arises.
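
The check-and-increase cycle can be sketched as a short loop. The traffic percentages and the injected health check are illustrative; in a real system, `set_traffic_pct` would update load-balancer weights and `is_healthy` would query production metrics.

```python
# Sketch of a canary ramp: increase the traffic share routed to the new
# version step by step, checking health at each step. The percentages and
# the injected callables are hypothetical.

def canary_rollout(set_traffic_pct, is_healthy, steps=(5, 25, 50, 100)):
    """Ramp traffic to the canary; abort and reset on any bad health check."""
    for pct in steps:
        set_traffic_pct(pct)   # e.g. update load-balancer weights
        if not is_healthy():   # e.g. inspect error rates and latency metrics
            set_traffic_pct(0) # route all traffic back to the old version
            return False
    return True                # the canary now serves 100% of traffic

history = []
ok = canary_rollout(history.append, is_healthy=lambda: True)
print(ok, history)  # True [5, 25, 50, 100]
```

The important detail is the abort path: a failed health check at any step immediately routes all traffic back to the old version, which is what keeps the blast radius small.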

Arguably the most important element of a safe release process is the ability to automatically roll back to the previous known-good version. At a bare minimum, it should be possible to manually trigger a rollback at the push of a button. A better scenario is for the pipeline to be able to recognize a problem in production and carry out the rollback automatically. This capability requires that the software is sufficiently instrumented and monitored so that problems can be detected by the system as they occur. For example, the pipeline should be able to track the percentage of requests that respond with 4xx and 5xx status codes. It might also account for the average latency for serving requests. If these metrics surpass a pre-set threshold or see a marked increase during a deployment, that could be a trigger for the pipeline to roll back the release.
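
A rollback trigger of this kind boils down to comparing live metrics against pre-set thresholds. The metric names and threshold values below are made up for the sketch; in practice they would come from the monitoring system and be tuned per service.

```python
# Sketch: decide whether to roll back by comparing production metrics against
# pre-set thresholds. Metric names and thresholds are hypothetical.

def should_roll_back(metrics, max_error_rate=0.01, max_p95_ms=500):
    """True if production metrics breach either rollback threshold."""
    error_rate = metrics["errors"] / max(metrics["requests"], 1)
    return error_rate > max_error_rate or metrics["p95_ms"] > max_p95_ms

healthy  = {"requests": 10_000, "errors": 20,  "p95_ms": 180}  # 0.2% errors
degraded = {"requests": 10_000, "errors": 450, "p95_ms": 180}  # 4.5% errors

print(should_roll_back(healthy))   # False
print(should_roll_back(degraded))  # True -> trigger an automatic rollback
```

A real trigger would also compare against a pre-deployment baseline rather than fixed numbers, so that a marked increase during the rollout is caught even when the absolute values look acceptable.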

What’s next?

Throughout this series, we have examined CI/CD as a system for delivering high-quality software. We have touched on the cross-cutting nature of the domain — concerning software development, operations, and business. CI/CD is an incredibly rich topic; there are many considerations when implementing a pipeline, and hundreds of tools and technologies to choose from. Successfully navigating these waters requires one to be aware of the problems these tools are designed to solve, along with the ways these technologies are typically composed together. Understanding the fundamental ideas makes us better equipped to evaluate existing problems in our businesses, and select a suitable path for improving how we deliver software.

For a few tips on how to get started, consider revisiting the points in Adopting CI/CD from Part 1. I would also recommend reviewing the ideas discussed in these articles and pursuing deeper research in any areas that are especially interesting to you.

Finally… I love feedback! Feel free to connect with me if you want to chat about any of the ideas presented here.

References

1. Newman, Sam. Building Microservices. O'Reilly Media, p. 289. Kindle edition.

* The name of this series is inspired by the book Release It!

Further Reading

- Continuous Delivery by Jez Humble and David Farley
- The Practical Test Pyramid - overview of testing strategies by Ham Vocke
- Site Reliability Engineering by a Google SRE team, particularly the chapter on Testing for Reliability
- Testing in Production - engaging blog series by Cindy Sridharan
- Fullstack Open Part 11: CI/CD - practical, hands-on guide to building your first CI/CD pipeline with GitHub Actions
