Deliver It!

Part 1 — Business Scalability and CI/CD

Zach Morgan
11 min read · Jul 15, 2024

CI/CD is one of those topics whose breadth and depth make it difficult to grok at first encounter. This series is the resource I would have wanted during my initial forays into the domain. I do not discuss Terraform, Ansible, or any of the myriad other technologies that compose modern CI/CD pipelines. Each of these tools is a world of its own, and it is important (and fun!) to learn them when the time comes. But before any tool can be used effectively, one must understand the problem it solves and the patterns it is used within. This series discusses those patterns.

Part 1 begins by introducing the concepts of Continuous Integration and Continuous Delivery, establishing the vocabulary needed to proceed with clarity. The remainder of Part 1 examines some of the real-world challenges of delivering valuable software in a growing business. Grasping these challenges sets the stage for Part 2, which steps through a CI/CD pipeline and discusses its components.

Contents

Part 1 — Business scalability and CI/CD

  1. What is CI/CD?
  2. Why CI/CD unlocks speed and quality at scale
    - Efficiency Problems
    - Problems that Increase Risk
    - Risk and Infrastructure
  3. Adopting CI/CD

Part 2 — Patterns of effective CI/CD

What is CI/CD?

CI/CD is shorthand for Continuous Integration / Continuous Delivery. In one sense, these terms refer to particular ways of thinking about how a team approaches software development. They also denote formalized processes that implement these ideas in a way that is efficient, reliable, and highly automated.

On any software team with more than one developer, the source code is always evolving along multiple separate threads. Continuous Integration means that all of the changes being made in parallel are merged together frequently. How frequently depends mainly on the version control workflow that the team uses. Frequency typically ranges from several times per day up to once every 1–2 weeks. Also, note that the word “merged” in this context implies that conflicts are resolved and the code is not left in a broken state as a result.

Even when the version control trunk (let’s call it main) is kept up to date with the latest changes, those changes don't create new value for the business until they are made available to end users. Continuous Delivery expedites the realization of business value by treating every change committed to main as a candidate for release. It employs an automated process to verify the fitness of the release candidate and deliver it to end users if it is deemed worthy.

“Software isn’t done until you deliver it to the user.”¹
— Yevgeniy Brikman

You may hear CD alternatively defined as continuous deployment rather than continuous delivery. To deploy software is to assemble, install, configure, and run it. Deploying software does not necessarily mean that the software is released. To release software is to cause it to serve the production traffic of end users. Software may be deployed many times as part of development and verification before it is released to the end user. For that reason, delivery is probably a better term — it means that software is not only deployed, but also released, and thereby generating value for the business.
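To make the distinction concrete, here is a minimal sketch of deployed-but-unreleased code sitting behind a feature flag (the flag and handler names are hypothetical, invented for this illustration):

```python
# Hypothetical sketch: a deployed-but-unreleased code path behind a flag.
# The new implementation is running in production, but no end-user traffic
# reaches it until the flag is flipped (the "release").

RELEASED_FEATURES = {"new_checkout": False}  # deployed, not yet released

def old_checkout(user_id: str) -> str:
    return f"order placed for {user_id} (current checkout)"

def new_checkout(user_id: str) -> str:
    return f"order placed for {user_id} (new checkout)"

def handle_checkout(user_id: str) -> str:
    # Flipping the flag releases the already-deployed code to users.
    if RELEASED_FEATURES["new_checkout"]:
        return new_checkout(user_id)
    return old_checkout(user_id)

print(handle_checkout("user-42"))  # current checkout until the flag flips
```

The new code path is deployed, verified, and running, yet from the user's perspective nothing has changed until the release happens.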

Perhaps the most important ingredient in effective CI/CD is automation. It is true that even without automation, a team may gain some benefit from these ideas — regularly checking their changes into main, testing the app to catch regressions, deploying to pre-production environments, and releasing incremental updates to users. But without automation, this process will be error-prone and costly in terms of developers' time. "Automate Everything" is the watchword that leads to speed and safety in software delivery.

“If you want something to happen, ask.
If you want it to happen often, automate it.”²
— Ivan Kirigin

Why CI/CD unlocks speed and quality at scale

The automated processes implementing CI and CD are usually composed into a single system known as a CI/CD pipeline. Although this system is internal to the business, it is fundamentally oriented toward solving business-facing problems. An effective pipeline often leads to a marked improvement in both software quality and development velocity, and these benefits become more apparent as the business scales. To better understand how CI/CD pipelines provide value, let's reframe the discussion in terms of the problems that arise without such a system. These fall into two categories: efficiency and risk.
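Before examining those categories, it helps to have a concrete picture of what a pipeline is: a chain of automated gates, where a commit advances only if the previous stage passes. A toy sketch (the stage names and stubbed checks are invented for illustration; Part 2 discusses what real stages contain):

```python
# Toy sketch of a pipeline as a chain of gates: each stage must pass
# before the release candidate advances. Stage internals are stubbed out.

def build(commit):             # compile and package the app
    return True

def unit_tests(commit):        # fast, isolated checks
    return True

def deploy_staging(commit):    # deploy to a production-like environment
    return True

def acceptance_tests(commit):  # verify user-facing behavior
    return True

STAGES = [build, unit_tests, deploy_staging, acceptance_tests]

def deliver(commit):
    for stage in STAGES:
        if not stage(commit):
            print(f"{stage.__name__} failed for {commit}; stopping the pipeline.")
            return False
    print(f"{commit} passed every gate; releasing to users.")
    return True

deliver("abc1234")
```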

Efficiency Problems

Here’s a simple way to think about the efficiency of a software team: consider the business value produced relative to the developer time spent. CI/CD can significantly improve this ratio by shortening development cycles and automating manual processes.

Long development cycles are a common symptom of waiting too long to adopt CI/CD. They lead to wasted business resources in two ways.

The first way has to do with the cost of integration conflicts. It is commonplace for new features to be developed in long-lived, isolated version control branches that are only merged into main when a feature is completed. This workflow seems intuitive since it allows the developer to focus on their implementation, treating integration as a separate step.

However, there’s a problem with deferring integration. When a developer is working in an isolated branch, they are working with a stale version of the codebase, developing against unverified assumptions about how main actually works. The longer they do so, the more they reinforce a potentially inadequate implementation of the feature. By the time a long-lived branch is eventually integrated, main may have changed in ways that are incompatible with the incoming branch. This problem is compounded when other team members are also working in isolated branches; often many branches are merged right before a release, resulting in a pile-up of conflicts.

CI addresses this problem by bringing the pain of integration forward in the process. Developers either check in directly to main, or bring their feature branches up-to-date with main on a regular (at least daily) basis. In this way, conflicts are surfaced as soon as possible and can be resolved without wasting effort on an out-of-date implementation. This illustrates a fundamental principle in CI/CD: bring pain forward in the process where it is cheap to deal with. We will see more examples of this principle, especially in Part 2.
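As a concrete example, the "bring your branch up-to-date with main" step is itself easy to automate. Here is a minimal sketch that wraps standard git commands (the wrapper script is illustrative; adjust the trunk name to your workflow):

```python
"""Minimal sketch: keep the current feature branch in sync with main.

The git commands are standard; the wrapper script is illustrative only.
Run it daily (or from a scheduled job) to surface conflicts early.
"""
import subprocess
import sys

def run(*cmd: str) -> int:
    print("$", " ".join(cmd))
    return subprocess.call(cmd)

def sync_with_main(trunk: str = "main") -> None:
    run("git", "fetch", "origin")
    # A nonzero exit code here usually means conflicts, which are best
    # resolved now, while the divergence is still small.
    if run("git", "merge", f"origin/{trunk}") != 0:
        sys.exit("Merge conflicts detected: resolve them today, not at release time.")

if __name__ == "__main__":
    sync_with_main()
```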

The other reason that long development cycles are inefficient is that they delay the realization of value from development. As long as new code is locked away in version control and developer laptops, it is not generating value for the business. If a team goes several weeks or months without releasing their changes to production, the business has essentially invested a large sum in developer salaries on the prospect of realizing value from that work in the future. On the other hand, if software is released daily, there is practically no unrealized investment. Each day of development work is immediately realized in value for the business.

The second efficiency problem addressed by CI/CD stems from the poor scalability of manual processes. Software testing is the archetypal example. Without automated verification that the software works correctly, developers (or dedicated testers) have to manually explore the application to see whether their changes have caused any regressions. With every new feature, the number of scenarios that need to be tested grows. Over time, the effort required to test the software makes changes risky and lengthens the time it takes to release new features.

When testing is automated, developers don’t need to spend time manually verifying existing functionality. Instead, they write automated tests only for the new functionality being added. For you Leetcode nerds, that means the testing effort per feature is O(1). Without automated tests, the manual testing effort for each release grows with the total number of features ever shipped, since every one of them must be re-verified by hand. Although it requires a little setup at the beginning of a project, writing automated tests and running them on every check-in scales much better, and becomes cheaper than manual testing very early in the lifetime of a project. This is only one example among many of how automating delivery processes dramatically increases the efficiency of the business.
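To see what O(1) testing looks like in practice, here is an illustrative test file for a test runner like pytest (the pricing function and its tests are invented for this example). It is written once when the feature ships:

```python
# test_pricing.py - illustrative tests for a hypothetical pricing feature.
# Written once when the feature is added; re-run automatically on every
# check-in, so old features are re-verified at no extra human cost.
import pytest

def apply_discount(price: float, percent: float) -> float:
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_discount_reduces_price():
    assert apply_discount(100.0, 20) == 80.0

def test_zero_discount_is_identity():
    assert apply_discount(59.99, 0) == 59.99

def test_invalid_discount_rejected():
    with pytest.raises(ValueError):
        apply_discount(10.0, 150)
```

Running pytest in CI executes this file alongside the whole accumulated suite, so re-verifying every old feature costs no human time.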

Problems that Increase Risk

Moving beyond efficiency concerns, there are also several ways in which CI/CD mitigates the risk that a business undertakes. Perhaps the most prominent of these is reliability, which can be expressed in two dimensions:

  • Mean Time To Failure (MTTF) — the average time between major production outages and business-critical bugs.
  • Mean Time To Recovery (MTTR) — the average time it takes to restore service after an issue is discovered.

A low MTTF and a high MTTR damage a business’s reputation and bottom line, and in some cases can cause harm or loss to users. CI/CD increases MTTF (makes failures less frequent) by leveraging progressively rigorous stages of automated testing to catch issues before they make it into production. More on that in Part 2.
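To make the two metrics concrete, here is a small sketch that derives them from a fabricated incident log (all timestamps are invented):

```python
# Sketch: computing MTTF and MTTR from a hypothetical incident log.
from datetime import datetime, timedelta

# (failure_start, service_restored) pairs - fabricated example data.
incidents = [
    (datetime(2024, 3, 1, 9, 0),   datetime(2024, 3, 1, 9, 45)),
    (datetime(2024, 4, 12, 14, 0), datetime(2024, 4, 12, 14, 20)),
    (datetime(2024, 5, 30, 2, 0),  datetime(2024, 5, 30, 3, 0)),
]

# MTTR: average downtime per incident.
mttr = sum((end - start for start, end in incidents), timedelta()) / len(incidents)

# MTTF: average uptime between one recovery and the next failure.
gaps = [incidents[i + 1][0] - incidents[i][1] for i in range(len(incidents) - 1)]
mttf = sum(gaps, timedelta()) / len(gaps)

print(f"MTTR: {mttr}")  # about 42 minutes in this example
print(f"MTTF: {mttf}")  # about 45 days in this example
```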

CI/CD also reduces MTTR (allows problems to be fixed quickly) by shortening development cycles and automating deployment. Shorter development cycles mean that each release includes a smaller number of changes. This makes it easier to identify the cause of an issue if one makes it into production. Delivery risk and delivery frequency are inversely related — the risk of delivery goes down as the frequency of delivery goes up.

Another way that CI/CD decreases MTTR is by automating the release and rollback processes. If an issue is discovered in production, it can be addressed immediately at the push of a button by rolling the app back to the previous good version. Alternatively, if the root cause is readily apparent, a fix can be committed to version control and deployed to production just as easily. Note that push-button deployments and rollbacks are only possible if these processes are fully automated, including the management of infrastructure.
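A push-button rollback can be as simple as re-running the automated deployment with the last known-good version. Here is a hedged sketch in which the deploy step is a stub standing in for real deployment tooling (Kubernetes, a cloud API, and so on):

```python
# Hypothetical sketch of push-button release and rollback. In a real
# pipeline, deploy() would invoke your deployment tooling; here it is
# stubbed so the control flow stays visible.

release_history = []

def deploy(version):
    print(f"deploying {version} to production")  # stand-in for real tooling

def release(version):
    deploy(version)
    release_history.append(version)

def rollback():
    if len(release_history) < 2:
        raise RuntimeError("no previous version to roll back to")
    release_history.pop()        # drop the bad release
    deploy(release_history[-1])  # redeploy the last good version

release("v1.4.0")
release("v1.5.0")  # suppose this one misbehaves in production
rollback()         # one command restores v1.4.0
```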

Risk and Infrastructure

Although software reliability is best expressed in terms of quantitative metrics like MTTF and MTTR, effecting positive change in this area requires a closer look at the underlying processes. Probably the most significant concern that hasn’t been addressed thus far is the management of infrastructure. At the beginning of a project, it is typical for infrastructure primitives to be provisioned, configured, and managed in an ad-hoc way using a CLI or web UI. While this approach is low-friction to begin with, it comes with significant risk as the application scales over time. Let’s take a moment to explore the problems with managing infrastructure by hand.

The first problem is configuration drift. When servers are initialized or updated by hand, it is easy for small differences to be introduced, especially when urgent bugs need to be fixed or performance is being tuned. The servers become like snowflakes — each one being a unique work of art with a slightly different configuration from the other servers in the environment. Inconsistent configuration makes debugging more difficult since it is impossible to know the exact configuration of a given machine. It also lowers the value of pre-production testing since there is no guarantee that the servers in pre-production environments have the same configuration as those in production.

Documentation is another sticking point when infrastructure is managed by hand. Configuring a deployment environment is complicated; it is hard to document everything, and nearly impossible to keep this documentation up to date whenever changes are made. This leads to two additional risks. The first one is that a step will be missed when performing a deployment, leading to configuration drift, or even an outright outage. The second risk is that the knowledge of how to deploy the application is stored in the heads of a small number of people, thereby making them a single point of failure. If these people leave the organization, the knowledge of how to deploy the application goes out the door with them.

Yet another issue with managing infrastructure by hand is the lack of auditability. There is no definitive and comprehensive record of the changes being made to the production environment. While this is clearly a problem for compliance reasons, it is also a reliability concern. If an infrastructure misconfiguration ever results in an outage, it may be hard to determine the exact series of changes that led up to the failure.

CI/CD mitigates these risks for a simple reason: for an app to be deployable by an automated process, all of the steps must be codified in some way. Unlike manual interactions with a web UI, code is easy to distribute, audit, and version control. Changing a server’s configuration becomes a matter of updating the deployment code, so the change is recorded in version control and applied consistently during every deployment thereafter. This also reduces the need for documentation, since the code itself is executable documentation of what to deploy and how to deploy it; written documentation is only needed to explain why the system is designed as it is. Furthermore, instead of deployment know-how belonging to a small group of people, anyone can deploy changes using the automated process, and anyone can see how deployment works by reviewing the infrastructure code. In this way, CI/CD leverages Infrastructure-as-Code (IaC) to make operational processes explicit, consistent, and testable, which significantly reduces risk and increases reliability.
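For instance, with an IaC tool such as Pulumi (one option among many; Terraform and CloudFormation fill the same role), a server’s definition lives in version-controlled code rather than in someone’s memory. A minimal sketch using Pulumi’s Python SDK, assuming AWS, with illustrative resource names and a placeholder AMI id:

```python
# Minimal IaC sketch using Pulumi's Python SDK (resource names and the
# AMI id are placeholders). Because the server is declared in code, every
# change is reviewed, version-controlled, and applied consistently.
import pulumi
import pulumi_aws as aws

web_server = aws.ec2.Instance(
    "web-server",
    ami="ami-0123456789abcdef0",  # placeholder AMI id
    instance_type="t3.micro",
    tags={"environment": "production"},
)

pulumi.export("public_ip", web_server.public_ip)
```

Reviewing a diff of this file is the audit trail; re-running the tool converges every environment to the same declared state, which is exactly what ad-hoc console changes cannot guarantee.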

Adopting CI/CD

Few things come for free. CI/CD has many desirable benefits, but implementing a robust CI/CD pipeline can be very complex. Rather than trying to implement everything at once, it is better to build such systems incrementally. Each small step should add value to the business. Breaking things up in this manner also helps to avoid being overwhelmed by the complexity. Beyond this rule, here are a few other tips that may help with successfully adopting CI/CD:

  • Start in the early days of a project — CI/CD pipelines are best when they mature gradually along with the software they manage. It is simple to get started when the app is only at the “hello world” stage, and you can add functionality to the pipeline as specific needs arise.
  • Start with automated testing — CI/CD can only verify the quality of software to the degree that there are automated tests that exercise its functionality. Like the pipeline itself, automated tests are trivial to get started with when the project is in its infancy. In addition, automated tests have the knock-on effect of encouraging good software design. While they may appear to require unnecessary work in the beginning, I’ve seen automated tests repay the time invested in writing them within mere days of starting a new project.
  • It would be nice if every project began with a foundation in CI/CD and automated tests. But in reality, at some point, you will likely find yourself having to bolt these systems onto an existing codebase. In this scenario, a great place to start is to identify the biggest pain points or highest risk elements of the development process and begin there. Apply the 80/20 principle to get the most value out of your efforts as early as possible.

If you keep at it, over time you will start to see gains in development speed and reliability metrics, and it will be easier to keep improving with that momentum. The important thing is to keep taking small steps that incrementally increase the effectiveness of the process.

Now that we have a high-level understanding of the concepts and problem space, it’s time to examine the solution in more detail. In Part 2, we will walk through the stages of a CI/CD pipeline and glean more insights into how they can be built effectively.

References

1. Brikman, Yevgeniy. Terraform: Up and Running (p. 30). O'Reilly Media. Kindle Edition.
2. Ejsmont, Artur. Web Scalability for Startup Engineers (p. 332). McGraw Hill LLC. Kindle Edition.

* The name of this series is inspired by the book Release It!

Further Reading

- Continuous Delivery by Jez Humble and David Farley
- The Practical Test Pyramid - overview of testing strategies by Ham Vocke
- Site Reliability Engineering by a Google SRE team, particularly the chapter on Testing for Reliability
- Testing in Production - engaging blog series by Cindy Sridharan
- Fullstack Open Part 11: CI/CD - practical, hands-on guide to building your first CI/CD pipeline with GitHub Actions
