Continuous delivery — Part 1

Published in Engineering @ Wave, Dec 1, 2017

We write software to provide a great service to our users. Product discovery, estimation of development time, development work, and testing are all part of the software development lifecycle. This article focuses on the final phase of software development: delivery (sometimes called deployment). In this first article of a two-part series, we look specifically at continuous delivery (CD), how it differs from the traditional delivery model, and who in the industry uses it. I’ll outline why CD is important and conclude by exploring some of the reasons people think it will not work in their organization.

In the follow-up article, we will look at our own experience at Wave: how Wave transitioned from a traditional delivery model to CD, how the transition affected the company, and how our delivery process works today. That article will also go into more technical detail.

What is continuous delivery and who uses it?

Traditionally, once a piece of software is ready to be delivered to users, an operations team takes care of making the new version available in production. Deployments to production may happen on a weekly or biweekly schedule, or as infrequently as once a quarter.

When a system gets deployed infrequently (once a week or less), there are many risks and challenges to consider:

Risk of regression
Building the wrong product features
Pressure to always deploy successfully
Last-minute changes
Hotfixes

Let’s elaborate on each of these points.

First, there is a risk of regression: the less often software is released, the more code gets deployed at once, and the greater the risk of releasing a bug or other issue. If existing functionality breaks, it may be hard to figure out which of the many changes caused it.

Second, with infrequent deploys, users need to wait weeks or months for the software to evolve and improve. A development team may spend months building a set of features only to realize later that users do not care about them. Facebook is famous for its mantra “move fast and break things.” While some see this as reckless, the idea behind it is that it is better to release something small and imperfect today than something larger and better later. In other words, the ability to release small changes frequently helps determine how popular a feature is before significant investments have been made.

Deploying infrequently also creates pressure on all the teams involved. When a deployment occurs only every few months, it must succeed: not only must the deployment itself go well, but all the planned features and bugfixes must make it in. It often happens overnight or on a weekend to disrupt service as little as possible. This puts a tremendous amount of pressure on everybody involved, including software and operations engineers, product managers, support staff, and QA engineers.

Another common issue with deploying infrequently is scope creep: stakeholders try to add more features before the deadline (i.e. the deployment), resulting in last-minute changes. This adds to the pressure mentioned above and can easily lead to cutting corners, which translates into fewer unit tests (if any), more bugs, and more technical debt in general.

Finally, if a critical bug is found right after the deployment, a hotfix may be needed. Shipping it means deploying again, which hurts customers by extending downtime and makes the overall process more painful for engineers, since many steps have to be repeated.

On the other hand, continuous delivery is a set of practices that makes deploying changes to production safer, faster, and sustainable. From a product perspective, a change can be as small as one line of code or even a single character (or no code at all, if the deployment only changes an environment variable, although that is really a configuration deployment). Without continuous delivery, software is only shipped on scheduled dates. If the business wants to release on the spot, it may go poorly because the deployment process is complicated (coordination of many different teams and lots of manual steps). With continuous delivery, deploying software is no longer a technical decision but a business one.

Tech giants Google, Facebook, Netflix, and Amazon are among the many companies that use CD practices. Amazon is known to have done an average of 23,000 deployments per day in 2012 (see resource 1). While this number is impressive, it took Amazon four years to redo its architecture in order to achieve it (see resource 2). This highlights how difficult the transition can be once an ecosystem has been built up around a more traditional deployment model. For an early-stage startup, it may make more sense to spend time and resources building products, but the longer one waits, the harder the transition becomes, as there will be more moving parts.

While it is hard to accomplish, many companies, like Amazon, eventually make the transformation to CD. Capital One is a great example: it went from one deployment per quarter to over 10 deployments per application per day (see resource 3).

Etsy is another company that did not practice CD before 2010. In his talk about CD at Etsy (see resource 4), Etsy’s former VP of Engineering Mike Brittain refers to a “deployment army” that was required for every deployment. Deployments happened infrequently, took anywhere between 6 and 14 hours, and involved downtime. After transitioning to CD practices, only one person was required to perform a deployment, and it took only 15 minutes without downtime.

An important point is that, as with Etsy, many so-called “unicorns” and other successful startups did not start with CD, but changed the way they deliver software when they ran into major hurdles. For instance, in 2011 LinkedIn had such a difficult time deploying that it went through a two-month feature freeze to re-architect its system. Twitter is another example of a company that did a major refactoring of its architecture (see resource 1).

Not all companies start out with a traditional deployment process, though, and Slack is a good example. Since its inception, Slack has never released software all at once with planned downtime; instead, CD was part of the release process right from the start. This allowed Slack, as of 2016, to deploy 40 times a day, according to Keith Adams in his talk “How Slack works” (see resource 5). In 2017, this number has grown to 200 per day. Slack is younger than companies like Google and Amazon and may have benefited from more recent and better tooling around CD, but it is not that young, as it was founded in 2009, and I have seen more recent products built without CD in mind.

Now that we’ve covered what continuous delivery is and some of the companies that practice it, let’s elaborate on why it matters, and not just for tech companies.

Why it matters

Code has no value until it is deployed into production, and continuous delivery ensures that code reaches production quickly. Achieving CD requires a high level of automation. Earlier we talked about some of the risks and challenges of not doing CD. Let’s take a look at some of the advantages of automating this process:

Issues are easy to fix, since shipping a bugfix is trivial.
The risk of building the wrong product is greatly reduced, as we can ship a small, minimum viable version of a feature to help determine whether there is a market fit for it. This shortens the time between when a business idea is born and when it is available to users (often referred to as the lead time or cycle time; see resource 6).
If we have to delay the deployment of a feature for a few days because it’s not ready, we don’t have to wait another few weeks or months for the next opportunity to ship something to our users.
It helps us avoid downtime when deploying. This is very important because we shouldn’t interrupt our users if we don’t have to.
It helps increase employee satisfaction. As a developer on a product team, it is very rewarding to be able to push the button and deliver a new feature or a bugfix to users. It is also great to know that the organization trusts its people enough to let them do so. As an operations engineer, you no longer worry about specific product deployments, but rather about changes to the deployment pipeline (which should be much less frequent).
Deploying is done in a sustainable, predictable and repeatable way.

On that last point, deploying manually does not satisfy these requirements:

It is only repeatable as long as there is enough documentation (which is hard to maintain and eventually gets out of date);
It is not predictable as people will make mistakes sooner or later despite their best efforts;
It is not sustainable if we want to deploy many times a day.
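To make the contrast with manual deployment concrete, here is a minimal Python sketch of what replacing a written runbook with code might look like. It assumes each deployment step can be wrapped in a function that returns `True` on success; `run_pipeline`, `PipelineResult`, and the step names are hypothetical and do not describe Wave’s actual tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical: a deployment step is any callable returning True on success.
Step = Callable[[], bool]


@dataclass
class PipelineResult:
    succeeded: bool
    completed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None


def run_pipeline(steps: List[Tuple[str, Step]]) -> PipelineResult:
    """Run the named steps in a fixed order and stop at the first failure.

    The step list lives in code, so it cannot drift out of date like a
    written runbook (repeatable), and no step can be skipped or reordered
    by hand (predictable).
    """
    completed: List[str] = []
    for name, step in steps:
        if not step():
            # Later steps (e.g. the production deploy) never run after a failure.
            return PipelineResult(succeeded=False, completed=completed, failed_step=name)
        completed.append(name)
    return PipelineResult(succeeded=True, completed=completed)


if __name__ == "__main__":
    # Hypothetical steps; real ones would call build, test and deploy tooling.
    steps = [
        ("build", lambda: True),
        ("unit_tests", lambda: True),
        ("deploy_staging", lambda: False),  # simulate a failing staging deploy
        ("deploy_production", lambda: True),
    ]
    print(run_pipeline(steps))
```

Because the steps are code, they run in the same order every time, and the pipeline halts at the first failure instead of relying on someone to notice it. In practice, teams usually build this on a CI/CD tool rather than a hand-rolled script; the sketch only illustrates the idea.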

A great example of how delivering software later rather than sooner can affect a business is found in Jez Humble’s talk “Why Scaling Agile Doesn’t Work” (GOTO conference 2015; see resource 7). He talks about a project he worked on for a European airline. The airline was about to introduce a new class of travel (premium economy) on its aircraft, and realized that changing the booking system would take longer than reconfiguring the aircraft themselves. This meant that while the planes may have been ready, customers couldn’t book the new seats.

Another example of the damage manual delivery can cause is the story of Knight Capital. In 2012, this financial services firm lost $460 million in 45 minutes because of a failed deployment, a loss that pushed it to the brink of bankruptcy (see resource 8). These two examples highlight how software delivery can hurt a company’s bottom line or take it down entirely.

The 2017 State of DevOps report, which examines the adoption of DevOps practices and values and their impact, notes that continuous delivery, one of the technical practices of the DevOps movement, “significantly contributes to both lower deployment pain and higher IT performance”.

“Continuous delivery is great but it won’t work here”

In another talk (see resource 2), Jez Humble discusses some of the reasons people give for why CD won’t work in their organization:

Regulation
We don’t build a website
Too much legacy
People are too stupid

These are the stated reasons; he goes on to describe the real ones: cultural and architectural issues.

Cultural issues

Employees don’t have the tools or the authority to change and improve the process in place, and they are not given responsibility for building a quality product. As a result, they are not engaged in their work. This contrasts with high-performing organizations, where employees are highly trusted, are generally never satisfied with their process, and always strive to improve it.

Architectural issues

The architecture does not support delivering software in a sustainable, reliable and repeatable way for many reasons, including:

A team needs to ask for permission from somebody outside the team when making large-scale changes to their own system
Prior to finishing work, a team needs to communicate and coordinate with other teams
One service cannot be deployed independently of others
Testing cannot be done on demand without an integrated test environment, which may take days or weeks to spin up
Deployment cannot be done during normal business hours with minimal downtime

Another reason is the cost of such a transformation. If a company’s leaders do not understand the benefits of CD, they will see this effort only as a huge cost rather than a long-term investment.

At the end of the day, the reasons people provide against transitioning to CD boil down to a misunderstanding of why it is important and what needs to change from a technical standpoint.

Summary

With continuous delivery, organizations are able to deliver better software faster. Achieving it is not an easy task but there are countless examples of companies that have gone through the transformation and have found that the benefits far outweigh the costs.

Thanks

I would like to thank the following people for reviewing this article and providing suggestions: Matthew Montreuil, Nick Presta, Rob Maurin and Michael Warkentin. Also, this article would not be the same without the help of Erica Pisani, who reviewed it multiple times and helped shape it.

Resources

1. The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win, by Gene Kim, Kevin Behr and George Spafford
2. Continuous Delivery Sounds Great But It Won’t Work Here, Jez Humble: https://www.infoq.com/presentations/continuous-delivery-highlights
3. DOES16 San Francisco - DevOps at Capital One: Focusing on Pipeline and Measurement: https://www.youtube.com/watch?v=6Q0mtVnnthQ
4. Continuous Delivery: The Dirty Details, Mike Brittain: https://www.youtube.com/watch?v=JR-ccCTmMKY
5. How Slack works, Keith Adams: https://www.infoq.com/presentations/slack-infrastructure
6. Lead time (Agile Alliance glossary): https://www.agilealliance.org/glossary/lead-time/
7. Why Scaling Agile Doesn’t Work, Jez Humble: https://www.youtube.com/watch?v=2zYxWEZ0gYg
8. Knightmare: A DevOps Cautionary Tale: https://dougseven.com/2014/04/17/knightmare-a-devops-cautionary-tale/