Stop Shipping Defects

The best time to invest in automated testing is now

Jonathan McBride
Slalom Build
11 min read · Nov 30, 2022


Photo by Chris Yang on Unsplash

Modern software organizations employ Agile processes, which involve updating and releasing software frequently — continuously. Changes and updates are deployed to live sites multiple times a day to adapt to ever-changing user expectations.

As a consumer, I love this, and my money embraces the organizations that rapidly adapt to meet my whims. But the market has little patience for outages and breakages, so software testing processes must be prepared to find and surface problems and risks frequently — continuously, within the small window between inception and delivery.

In this article, I will demonstrate that the most effective use of testing dollars is to include automated test development as an integral part of original story development — as part of the Definition of Done.

I’ll first present a model of sprint-based delivery using a freight train analogy. Then we’ll use this model to inspect key moments during the build-and-test timeline in which software defects are introduced and/or surfaced:

  • End-of-sprint release
  • When a story is completed and merged mid-sprint
  • When a story is under initial development
  • During subsequent releases

These key moments will help define and describe automated-test development strategies, roughly following the evolutionary steps that software organizations take along the way:

  • Post-release automation
  • In-sprint (post-merge) automation
  • Pre-merge automation

By comparing and contrasting all possible automation injection moments, we’ll see that pre-merge automation consistently drives the highest-quality results, and is also the cheapest and most effective use of testing dollars. There is no cheaper “good enough” option; pre-merge automation is the one true way to sustainably maximize quality and minimize cost.

Software delivery overview

During Agile software delivery, work is broken down into smaller chunks often referred to as user stories.

Fig 1: Beginning of a sprint

As we build out our user stories, we merge them into the main branch and stage them for release — we load them onto the railcar. Releases occur frequently, perhaps at the end of every two-week sprint.

Fig 2: End of a sprint

Any defects in the work we deliver could be damaging to our users and to our organization. To mitigate that risk, we need to check that our stories are working properly. The following sections provide insights into when defects are injected into the software delivery process, and what common strategies are used to mitigate the damaging impact.

Preventing defects in delivery: release testing

A common test strategy is to test all user stories right before we ship, so that any discovered defects are prevented from escaping. This is called release testing. It is also sometimes called post-merge testing, because the testing occurs after each story has been merged into the main branch and staged for release.

Fig 3: Release testing
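To make this concrete, a release check might be a small automated smoke suite run against the staged build. This is only a minimal sketch; the staging URL and endpoints below are hypothetical stand-ins for whatever critical paths your release candidate actually exposes.

    # release_smoke_test.py - a minimal release-testing sketch (pytest).
    # STAGING_URL and the endpoints are hypothetical placeholders.
    import requests

    STAGING_URL = "https://staging.example.com"

    def test_service_is_up():
        # A release blocker: if the staged build won't even respond,
        # nothing on this railcar should ship.
        response = requests.get(f"{STAGING_URL}/health", timeout=10)
        assert response.status_code == 200

    def test_checkout_page_loads():
        # Spot-check one critical user journey against the staged release.
        response = requests.get(f"{STAGING_URL}/checkout", timeout=10)
        assert response.status_code == 200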

If critical damage is discovered, we will need to defer the release until we can correct it. Note that this defers the entire release, including the fully functional stories that were loaded.

Fig 4: Critical release defect

Slipping a release date can be highly disruptive to business planning, so deferral is often reserved for only the most egregious issues. As a result, release testing often leads to the consistent leakage of low-priority defects into production.

Fig 5: Non-critical release defect

Release testing prevents critical defects in newly-delivered software. However, it does not prevent regressions in previously-delivered software, and often results in the regular delivery of known non-blocking defects.

Preventing defects in sprint: story testing

The next stage in test-strategy evolution involves shifting testing from release time to earlier in the process. This leftward shift of testing will occur again throughout this article; it is a common theme in software delivery process improvement.

If any stories are incomplete or defective when shipped, they represent a liability to our users and to us. Even worse, if a critical defect is loaded onto the release car, we can’t ship anything—even the good stories. By testing stories before we merge them, we’re able to withhold incomplete stories, and ship the things that are done. This is sometimes called pre-merge testing, because it occurs prior to the story being merged and staged for release.

Fig 6: Pre-merge story testing
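As a minimal sketch, a pre-merge story check is just the story’s acceptance criteria expressed as automated tests that run on the feature branch. The pricing module, the apply_discount function, and the discount codes here are hypothetical examples, not a real API.

    # test_story_discounts.py - pre-merge story check (pytest), run on
    # the feature branch before merge. The pricing module and
    # apply_discount function are hypothetical story code under review.
    from decimal import Decimal

    from pricing import apply_discount  # same branch as the story code

    def test_discount_code_reduces_price():
        assert apply_discount(Decimal("100.00"), code="SAVE10") == Decimal("90.00")

    def test_unknown_code_leaves_price_unchanged():
        # An incomplete story fails here and is withheld from the railcar.
        assert apply_discount(Decimal("100.00"), code="BOGUS") == Decimal("100.00")

In most CI systems, a nonzero exit code from a failing suite like this is what blocks the merge: the incomplete crate is withheld while the rest of the sprint keeps moving.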

Impact of testing: unplanned work and story carryover

Some stories require iteration and rework before they pass, and that rework is unplanned work. This can cause carryover: items in the sprint backlog that weren’t completed and have to be finished later.

Fig 7: End-of-sprint carryover

Carryover is a cost that comes with a benefit. In the diagram above, six good, completed stories were loaded, and they all ship at the end of the sprint. Compare this with the end-of-sprint outcome under post-merge release testing (Fig 4): a single critical defect prevents the release, and zero stories ship.

Carryover is an early sign of quality-process maturity, but consistent carryover is often intolerable to the business because of how disruptive it is to planning and forecasting. When working to reduce carryover, take care to avoid backsliding on quality-process maturity. Here are some recommended techniques for reducing the frequency and severity of carryover events.

Fig 8: Techniques for mitigating carryover

Risk-based estimation: Understand and better predict the likelihood of rework (e.g., riskier work gets a larger story size).

Yesterday’s weather: Build in buffer capacity by adjusting the team forecast and sprint load to match historical actuals (see the sketch below).

Defect prevention: Identify and remove the common root causes of defects and rework.
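As a toy illustration of yesterday’s weather, the sprint forecast can be pinned to what recent sprints actually delivered rather than what was planned. All of the numbers below are hypothetical.

    # yesterdays_weather.py - forecast next sprint's load from actuals.
    # The velocities are hypothetical story-point totals from the last
    # few completed sprints.
    completed_points = [21, 17, 19]  # what actually shipped, per sprint

    forecast = sum(completed_points) / len(completed_points)
    print(f"Plan ~{forecast:.0f} points, not the 25 we wish we could do.")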

Regression defects

So far this article has covered techniques for preventing breakages in the software development process. Now we’ll take a look at the delivery process. An unfortunate-but-true fact is that software delivery comes with the constant risk of breaking previously-delivered software. These breakages are called regressions: unanticipated defects introduced by the delivery of something else.

In-sprint regressions

When we load a crate onto a railcar, we may sometimes damage the crate or one of the others on the car. This event is called an in-sprint regression.

Fig 9: In-sprint regression

In order to prevent in-sprint regressions from escaping, we’ll need to check all of our stories again before we ship. That’s right: even when we’re doing pre-merge story testing, we still need some level of the release checking described earlier in the article.

FLASHBACK: Fig 4: Critical release defect

Manual release checking will still sometimes find release-blocking defects, and will still leak lower-priority defects in support of an important release date. However, the rate and severity of leakages are much lower when pre-merge testing is employed.

Delivery regressions

After the sprint is completed and the release is deployed, regressions may occur in previously-delivered stories.

Fig 10: Regressions in previously delivered stories

Keeping with the train analogy — when this sprint’s railcar bumps into the chain of previously-shipped railcars, some of the crates in any of the cars may sustain damage. After shipping, we need to re-check all the delivered items for impact.

Live defects will need to be triaged and, depending on priority, addressed immediately. High-priority production defects are very expensive and disruptive. Carryover that prevents a production defect is a proper investment of resources.

A common early approach to mitigating regressions is manual regression checking. This means going through every previous release car and checking each crate for damage.

Fig 11: Manual regression checking

Manual regression checking is extremely time and resource intensive, so it’s often performed only periodically — perhaps as part of final release prep. Regression defects are introduced between cycles, and then hopefully identified and fixed before a major production impact.

Automated checks

Modern software delivery is optimized through the use of automated quality checks. This is akin to affixing an impact-detecting beacon to each crate that we ship.

Fig 12: Stories with and without automated impact detection
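In code terms, an impact beacon is nothing more exotic than a permanent automated check that ships with the feature and re-runs on every subsequent change. A sketch, assuming a hypothetical billing module and tax rate:

    # test_tax_beacon.py - an "impact beacon" (pytest): a permanent
    # check that travels with the delivered story. The billing module,
    # calculate_tax function, and rate are hypothetical examples.
    from billing import calculate_tax  # previously delivered story code

    def test_wa_tax_rate_still_applied():
        # If any later delivery changes this behavior, the beacon fires
        # immediately instead of the damage being discovered by hand.
        assert calculate_tax(subtotal=100.00, region="WA") == 10.40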

While most teams agree that automated checks are a proper and necessary component of modern software delivery, not everyone agrees on when checks should be developed relative to story delivery.

Post-delivery automation

An early automation development strategy is to build checks for stories after they’ve shipped. This is often attractive to business stakeholders, because on the surface it allows the business to realize (and perhaps monetize) the value of any given user story faster.

In practice, a post-delivery automation strategy breaks down for a few reasons. Since at least one sprint’s worth of delivery exists without coverage at all times, an organization with post-delivery automation is perpetually at risk of critical regressions. Compounding this, the risk of regression does not tend to be distributed evenly. Regressions cluster where we are actively working, and we often work in the same area in back-to-back sprints, especially if there was carryover. Put another way: the more recently a piece of work was delivered, the more likely it is to regress.

Fig 13: Post-delivery automation

To identify and mitigate risks in unprotected delivered stories, the organization will inevitably need to deploy both manual and automated regression checking. This strategy consistently results in unprotected code sprint over sprint, so carrying the weight of manual checking and automated check development is unavoidable.

The continual risk of regressions in untested code and the heavy cost of maintaining both manual and automated checking processes often leads the business to question the value of developing automated regression checks at all.

Preventing delivery regressions with in-sprint automation

The next step up in automation strategy ROI is in-sprint automation — delivering automated checks within the same sprint as the user story. By automating checks prior to delivery, organizations can prevent most, if not all, critical delivery regressions.

The cost savings occur both before and after initial software delivery. Upfront, development context accelerates automation: automating a story in-sprint is consistently faster and cheaper than automating it later. Downstream, release regression testing is now fully automated, because every story is delivered with automated checks. Every crate ships with an impact beacon.

Fig 14: Fully protected releases, no manual checking
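Because every story now carries its own checks, the release regression pass reduces to running the accumulated suite. Here is a minimal sketch of such a gate, assuming a pytest suite in tests/ and a hypothetical deploy.sh release script:

    # release_gate.py - run the full accumulated check suite before
    # shipping. A pytest exit code of 0 means every beacon is quiet.
    import subprocess
    import sys

    result = subprocess.run(["pytest", "tests/"])
    if result.returncode != 0:
        sys.exit("Beacon fired: holding the release for triage.")

    # deploy.sh is a hypothetical placeholder for your release step.
    subprocess.run(["./deploy.sh"], check=True)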

A common approach is to automate in-sprint, but post-merge. This focuses the team on delivering all user-facing application changes first, then following up on qualitative development aspects later. This is sometimes teasingly referred to as scrumfall.

This approach is often attractive to business stakeholders as it focuses on delivering the sprint’s realizable (and perhaps monetizable) value as soon as possible. In practice, in-sprint automation is hard to maintain with a post-merge strategy.

The first challenge is the natural business tendency to focus on realizable user value. Automation tasks tend to wait until the end of the sprint, increasing their likelihood of carryover. When tests carry over and the user stories ship without coverage, the process quickly degrades into post-delivery automation. As outlined in the previous section, this can ultimately result in the organizational demise of automated testing overall.

Fig 15: In-sprint (post-merge) automation and regressions

The second challenge with post-merge automation is the risk of in-sprint regressions. The systemic gap between user-story delivery and automation delivery leaves each merge temporarily unprotected. The regressions that slip through drive rework, which can cause carryover, exacerbating the first problem: automation tasks carrying over.

Preventing in-sprint regressions with pre-merge automation

As we’ve worked our way through common quality strategies, each evolutionary step has involved injecting automated tests one step earlier in the process. Shifting left from in-sprint automation, we arrive at per-story automation, occurring pre-merge.

Fig 16: Integrated pre-merge automation development

Pre-merge automation provides more return on investment than all the previously mentioned test strategies, because it provides protection and alerting throughout all downstream delivery activities: when we load stories onto the sprint car, when the car ships and bumps into others, and when subsequent sprints ship and create even more cascading impact events. In-sprint and post-deploy regression checking is no longer an expensive, manually intensive process, or even a singular event. We simply go about the business of developing and delivering software complete with built-in impact beacons, and let those beacons provide regression protection.

If you need numbers for your ROI, consider the number of impact events that a given test beacon protects against. By injecting the same beacon earlier in the process, for effectively the same price, we additionally cover every in-sprint impact event. We prevent strictly more defects than in-sprint automation, for zero additional cost.
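For a back-of-the-envelope comparison, count the impact events a single check can possibly catch under each strategy. The figures below are hypothetical, assuming biweekly releases over a year:

    # beacon_roi.py - toy comparison of impact events one automated
    # check protects against under each strategy. All counts are
    # hypothetical illustrations, not measured data.
    releases_per_year = 26   # biweekly releases, each a "bump" event
    merges_per_sprint = 8    # crates loaded onto each sprint's railcar

    post_delivery = releases_per_year - 1  # misses its own story's release
    in_sprint_post_merge = releases_per_year
    pre_merge = releases_per_year + merges_per_sprint  # also covers merges

    print(f"post-delivery:        ~{post_delivery} events")
    print(f"in-sprint post-merge: ~{in_sprint_post_merge} events")
    print(f"pre-merge:            ~{pre_merge} events")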

On the cost side, developing tests pre-merge is cheaper than developing them in any later phase of delivery, due to contextual efficiency. We best understand, and can most cheaply capture, how something should work while we’re in the process of building it. And we can include the function and placement of our impact beacon in the overall story design, which is often much cheaper and easier than retrofitting a beacon onto a rigidly complete, already-delivered story.

Conclusion

The most effective use of testing dollars is to include Automated Testing as part of Story Development and Definition of Done.

Modern Agile software delivery processes result in more frequent releases, increasing the risk of delivery-induced regression defects. The manual effort required to cover this risk is unreasonably expensive, and automated regression testing consistently emerges as a critical component of software delivery. Having discussed and evaluated common automated testing strategies, we have found that developing automated tests is cheapest and most effective when included as an integrated part of story work, included in the Definition of Done.

FLASHBACK: Fig 16: Integrated pre-merge automation development
FLASHBACK: Fig 14: Fully protected releases, no manual checking


Jonathan McBride
Slalom Build

Experienced and passionate Quality Champion and Agile Zealot. Currently Director of Quality Engineering at Slalom Build.