Why isn’t all test automation run on the pipeline?

Michael Leonard · Published in Slalom Build · Mar 15, 2019

“All prayers are answered, and prayers for understanding are answered the fastest.” — Deepak Chopra

Ever since I read How Google Tests Software, it has been clear to me that automated tests should run on the CI/CD pipeline. Software product owners need software that not only solves a problem now but can also be improved continually. Years have gone by since I reached this conclusion, and unit tests are now run as part of any CI/CD pipeline, but API, UI, integration, module, and other types of automated tests are often not included. Why not? It boils down to three reasons: value, architecture, and feasibility. Examined closely, those same three considerations also make the case for pipeline automation. Let's frame that case as a series of questions and dig into the value question first.

Value: Why should all test automation run on the pipeline?

Because of benefits such as:

  • Coding errors are caught at their source when they are cheapest to fix.
  • Developer productivity is highest when interruptions are minimized.

While these benefits are appealing, people still think that automation on the pipeline costs too much. This is a myth. In fact, ongoing computing costs are low, and initial setup costs, which depend mainly on clever use of tools, are modest and amortized over the life of the project. This analysis of costs is borne out in the case studies discussed below.

Even when the value proposition is accepted, tests often do not run on the pipeline due to the architecture of the system.

Architecture: Is the system architected to run test automation on the pipeline?

This depends on whether the system is properly factored into modules. Since a module represents an encapsulated set of functions, it can be run in process with its external dependencies doubled, e.g., stubbed out. Whether it is a UI, API, or other type of module, that body of work can be tested independently. Passing module tests therefore allow a merge to proceed; failing tests isolate an issue in the proposed merge before it goes ahead.

On the other hand, a site represents a deployed set of modules with real external dependencies connected rather than doubled. Given that all of the modules are tested independently, the only tests that should run against a site are integration tests that prove the modules work together through their external interfaces. It does not make sense to run these tests before merging; instead, site integration tests run after deployment. Together with manual exploratory testing, passing site integration tests propel the site's code to the next higher environment. Since tests are normally passing, failing tests isolate an issue to the small number of merges since the last deployment to the site.

A good way to organize modules is one per code repository. Work on a module takes place in a feature branch, and most tests run at a feature branch level. After a feature branch is merged into the develop branch, the changes are ready for integration testing in a deployed site. Thus CI/CD proceeds.

Even when the architecture is correct, tests often do not run on the pipeline due to feasibility.

Feasibility: Is it feasible to run all test automation on the pipeline?

The answer comes down to whether the test runner can exercise the module's contract and whether the contracts of its dependencies can be doubled. For example:

  • UI Contract: Can the test runner run a browser that drives UI tests?
  • API Contract: Is the module running such that the test runner can call API endpoints?
  • External dependencies: Are all of the dependencies doubled in some way, e.g., mocked?

If the answer to any of these questions is no, then the module cannot be tested independently.
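To make the third point concrete, here is a minimal sketch, assuming JUnit 4 and Mockito, of doubling an external dependency so that a module can be exercised entirely in the test runner process. The PaymentGateway and CheckoutService names are hypothetical stand-ins, not code from any of the cases below.

```java
// A minimal sketch (JUnit 4 + Mockito) of doubling an external dependency.
// PaymentGateway and CheckoutService are hypothetical names for illustration only.
import static org.junit.Assert.assertTrue;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.Test;

public class CheckoutServiceModuleTest {

    // Stand-in for a real external dependency (e.g., a payment provider's API).
    interface PaymentGateway {
        boolean charge(String account, long cents);
    }

    // Stand-in for the module code under test.
    static class CheckoutService {
        private final PaymentGateway gateway;
        CheckoutService(PaymentGateway gateway) { this.gateway = gateway; }
        boolean checkout(String account, long cents) { return gateway.charge(account, cents); }
    }

    @Test
    public void checkoutSucceedsWhenGatewayAccepts() {
        // The external dependency is doubled, so no real external call is made.
        PaymentGateway gateway = mock(PaymentGateway.class);
        when(gateway.charge("acct-1", 500L)).thenReturn(true);

        assertTrue(new CheckoutService(gateway).checkout("acct-1", 500L));
        verify(gateway).charge("acct-1", 500L);
    }
}
```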

So, when can all tests run on the pipeline and what is the impact if they don’t?

If running the tests is feasible, the value proposition is accepted, and the architecture is correct, then the tests are run on the pipeline and they become almost invisible. Instead of the constant scramble by testers to keep up with development, developers and testers work together to produce and maintain module tests that are always passing. Instead of ongoing struggles to repair broken tests, tester time is freed for exploratory testing and test framework development. Instead of questionable completeness of regression testing, code is propelled forward with confidence.

If tests exist but are not running on the pipeline, the downstream turbulence makes it very difficult to get them running. Testers are constantly scrambling to fix broken tests. Developers don’t own test maintenance. Instead of a “you break it, you fix it” mentality, there is “throw it over the wall and see what happens.” When broken tests or product bugs are brought to developer attention, the developer is dragged back to an old problem and loses forward momentum. To avoid this context switching, it is much better to start a project with all tests running on the pipeline. However, if you are in a situation in which not all tests are being run on the pipeline, then what can you do about it? Even partial remedies are valuable, and the experience will position you to make better decisions on future projects. The following case studies show what we can learn about this from real-world examples.

Case Studies

The following examples are drawn from my client work over the past five years. Enough technical detail is given to make the point clear and illustrate some useful techniques, but all client-specific information is omitted.

Case 1: API tests are not being run on the pipeline. Instead, they are run on laptops in a haphazard way by testers and developers. Test failures either interrupt the flow of testers away from exploratory testing to fix tests, or interrupt the flow of developers away from new product development to fix product bugs.

Analysis: Architecture is good: the tests can be run against the API contract and external data sources are stubbed out. However, the value of running the tests on the pipeline is not clearly understood and feasibility is absent because there is no way to execute tests against the API contract in the test runner process.

Response: This type of feasibility issue is common for API and UI module tests because the module must normally run in a separate web server process. That complexity is often enough to keep the tests out of the pipeline, yet there is usually a creative solution. In this case, the API module is a Java Spring Boot module. Spring Boot provides a test runner class, SpringRunner, that can run the module in the test process. Unfortunately, this class cannot be used directly here because the tests already use a different runner, DataProviderRunner. We addressed the feasibility issue by developing a class that combines the behavior of SpringRunner and DataProviderRunner and using it with the @SpringBootTest annotation. After that, we added the tests to the pipeline. Through a creative solution the value was realized as the tests became invisible and caught errors at their source. Ongoing incremental cost was $0 since existing pipeline runners had excess capacity; the setup cost was about a day of creative work.
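The combined runner itself is client-specific, but the same idea can be sketched with published Spring APIs: Spring's SpringClassRule and SpringMethodRule supply the TestContext behavior while DataProviderRunner keeps the @RunWith slot. This is a rough illustration, assuming JUnit 4, Spring Boot test support, and the junit-dataprovider library; the test class, endpoint, and data below are hypothetical.

```java
// A rough sketch of running Spring Boot module tests under a non-Spring runner.
// It uses Spring's JUnit 4 rules rather than the combined runner described above;
// the test class, endpoint, and test data are hypothetical. Assumes a Spring Boot
// application class is on the test classpath.
import static org.junit.Assert.assertEquals;

import com.tngtech.java.junit.dataprovider.DataProvider;
import com.tngtech.java.junit.dataprovider.DataProviderRunner;
import com.tngtech.java.junit.dataprovider.UseDataProvider;
import org.junit.ClassRule;
import org.junit.Rule;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.test.context.junit4.rules.SpringClassRule;
import org.springframework.test.context.junit4.rules.SpringMethodRule;

@RunWith(DataProviderRunner.class) // keep the existing data-driven runner
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
public class OrderApiModuleTest {

    // Together these rules supply the Spring TestContext support that
    // SpringRunner would normally provide via @RunWith.
    @ClassRule
    public static final SpringClassRule SPRING_CLASS_RULE = new SpringClassRule();

    @Rule
    public final SpringMethodRule springMethodRule = new SpringMethodRule();

    @Autowired
    private TestRestTemplate rest; // calls the module started in the test process

    @DataProvider
    public static Object[][] orderIds() {
        return new Object[][] { { "A-1" }, { "A-2" } }; // hypothetical test data
    }

    @Test
    @UseDataProvider("orderIds")
    public void getOrderReturnsOk(String orderId) {
        assertEquals(200,
            rest.getForEntity("/orders/" + orderId, String.class).getStatusCodeValue());
    }
}
```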

Case 2: UI site tests are the only tests being developed. There are no unit, module, or API tests. Tests are not being run on the pipeline. Instead, they are run on laptops in a haphazard way by testers. Test failures are erratic depending on 1) the laptop running the tests and 2) the status of the site against which the tests are run.

Analysis: Architecture is not well-factored. There are no clear independent modules. It is not feasible to run the UI tests in the pipeline because the test runner servers do not have browsers installed on them and browsers cannot be installed by policy. The value of running the tests on the pipeline is appreciated but feasibility and architecture are in the way.

Response: Install Jenkins on a VM that has Chrome installed. Set up triggered and scheduled jobs that run the tests. Harden the tests to deal with the varying conditions of the target sites. Although architecture is not addressed, feasibility is partially addressed, adding a lot of value. The cost was just the ongoing cost of a single VM plus about a day of setup effort.
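Much of the hardening amounted to waiting on the site instead of assuming fixed timings. As a minimal sketch, assuming Selenium 4 with ChromeDriver on the VM (the URL and element id are placeholders), an explicit wait tolerates a slow target site far better than a fixed sleep:

```java
// A minimal sketch of one hardening technique: replace fixed sleeps with explicit waits
// so UI tests tolerate the varying response times of the target site.
// Assumes Selenium 4 and ChromeDriver on the VM; the URL and element id are placeholders.
import java.time.Duration;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class HardenedUiCheck {
    public static void main(String[] args) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless"); // run without a display on the Jenkins VM

        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get("https://example.com/login"); // placeholder URL

            // Poll for up to 30 seconds instead of assuming a fixed page-load time.
            new WebDriverWait(driver, Duration.ofSeconds(30))
                .until(ExpectedConditions.visibilityOfElementLocated(By.id("username")));
        } finally {
            driver.quit();
        }
    }
}
```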

Case 3: API site tests are not being run on the pipeline. Consequently, they are only run on laptops against a deployed site.

Analysis: Although the target modules are well-architected, they depend on an external database that is perceived to be difficult to fake or otherwise double.

Response: Configure Hibernate ORM to create an in-memory database automatically with just three lines of configuration. Running in memory provides a trivial fake database. Then turn the site tests into module tests and add them to the pipeline. Even though the in-memory database is not exactly the same as its persistent counterpart, it still provides a tremendous amount of value by enabling the module to be tested independently. Incremental ongoing cost is $0 because existing pipeline runners had excess capacity; initial setup took about a day of creative problem-solving.
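The exact three lines are client-specific, but the idea can be sketched with standard Hibernate settings pointed at an H2 in-memory database; the class name and connection URL below are assumptions.

```java
// A minimal sketch, assuming Hibernate 5 and H2 on the test classpath, of swapping the
// external database for an in-memory one. The class name and URL are illustrative only.
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class InMemoryTestDatabase {

    public static SessionFactory build() {
        return new Configuration()
            // The three properties that redirect persistence to an in-memory H2 database.
            .setProperty("hibernate.connection.url", "jdbc:h2:mem:moduledb;DB_CLOSE_DELAY=-1")
            .setProperty("hibernate.dialect", "org.hibernate.dialect.H2Dialect")
            .setProperty("hibernate.hbm2ddl.auto", "create-drop")
            // Entity classes are registered as usual, e.g., via addAnnotatedClass(...).
            .buildSessionFactory();
    }
}
```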

Case 4: A desktop application has been built over many years. UI site tests against the whole application are in place on the pipeline. However, the application cannot be changed in any significant way because any change invariably causes subtle breakage across the application.

Analysis: The application has a monolithic multi-threaded architecture. There are no modules. It has grown too big to extend without significant refactoring.

Response: Refactor the application into storage, business logic, background, and UI modules. Use the UI site tests to guarantee overall correctness as the application is refactored. Implement module tests, which also run on the pipeline, to guarantee the independence of each module. Even though refactoring is difficult, the tests provide high confidence and the risk is managed. Now the application can be extended with confidence. Incremental ongoing cost is $0 because the pipeline runners had excess capacity. Initial setup required about two months of refactoring work, which is amortized over the lifespan of the product.

These cases show the relative difficulty of getting tests into the pipeline. It is not hard to gain acceptance of the value proposition, assuming that feasibility and architecture are in place. If necessary, progress on feasibility can be made by harvesting low-hanging fruit such as the combined test harness in Case 1, Jenkins on a VM in Case 2, or an in-memory fake database in Case 3. This takes persistence and creativity because the low-hanging fruit is not always obvious. The hardest thing to fix is architecture, which requires refactoring. The business generally favors feature development over refactoring, but refactoring is often the more cost-effective approach in the long term: after refactoring, feature development accelerates as productivity gains kick in. This is hard work, but well worth it, as the system becomes extendable and maintainable as in Case 4.

Experience shows that the best way to get tests into the pipeline is to have all of them in the pipeline from the beginning. This encourages good architecture and smart decisions about tooling for feasibility. Then the value is realized as defects are caught at their source and development is propelled ahead smoothly. Why isn't all test automation run on the pipeline? For those of you reading this article, it's only a matter of time!

I’m an Architect in Slalom Build Quality Engineering. Opinions expressed are mine.