Optimizing CI/CD Processes with Selective Testing
Developers often find themselves slowed down by lengthy CI processes, particularly waiting for UI/Integration tests to verify their contributions. This bottleneck can significantly delay the merging of new features or fixes. This article explores a streamlined approach to improving CI/CD pipelines through selective testing, enhancing efficiency while maintaining the quality and integrity of your testing process.
The Problem: Slow CI with Comprehensive UI Testing
In Agoda’s mobile apps, the UI tests are heavy because setting up a test app requires instantiating a lot of libraries, such as experimentation, analytics, and core APIs. As a result, we settled on using a single test application to run nearly 900 UI tests spanning various features.
Initializing a test application meant configuring numerous utilities before executing any tests. This became a huge blocker to modularizing the test application into feature-specific test apps. As a result, developers became accustomed to adding their feature tests to this single application.
This practice came with a high price. Identifying which tests depended on which features became increasingly challenging, leading us to execute all UI tests in the Continuous Integration (CI) system for every change to ensure nothing was broken.
There is nothing wrong with this approach in an ideal world, but we are not in one, are we?
This test execution strategy, while thorough, introduced several challenges:
- High Test Flakiness: Random failures in tests often block pull requests, necessitating multiple retries for successful merges. Developers had to retry at least three times on average to merge their changes.
- Delayed Time to Market: The average time to ship changes to production was considerably high, exceeding 2.5 days.
- Negative Developer Experience: Developers frequently encountered test failures unrelated to their changes, negatively impacting their experience.
The Solution: Implementing Selective Testing
To solve these problems, we had to approach them from a developer’s point of view. A typical workflow on a developer’s machine looks something like this:
- Writing code and tests for their features.
- Running specific tests that they believe are impacted by those changes.
- Pushing the changes to Git and trying to merge them.
Executing all the UI tests on a developer’s machine is impractical, given the high execution time and the complex environment setup required, such as external mock services, emulators, etc. Therefore, developers usually run only those specific tests on their machine that they consider to be affected by their code changes.
This made us question if we could do the same on the CI.
The main challenge was identifying the tests affected by the changes in a pull request. To address this, we needed to determine which code segments are executed by each test. We all know a tool that captures exactly this metadata. Yes, you’re right; it’s a code coverage tool. A code coverage tool records all the code hit while executing tests and logs that information to a file.
You might be familiar with tools like JaCoCo, SonarQube, NCover, and JSCoverage, each with its own pros and cons and supported languages. We opted for JaCoCo, which was already used to report code coverage for unit tests in our Android repository. Additionally, parsing the generated exec files with JaCoCo proved quite straightforward.
How Does Code Coverage Work in Android UI Tests?
To enable coverage for UI tests, we need to set just one property in Gradle, as shown below.
android {
    buildTypes {
        debug {
            testCoverageEnabled = true
        }
    }
}
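On newer Android Gradle Plugin versions (7.3 and later), this flag has been renamed. A minimal sketch of the equivalent configuration in the Gradle Kotlin DSL, assuming a recent AGP version, would look like this:

android {
    buildTypes {
        debug {
            // Replaces testCoverageEnabled on recent AGP versions.
            enableAndroidTestCoverage = true
        }
    }
}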
When we run the connected tests in our project, the instrumentation runner uses the EMMA coverage tool to record the bytecode executed within the application. This execution data is then written to an exec file in the application’s device storage. Once the test run is complete, the instrumentation runner pulls the exec file from device storage and stores it under the module’s build directory.
The JaCoCo library, usually invoked through a Gradle task, parses this exec file and creates a report in either HTML or XML format, depending on the configuration.
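For illustration, a minimal sketch of such a report task in the Gradle Kotlin DSL is shown below. The task name, class and source paths, and the location of the .ec file are assumptions and will differ per project and AGP version.

import org.gradle.testing.jacoco.tasks.JacocoReport

// Hypothetical report task (assumes the jacoco plugin is applied); paths are examples only.
tasks.register<JacocoReport>("connectedCoverageReport") {
    // The .ec file(s) produced by the connected (instrumented) test run.
    executionData.setFrom(fileTree(buildDir) { include("outputs/code_coverage/**/*.ec") })

    // Compiled classes and sources used to map the execution data back to code.
    classDirectories.setFrom(fileTree("$buildDir/tmp/kotlin-classes/debug"))
    sourceDirectories.setFrom(files("src/main/java", "src/main/kotlin"))

    reports {
        html.required.set(true)
        xml.required.set(true)
    }
}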
Challenges
The coverage file (.ec) generated here is based on the tests’ collective results. From this single exec file, it’s impossible to tell which test executed which lines of code. So we need to generate a coverage file for each test separately by running one test at a time. For every test, this adds the overhead of clearing previous data, launching the app, running the test, and extracting coverage data from the device.
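As a rough illustration of that per-test overhead (not Agoda’s actual orchestration), the sketch below shells out to adb to run one test at a time with coverage enabled and pulls the resulting .ec file. The package names, device paths, and runner class are placeholders.

import java.io.File

// Hypothetical per-test loop: clear app data, run a single test with coverage,
// then pull the per-test .ec file from the device. Names and paths are examples.
fun runTestsWithPerTestCoverage(tests: List<String>, outputDir: File) {
    val appPackage = "com.example.app"
    val testPackage = "com.example.app.test"
    val runner = "androidx.test.runner.AndroidJUnitRunner"

    tests.forEach { test ->                                       // e.g. "com.example.FooTest#login"
        val fileName = test.replace('#', '_') + ".ec"
        val deviceFile = "/sdcard/$fileName"
        exec("adb", "shell", "pm", "clear", appPackage)           // clear previous app state
        exec(
            "adb", "shell", "am", "instrument", "-w",
            "-e", "class", test,                                  // run just this one test
            "-e", "coverage", "true",                             // enable coverage collection
            "-e", "coverageFile", deviceFile,                     // where to write the .ec file
            "$testPackage/$runner"
        )
        exec("adb", "pull", deviceFile, File(outputDir, fileName).path)
    }
}

private fun exec(vararg command: String) {
    ProcessBuilder(*command).inheritIO().start().waitFor()
}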
Agoda’s Device Farm to the Rescue
The Agoda Android application encompasses over 980 UI tests, which we orchestrate on a device farm of 60 Android devices; executing all of them takes at most 15 minutes. We run these tests in batches of ten to ensure no test is affected by lingering global state.
To generate one coverage file for each test, we configured the tests to execute in batches of size one and extracted the coverage file at the end of each batch, basically every test.
This configuration required 50 minutes on average to run all the tests and extract their respective coverage files, which is more than 35 minutes of overhead. At the end of this job, we had artifacts with a separate coverage file associated with each test.
Working with Coverage Files
The hard part is over now: we have a separate coverage file for each test. Parsing these coverage files is very easy using the JaCoCo library’s built-in helpers.
import org.jacoco.core.tools.ExecFileLoader
import java.io.File

/**
 * Get the lines of code recorded in the coverage file, e.g.
 * [
 *   com/packageName/className$MethodName1,
 *   com/packageName/className$MethodName2
 * ]
 */
fun parseToLines(codeCoverageFile: File): List<String> {
    // Load the binary exec (.ec) file produced by the instrumented test run.
    val loader = ExecFileLoader().also {
        it.load(codeCoverageFile)
    }
    // Each entry in the execution data store names an executed code unit.
    return loader.executionDataStore.contents.map {
        it.name
    }
}
Once we parse all the coverage files, we can create the coverage data as shown below, mapping every line of code to the tests that executed it.
// CoverageData: a map from each line of code to the list of tests that executed it.
[
com/packageName/className$MethodName1 : [test1],
com/packageName/className$MethodName2 : [test1, test2]
]
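As a minimal sketch (not the actual implementation), such a map could be built from a directory of per-test coverage files, reusing the parseToLines helper above. The naming convention of one .ec file per test, named after the test, is an assumption for illustration.

import java.io.File

// Hypothetical builder: assumes one coverage file per test, named "<testName>.ec".
fun buildCoverageData(coverageDir: File): Map<String, List<String>> {
    val coverageData = mutableMapOf<String, MutableList<String>>()
    coverageDir.listFiles()
        ?.filter { it.extension == "ec" }
        ?.forEach { file ->
            val testName = file.nameWithoutExtension
            // Every code unit hit by this test gets the test appended to its list.
            parseToLines(file).forEach { line ->
                coverageData.getOrPut(line) { mutableListOf() }.add(testName)
            }
        }
    return coverageData
}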
Now, we use this coverage data to compare changes in a pull request, decide which tests are affected, and only execute those to merge the changes into the main branch.
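Conceptually, the selection step boils down to looking up the code units touched by a pull request in this coverage data. The sketch below illustrates the idea; the function name and inputs are assumptions, not the actual test selector.

// Hypothetical selector: picks tests whose covered code intersects the PR's changes.
fun selectAffectedTests(
    changedCodeUnits: Set<String>,            // e.g. derived from the pull request's diff
    coverageData: Map<String, List<String>>   // code unit -> tests that executed it
): Set<String> =
    changedCodeUnits
        .flatMap { coverageData[it].orEmpty() }
        .toSet()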
Generating Coverage Data for Selective Testing
A pivotal question arises: When should we generate this coverage data? Generating it with every pull request would defeat our objective of enhancing efficiency. Instead, we adopt a strategy akin to common caching mechanisms.
In most build systems, the main branch produces a cache, and every feature branch uses this cache to avoid building redundant things.
We generate this coverage data from the main branch and store it in the cloud, making it accessible to the CI jobs. The test selector downloads this coverage data for every pull request, selects only the tests affected by the pull request’s changes, and sends the list of selected tests to the Test job.
Escaping Tests
It takes almost an hour to generate coverage data for 900 UI tests. Imagine that, during this interval, two pull requests (PR1 and PR2) are in the process of being merged. If PR1 adds a new dependency to testA, the test selector for PR2 will not select testA, because it is still working from the old coverage data.
testA might then fail, especially if the subsequent PR changes something in testA’s newly added dependency. This failing test will go unnoticed until some PR changes one of testA’s dependencies and the test selector selects testA again, once the updated coverage data is available.
Maintaining the Test Quality
To tackle this, we built a solution inspired by Gradle’s test caching approach. A test will only be cached if it’s successful.
We upload coverage data parsed only from successful tests, ignoring the failed ones. Once the new coverage data from the main branch is uploaded, every pull request is affected: since there is no coverage data for the failing tests, the test selector treats them as new tests and always runs them in the CI pipeline regardless of the PR changes, blocking all pull requests until the failures are dealt with.
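In code, this rule could look like the sketch below: any test that does not appear in the coverage data, whether genuinely new or failing on the main branch, is always included in the selection. The function name and the allTests input are illustrative assumptions.

// Hypothetical rule: tests without coverage data (new, or failing on main and
// therefore never parsed into the coverage data) are always selected.
fun testsToAlwaysRun(
    allTests: Set<String>,                    // every test discovered in the repository
    coverageData: Map<String, List<String>>
): Set<String> {
    val testsWithCoverage = coverageData.values.flatten().toSet()
    return allTests - testsWithCoverage
}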
This enables developers to catch the failures early. On top of this, the test owners are notified of these failing tests through an automated system, which awaits their response on how to tackle the failure on the main branch. They can either:
- Fix the failure through an ad-hoc process if it is critical, or
- Mark the failing tests to be ignored by the test selector if they are flaky, so that other pull requests are unblocked from merging, and fix them within their SLAs.
Selective Testing Results
Selective testing is currently implemented on Agoda’s Android repository, leading to a noticeable reduction in the number of tests run per pull request. This strategic approach has saved time and resources and significantly reduced the frequency of CI pipeline failures, particularly those stemming from unrelated flaky tests.
Conclusion
Selective testing has proven to be an effective approach to optimizing CI/CD efficiency. This method facilitates faster integration and development cycles and ensures the maintenance of high-quality testing standards. Inspired by these results from our Android repository, we plan to explore ways to implement selective testing on other repositories.