Reproducing Most (Deterministic) CI Failures

Jeff Gaston
Nov 14, 2023

Fixing a build failure is often easier if you can reproduce it on demand.

Most failures are easy to reproduce: they give you a short error message, and you can just rerun the action that failed (for example, ./gradlew :someproject:sometask if you’re using Gradle).

If, however, this doesn’t reproduce the original error, you may want to minimize the differences between the original failed build and your subsequent test build:

Try running the same command. You can make it easy to run the same command by putting that command in a script in your source repository.
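As a rough illustration of keeping the command in the repository, here is a minimal sketch of a wrapper that CI and developers could both call. The file layout, the function names, and the idea of using Python for the wrapper are all assumptions made for this example; the Gradle task is the placeholder from earlier in this post.

```python
import subprocess
from pathlib import Path

# Hypothetical: the one command that both CI and developers run.
# Replace the task path with whatever your CI actually invokes.
BUILD_COMMAND = ["./gradlew", ":someproject:sometask"]


def build_command(extra_args=()):
    """Return the full command line, so CI and humans share one definition."""
    return BUILD_COMMAND + list(extra_args)


def run_build(repo_root, extra_args=()):
    """Run the build from the repository root and return its exit code."""
    completed = subprocess.run(build_command(extra_args), cwd=Path(repo_root))
    return completed.returncode
```

If the CI configuration only ever calls run_build, then reproducing a failure locally means calling the same function from the same checkout, rather than copying a command out of a log.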

Try using consistent tools. You can make it easier to have the same tools everywhere either by moving these tools into source control or by telling the build system to fetch and verify specific versions of dependencies when they’re needed.
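One way to picture "fetch and verify" is a checksum comparison against a value stored in source control. The sketch below is only an illustration under that assumption: the function names are made up for this example, and a real build system would typically provide its own verification mechanism.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash the file in chunks so large tool archives need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tool(path, expected_sha256):
    """Raise if the tool on disk is not the exact file the repository pinned."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"{path}: expected sha256 {expected_sha256}, got {actual}")
    return True
```

With the expected hash checked in next to the code, a machine that has a different copy of the tool fails loudly before the build starts, instead of producing a subtly different result.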

Try keeping any caches consistent. Empty caches are usually the easiest to restore: just delete them (this is easier if the artifacts and caches are all under one directory). If a build fails with nonempty caches or other leftover state, you could also attempt to restore that state by running the same preceding builds in the same order. In AndroidX, our release builds start with an empty build directory and don’t use a remote cache, to reduce the chance of being affected by leftover state.
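Resetting to empty state can be as small as deleting one directory. Here is a minimal sketch, assuming all artifacts and caches live under a single directory named "out" in the repository root; that name is hypothetical and is not necessarily what AndroidX or your project uses.

```python
import shutil
from pathlib import Path


def clear_build_state(repo_root, state_dirs=("out",)):
    """Delete leftover build outputs and caches; return the directories removed."""
    removed = []
    for name in state_dirs:
        target = Path(repo_root) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(target)
    return removed
```

If state is scattered across several locations, each one has to be listed in state_dirs, which is part of why keeping everything under one directory makes this step easier.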

If you ran the same build command with the same code, environment, and caches, and got a different result, it’s likely that the error is nondeterministic. One way to learn about a nondeterministic error is to look for more examples of the failure.
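A quick way to check for this is to repeat the command a few times and compare the outcomes. The sketch below is a simplification under stated assumptions: it compares only exit codes (two runs can fail with the same code for different reasons), it does not reset caches between attempts, and the function names are invented for this example.

```python
import subprocess


def exit_codes(command, attempts=3, cwd=None):
    """Run the same command several times and collect each exit code."""
    return [subprocess.run(command, cwd=cwd).returncode for _ in range(attempts)]


def classify(codes):
    """Summarize repeated runs of one command in one unchanged environment."""
    if all(code == 0 for code in codes):
        return "passes consistently"
    if all(code != 0 for code in codes):
        return "fails consistently"
    return "possibly nondeterministic"
```

A mixed result only suggests nondeterminism if the code, tools, and caches really were the same for every attempt, so it is worth restoring the caches to their starting state before each run.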

If this process successfully reproduces the error but takes too long to test, we’ll talk more about that in a later post.
