A Study in Parallelising Clojure Tests
In a language like Clojure, it is very easy to move fast and develop new features. That is one of the many reasons we have been using Clojure at Helpshift for almost ten years. As we shipped more features, we also added more and more tests to our test suite, many of which were integration tests. Over time, this accumulation of tests increased the test-running time to the point where releases and developer workflows were starting to suffer.
There are many possible solutions to the problem of long-running test suites. To name a few:
- Separating out unit tests and integration tests through Leiningen test selectors and running integration tests only on a daily basis and unit tests on a per-commit basis.
- Focusing on mocked tests by pushing side-effects to the edge of the application and mocking those interfaces. For example, this can be achieved by mocking database interactions in tests, only testing the business logic and covering the verification of updates to the database via integration or system tests (which don’t run on a per-commit basis, as mentioned in the previous point).
- Using a git-diff tool that understands which namespaces have changed in the current commit and runs only the tests for those namespaces.
- Running tests in parallel by having multiple threads run different tests at the same time inside the same JVM process.
We have followed all of these strategies at Helpshift for speeding up our test suites. In this post, we will talk about our experience with moving to a concurrent test suite which can potentially be parallelised.
Our journey to a parallel test suite is a three-part story. In the first part, we will discuss our strategy to make each test and its underlying source code free of global context. Next, we will look at how we changed test utilities like with-redefs to be thread-safe. And finally, we will talk about how to use these changes to run tests in parallel.
Step 1: Freeing the codebase of global context
An easy way to maintain connections to external systems like databases is to have a global connection defined:
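As an illustration only (the original snippet is not reproduced here), such a global definition might look like the sketch below. The plain map and its keys are assumptions standing in for whatever client object a real Elasticsearch library would create.

```clojure
;; Hypothetical connection settings. A real project would build a client
;; object with whichever Elasticsearch library it uses.
(def elasticsearch-connection
  {:host  "localhost"
   :port  9200
   :index "conversations"})
```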
Having a global connection like this, the source code utilises it in firing queries to Elasticsearch by directly referring to the connection. Thus, the connection is used implicitly deep inside the source code:
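A minimal sketch of such a function, not the original code: run-query is a hypothetical stand-in for a real client call, and the query shape is made up for illustration. What matters is that search-conversations never receives the connection as an argument.

```clojure
;; Hypothetical global connection, repeated here so the snippet runs on its own.
(def elasticsearch-connection
  {:host "localhost" :port 9200 :index "conversations"})

(defn run-query
  "Hypothetical stand-in for a client call: returns the request it would
  send instead of talking to a real cluster."
  [conn query]
  {:url  (str "http://" (:host conn) ":" (:port conn) "/" (:index conn) "/_search")
   :body query})

(defn search-conversations
  "Finds conversations for a user. Note the implicit dependency: the
  connection is not an argument, it is read from the global var."
  [user-id]
  (run-query elasticsearch-connection
             {:query {:term {:user-id user-id}}}))
```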
As a consequence of this, this function’s tests now have to build their own connections by modifying the original connection to a randomised index (where the test data will be stored in Elasticsearch). So the test fixture becomes something like this:
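One plausible shape for such a fixture is sketched below, assuming the connection is a plain map and that the redefinition is done with alter-var-root; the original fixture may have used a different mechanism. The fixture name and index naming scheme are hypothetical.

```clojure
(require '[clojure.test :refer [use-fixtures]])

;; Hypothetical global connection, repeated here so the snippet runs on its own.
(def elasticsearch-connection
  {:host "localhost" :port 9200 :index "conversations"})

(defn with-temporary-index
  [test-fn]
  (let [original   elasticsearch-connection
        temp-index (str "test-" (java.util.UUID/randomUUID))]
    ;; setup: change the root value of the var, which every thread sees
    (alter-var-root #'elasticsearch-connection assoc :index temp-index)
    (try
      ;; test
      (test-fn)
      (finally
        ;; teardown: put the original connection back
        (alter-var-root #'elasticsearch-connection (constantly original))))))

(use-fixtures :each with-temporary-index)
```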
In this fixture, we have a setup phase, in which we redefine the connection to a temporary random index. This phase is followed by the actual test being run. Finally, we have a teardown phase, where we restore the original connection. When we modify the connection in the setup phase, it is modified globally. That is, any thread can see the modified connection. If we were to now run two test namespaces in parallel, they would no longer be isolated from each other and could fail in many different ways (depending on how the threads interleave).
If we dig deeper into the codebase, this redefinition of the connection needs to happen because there is a function in the source code (search-conversations) that directly uses the global connection. Therefore, for testing this source function or any other function that uses the said source function, we have to alter the connection as we showed before, which is not thread-safe. Such implicit dependency on global context can inhibit parallelisation of tests.
To avoid this implicit dependency, we can pass the “context” explicitly to the source function, and this “context” can define which elasticsearch-connection to use. In other words, instead of depending implicitly on the global connection, we expose a context that is passed to every function and that contains all the information the function might need to communicate with the rest of the world, including the connection to Elasticsearch. This context needs to be initialised once at startup and passed along through the source functions. In addition, if you use the Component framework, a system map can be passed as the context.
This change enables our tests to now pass any connection they want without having to worry about other threads modifying the same global connection variable.
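The explicit-context version can be sketched as follows. This is an illustration under assumptions, not the original code: run-query and make-context are hypothetical helpers, and the context is assumed to be a plain map with an :elasticsearch-connection key.

```clojure
(defn run-query
  "Hypothetical stand-in for a client call: returns the request it would
  send instead of talking to a real cluster."
  [conn query]
  {:url  (str "http://" (:host conn) ":" (:port conn) "/" (:index conn) "/_search")
   :body query})

(defn search-conversations
  "Same query as before, but the connection now arrives through the
  context argument instead of a global var."
  [context user-id]
  (run-query (:elasticsearch-connection context)
             {:query {:term {:user-id user-id}}}))

(defn make-context
  "Hypothetical constructor. In production this would run once at startup;
  in tests, each test can build its own."
  [index]
  {:elasticsearch-connection {:host "localhost" :port 9200 :index index}})
```

Two tests can now call (search-conversations (make-context "test-a") 42) and (search-conversations (make-context "test-b") 42) at the same time without sharing any mutable state.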
This painstaking process of rewriting every source function to pass down the context takes us a step closer to a thread-safe test suite. The previous test fixture now takes advantage of Clojure’s binding and is therefore thread-bound:
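A sketch of what a binding-based fixture could look like. The dynamic var *context*, the fixture name and the context layout are assumptions for illustration; the original fixture is not reproduced here.

```clojure
(require '[clojure.test :refer [use-fixtures]])

;; Context for the currently running test; nil outside a test.
(def ^:dynamic *context* nil)

(defn with-temporary-index
  [test-fn]
  (let [temp-index (str "test-" (java.util.UUID/randomUUID))]
    ;; binding is per thread and is undone automatically when the body
    ;; exits, so no explicit teardown of the var is needed
    (binding [*context* {:elasticsearch-connection
                         {:host "localhost" :port 9200 :index temp-index}}]
      (test-fn))))

(use-fixtures :each with-temporary-index)
```

A test would then pass *context* to the source function under test, and two threads running this fixture at the same time each see only their own temporary index.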
Note: The pattern of functions depending implicitly on certain global variables (here elasticsearch-connection) is a legacy pattern. Frameworks like Component mitigate the implicit dependency issue; Component has gained popularity over time and is widely accepted as a better alternative. To that end, the majority of the systems at Helpshift follow the Component framework. A codebase built on Component could move on to the next step with no changes required, but we mention this step here because we at Helpshift have been writing Clojure code since 2009, and this first step had to be done for an important legacy codebase.
Step 2: Finding an alternative for with-redefs and other test utilities
By far, the most common utility we have used in our tests is Clojure’s with-redefs. Although it is a very quick way to temporarily redefine a function, with-redefs is not thread-safe. The changes with-redefs makes are visible globally across threads. We can demonstrate that via the following code snippet:
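A small demonstration along these lines (a sketch, not the original snippet; fetch-status and the helper names are hypothetical): two threads each redefine the same function to return their own value and then check that they see it.

```clojure
(defn fetch-status [] :real)

(defn check-with-redefs
  "Redefines fetch-status to return `expected`, pauses briefly so another
  thread gets a chance to interleave, then checks what the call returns."
  [expected]
  (with-redefs [fetch-status (constantly expected)]
    (Thread/sleep 1)
    (= expected (fetch-status))))

(defn count-failures
  "Runs the check `iterations` times on each of two threads and returns
  how many checks saw the other thread's redefinition."
  [iterations]
  (let [worker (fn [expected]
                 (future
                   (count (remove true?
                                  (repeatedly iterations
                                              #(check-with-redefs expected))))))
        a (worker :stub-a)
        b (worker :stub-b)]
    (+ @a @b)))

;; The outcome depends on thread scheduling. Racing with-redefs calls can
;; also leave fetch-status permanently redefined after the run.
(println (if (pos? (count-failures 200)) "FAIL" "PASS"))
```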
If you run this, you should run into at least one failing case (indicated by the “FAIL” message on the console).
We were not able to find an existing thread-safe version of with-redefs, so we wrote our own. The guiding principle here is to redefine the function to a new function which is dynamic, so its implementation can safely vary across threads.
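That principle can be sketched as below. This is one possible way to realise it and is not claimed to match our actual implementation: the root value of the var is swapped once for a dispatching function, which looks up a thread-bound override and falls back to the original. The names *local-redefs*, wrap-with-dispatch, ensure-dispatching! and with-dynamic-redef-fns* are hypothetical.

```clojure
;; Map of Var -> replacement fn, bound per thread.
(def ^:dynamic *local-redefs* {})

(defn- wrap-with-dispatch
  "Returns a fn that calls the thread-bound replacement for `the-var`
  if there is one, and `original` otherwise."
  [the-var original]
  (with-meta
    (fn [& args]
      (apply (get *local-redefs* the-var original) args))
    {::dispatching? true}))

(defn- ensure-dispatching!
  "Installs the dispatching wrapper as the var's root value, once.
  alter-var-root is atomic, so concurrent callers cannot double-wrap."
  [the-var]
  (alter-var-root the-var
                  (fn [current]
                    (if (::dispatching? (meta current))
                      current
                      (wrap-with-dispatch the-var current)))))

(defn with-dynamic-redef-fns*
  [redefs thunk]
  (run! ensure-dispatching! (keys redefs))
  ;; Only this thread (and threads that convey bindings) see the overrides.
  (binding [*local-redefs* (merge *local-redefs* redefs)]
    (thunk)))

(defmacro with-dynamic-redef-fns
  "Like with-redefs for functions, but the redefinition is thread-bound."
  [bindings & body]
  (let [redefs (into {} (for [[sym f] (partition 2 bindings)]
                          [`(var ~sym) f]))]
    `(with-dynamic-redef-fns* ~redefs (fn [] ~@body))))
```

Usage mirrors with-redefs, for example (with-dynamic-redef-fns [fetch-status (constantly :stub)] (fetch-status)). In this sketch the wrapper stays installed for the lifetime of the JVM, which adds a small lookup cost to every call of a wrapped function.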
Another utility we use heavily is the spying and stubbing library Bond. Besides redefining functions, Bond can track calls to a function in a test, which can be very helpful. Internally, Bond uses with-redefs, so it was not thread-safe either. We submitted a pull request to support thread-safe constructs in Bond, and it follows the same principle as our implementation of with-dynamic-redef-fns. You can check it out here: https://github.com/circleci/bond/pull/47
PS: If you are curious about the approach we took here or have better ideas, this original gist has an ongoing conversation about alternatives. Suggestions are most welcome! https://gist.github.com/mourjo/c7fc03e59eb96f8a342dfcabd350a927
Step 3: Adding the ability to run tests in parallel
The last piece of the puzzle is to actually have tests run in parallel. We used the library Eftest for this. It has a plugin which can be used for running tests in parallel in much the same way as Leiningen’s default test runner.
lein eftest :thread-safe
We used the test selector thread-safe to identify the tests that we want to run in parallel. Test selectors are used in the same way as with Leiningen’s test runner. The Eftest plugin has a well-documented set of options, the most intriguing of which is the level of parallelism. Eftest provides two configurable levels of parallelism for a test suite:
- Namespace multi-threading: This strategy runs tests within a namespace in sequence and assigns test namespaces to different threads. This means that tests inside a namespace run one after another and different namespaces are picked up in parallel by the test runner threads.
- Var multi-threading: In this strategy, test runner threads can pick up any test that has not already been run, irrespective of which namespace it belongs to. This offers a more granular parallelism, in which if one namespace has more tests compared to others, it will not be the limiting factor in deciding the duration of the test suite.
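The two strategies correspond to values of Eftest's :multithread? option. Below is a sketch of a project.clj fragment, assuming the lein-eftest plugin; the project name and the plugin version are placeholders, so check the plugin's README for the current version.

```clojure
(defproject example-project "0.1.0-SNAPSHOT"
  ;; Placeholder version; use the one recommended by the Eftest README.
  :plugins [[lein-eftest "0.5.9"]]
  ;; Select tests whose metadata includes ^:thread-safe.
  :test-selectors {:thread-safe :thread-safe}
  ;; :namespaces runs namespaces in parallel, tests within one in sequence.
  ;; :vars lets runner threads pick up any test var from any namespace.
  :eftest {:multithread? :vars})
```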
Which parallelism strategy fits a project depends on what kind of tests the project has. For example, we chose the Var multi-threading strategy for a project whose computation-heavy tests are concentrated in a few namespaces, and we chose the namespace multi-threading strategy for a project whose tests are more evenly distributed across many test namespaces.
After the migration to a parallel test suite, we gathered our results on a repository with 95K+ assertions, in which the test build time improved by more than 50%.
The test build time improved significantly, but the improvement comes with costs and conditions:
- Running tests in parallel puts more load on the hardware on which the tests are running (since more work gets done per unit time), so integration tests may start failing with errors along the lines of “429 Too Many Requests” when that load cannot be handled. More capable hardware may be required for a parallel test suite.
- It may not always be possible to parallelise tests. We rely heavily on Clojure’s binding-conveyor-fn to seamlessly have multiple threads see different versions of the same Var (that is, thread-local redefinitions through binding). This only works for concurrency primitives that use this conveyor function (like agents and futures). For other concurrency primitives like java.lang.Thread, this process of thread-local redefinitions will not work.
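The difference in the last point can be seen in a few lines. The var *index* and the function names are hypothetical; future, promise and bound-fn are part of clojure.core.

```clojure
(def ^:dynamic *index* "production")

(defn seen-by-future
  "future conveys the caller's bindings to the worker thread."
  []
  (binding [*index* "test-index"]
    @(future *index*)))

(defn seen-by-raw-thread
  "A plain java.lang.Thread starts without the caller's bindings,
  so it sees the root value of the var."
  []
  (binding [*index* "test-index"]
    (let [result (promise)]
      (.start (Thread. (fn [] (deliver result *index*))))
      @result)))

(defn seen-by-bound-thread
  "bound-fn captures the current bindings explicitly, which is one way
  to carry them into a thread created by hand."
  []
  (binding [*index* "test-index"]
    (let [result (promise)]
      (.start (Thread. (bound-fn [] (deliver result *index*))))
      @result)))
```

Here (seen-by-future) and (seen-by-bound-thread) return "test-index", while (seen-by-raw-thread) returns "production". Code that creates threads by hand is therefore a candidate for review before its tests are marked as thread-safe.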
All things considered though, the amount of improvement we got with a parallel test suite was significant enough for us and we are in the process of moving all our major repositories to this parallel test framework.