How to speed-up Django tests without actually optimizing the code

Michal Dabski
7 min read · Sep 20, 2020

A guide for those who prefer to throw hardware at the problem.

I think we’ve all been there: a large project, a big test suite, long waiting times for the pipeline to complete. Worse yet if some tests fail and you have to re-run them all over again. A good engineer will take a hard look at the test suite, profile it, and optimize it. Maybe clean out some unneeded or redundant tests. However, this is a time-consuming endeavor that will likely yield relatively small returns, depending on how bad the state of the project is to begin with. In the end, as the project grows, you’ll want to keep test coverage at adequate levels and make sure all those pesky regressions fixed over time do not resurface. This problem not only increases the time to deploy the app, but also slows down turn-around time for merge requests by delaying feedback from failed tests.

At MEG Support Tools we have a test suite that takes about 10–12 minutes, which is not so bad considering that 30-minute pipelines are not unheard of in other projects. Still, we can do better than that! We currently have 10 test runners spread across 3 machines, yet only one of the runners actually runs the test suite, and the others will have finished their assigned tasks much earlier. The solution: what if we distribute the test suite across multiple runners to make use of those idle runners?

All tests take between 10 and 12 minutes to complete

There are already some solutions that allow running tests in a distributed way. Django itself provides a parallel test runner, but it’s limited to a single machine and won’t take machine load into account — in our case a machine can have 4 runners (one runner per CPU core), and if each of those runners executes parallelized tests, the machine will slow down to a crawl and there may be no speed improvement at all. Another option for PyTest users is the pytest-xdist plugin, which I have not tried myself as we don’t use PyTest. Ideally, the tests would be divided into individual jobs recognized by the CI, so they can be properly assigned to runners based on each runner’s capacity.

Original pipeline comprises mainly short (< 1 min) tasks and a long-running test task (> 10 mins)

The easiest way of splitting tests in Django is by using tags, but you can probably imagine that manually going through all tests and grouping them by tags is an unnecessarily laborious task. Django also doesn’t appear to have the ability to run only untagged tests, which is a big no-no, as I would like to make sure that every test is run in the end, even if a developer forgot to tag it. In addition, this approach does not scale well: as the test suite becomes slower, the only way to divide the tests again is to re-tag everything and create new jobs manually.

Luckily for us, GitLab CI jobs have a parallel setting that allows us to specify how many instances of a job to run in parallel just by changing a single value. The only problem is, we don’t want to run all tests N times. We want to run all tests, but divide them into N groups and run each group separately in a different instance of the job. GitLab provides some useful environment variables for parallel jobs that can be used for this: CI_NODE_TOTAL and CI_NODE_INDEX. Now, how can we use these to control which tests run in which job?

Neither Django nor unittest can automatically divide tests into groups for us. We have to implement this ourselves by writing a custom TestLoader class and overriding its discover() method:

Implementation of the partial TestLoader class and the TestRunner
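A minimal sketch of what this could look like, assuming the built-in os.getenv() rather than django-getenv; the class names, the _flatten() helper, and the module path are illustrative. TEST_RUNNER in settings.py would then point at the runner class (e.g. "myproject.runner.PartialTestRunner", a hypothetical path). This targets the default `manage.py test` invocation, which goes through discover():

```python
import os
import unittest

from django.test.runner import DiscoverRunner


def _flatten(suite):
    """Recursively yield individual test cases from a (possibly nested) TestSuite."""
    for test in suite:
        if isinstance(test, unittest.TestSuite):
            yield from _flatten(test)
        else:
            yield test


class PartialTestLoader(unittest.TestLoader):
    """Discovers all tests, then keeps only the slice assigned to this CI node."""

    def discover(self, *args, **kwargs):
        suite = super().discover(*args, **kwargs)
        # Sort for a stable, reproducible order regardless of environment.
        tests = sorted(_flatten(suite), key=lambda test: test.id())
        # GitLab sets these for jobs using `parallel:`; default to a single
        # group so the full suite still runs locally.
        total = int(os.getenv("CI_NODE_TOTAL", 1))
        index = int(os.getenv("CI_NODE_INDEX", 1)) - 1  # GitLab counts from 1
        return unittest.TestSuite(tests[index::total])


class PartialTestRunner(DiscoverRunner):
    # DiscoverRunner delegates test discovery to this loader instance.
    test_loader = PartialTestLoader()
```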

You will notice in the above example that I am sorting the tests alphabetically before dividing them into groups — this is probably not needed, given that the tests already run in a pre-built docker image, so the environment in every runner will be identical and the order of tests should be consistent. However, I’d like the tests to be reproducible outside of the docker container — for instance, on a developer’s machine the OS and environment can be slightly different and (correct me if I’m wrong) affect the order of discovered tests, which in turn would result in a completely different set of tests for the same input — and that is unacceptable.

The tests are divided into groups by reading GitLab’s environment variables — I am using django-getenv for this — but you can of course use the built-in os.getenv(); just keep in mind that you’ll have to convert the value to an integer.

Lastly, the tests are sliced into N groups, and every Nth test starting at the given index is returned. This is simpler to implement than dividing the tests into N blocks of adjacent tests, and less prone to off-by-one errors. Note that we’re subtracting 1 from CI_NODE_INDEX, as lists in Python start at 0 while GitLab provides a value between 1 and N (inclusive).
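To illustrate the stride-based split, here is what 10 tests divided across 4 jobs would look like (the test names are made up):

```python
tests = sorted(f"test_{i}" for i in range(10))  # stand-in for the discovered tests
total = 4                                       # CI_NODE_TOTAL

for node_index in range(1, total + 1):          # CI_NODE_INDEX is 1-based
    print(node_index, tests[node_index - 1::total])

# 1 ['test_0', 'test_4', 'test_8']
# 2 ['test_1', 'test_5', 'test_9']
# 3 ['test_2', 'test_6']
# 4 ['test_3', 'test_7']
```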

After updating TEST_RUNNER in settings.py and setting “parallel: 4” in .gitlab-ci.yml, you will see that we now have 4 test jobs instead of one, and the status of individual jobs can be viewed by clicking on the group:

New pipeline runs 4 instances of test job at the same time
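For reference, a sketch of what the test job in .gitlab-ci.yml might look like; the job name, image, and script are illustrative, and the only addition needed for the split is the parallel keyword:

```yaml
test:
  stage: test
  image: myproject/ci:latest   # hypothetical pre-built docker image
  parallel: 4                  # GitLab starts 4 instances of this job and sets
                               # CI_NODE_INDEX (1..4) and CI_NODE_TOTAL (4) in each
  script:
    - python manage.py test
```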

Each job now handles a unique set of tests. Note that GitLab runners currently accept jobs greedily, and all jobs are likely to be run on the same machine if it has that many empty slots, until this issue is addressed. Nevertheless, this allows tests to be spread across multiple runners, and we get the test result as soon as the longest-running job completes, which is about 4 minutes even if all runners are on the same machine:

Each test job in the pipeline now takes at most 4 minutes

Of course, you can tweak this number to your liking, but don’t be tempted to crank it up to 11. You should definitely keep this number lower than your number of runners, taking into account other jobs that need to execute at the same time, and bear in mind that each job has an overhead of pulling the project, building it (or pulling the docker image), running the test command, importing and discovering tests etc.

Overhead is the difference between the total run time and the baseline (a single test job). The overhead per job is less than 1 minute.

To select the optimal number of test jobs, I ran the pipeline with the test suite split into 1 to 20 jobs and entered each run-time into a spreadsheet. Each pipeline was run only once, but that appears to be enough to show a clear trend. The average value would be the ideal benchmark, but in reality we want to optimize for pipeline finish time, which depends on the longest-running job — the Max value.

As expected, the overall time of running all tests grows with N, showing that the overhead of each job is quite significant, in particular with 8 or more jobs. This means that if the number of jobs is too large, they will occupy runners for longer in total while not really making a dent in our target value — Max.

In the end I decided to stick with the conservative value of 4 jobs to avoid clogging up the CI with this workload. My tests showed that regardless of the number of jobs, there is always a group that takes about 4 minutes. This issue is specific to my test suite, and it helped me identify certain long-running test cases that can be addressed further, or broken into smaller test cases, to bring the Max value closer to the Average.

To sum up, I think the advantages of running tests in parallel are quite clear. However, it’s important to keep in mind that you can only keep adding jobs as long as your infrastructure allows; if jobs already have to wait in a queue for execution, adding more may make things worse, as each job’s overhead adds extra execution time. Lastly, if you measure coverage during your test run, you will discover that the reported test coverage has now fallen way below 100%, since each job executes only a fraction of the suite. The only way to compute coverage may be to run the full test suite periodically, unless there’s a way to combine the coverage data into a single report.
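There may in fact be such a way: coverage.py can merge data files with its combine command. A rough sketch of how that could look in the pipeline, assuming the test job runs under coverage.py’s parallel mode (the job and stage names are illustrative, and I have not battle-tested this setup):

```yaml
# Sketch only: each parallel test job runs coverage in parallel mode and keeps
# its data file as an artifact; a later job merges them into one report.
test:
  stage: test
  parallel: 4
  script:
    - coverage run -p manage.py test   # -p writes a uniquely named .coverage.* file
  artifacts:
    paths:
      - .coverage.*

coverage_report:
  stage: report                        # "report" must be listed under `stages:`
  script:
    - coverage combine                 # merges the .coverage.* files from the test jobs
    - coverage report
```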

Advantages

  • Better utilization of CI infrastructure
  • Test time can be reduced by adding more runners and splitting tests into more groups
  • Shorter wait time for pipeline completion
  • Failed test jobs can be re-run much faster

Disadvantages

  • Computing test coverage becomes much more difficult
  • The overhead of additional jobs can cause further delays if the CI infrastructure is over-utilized


Michał is the CTO of MEG Support Tools, a med-tech startup company