Speeding up tests with PyTest and PostgreSQL

Louis-Amaury Chaïb
Published in partoo · Nov 10, 2021 · 6 min read
Photo by Glen Carrie on Unsplash

Software testing at Partoo

At Partoo, all backend tests are referred to as "unit tests", but to be more accurate, they are a mix of unit and integration tests running together in a single pytest execution environment.

Around these integration tests, there's the Good, the Bad and the Ugly.

The Good: all tests are designed to run in full isolation and behave very well when run in parallel.

The Bad: to enable this full isolation, each test instantiates its own database on a server, replicated from a template, inserts the test's required data in setUp, and drops the database in tearDown.

The Ugly: in doing so, the connection to the database must be patched, and due to a tight binding between the connection and the WSGI application, the WSGI application is instantiated in every endpoint test; hence the whole repository is scanned in search of all Pyramid endpoints.
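Put together, the Bad and the Ugly look roughly like this (a minimal sketch with hypothetical helper and template names, not Partoo's actual code):

import uuid
import unittest

import psycopg2


def run_admin_sql(statement):
    # Hypothetical helper: run a statement on the maintenance DB with
    # autocommit, since CREATE/DROP DATABASE cannot run in a transaction.
    conn = psycopg2.connect(dbname="postgres")
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute(statement)
    conn.close()


class EndpointTestCase(unittest.TestCase):
    def setUp(self):
        # One brand-new database per test, cloned from a pristine template.
        self.db_name = f"test_{uuid.uuid4().hex}"
        run_admin_sql(f'CREATE DATABASE "{self.db_name}" TEMPLATE partoo_template')
        # ...then patch the app's DB connection to point at self.db_name,
        # build the WSGI app (the Ugly), and insert the test's dataset.

    def tearDown(self):
        run_admin_sql(f'DROP DATABASE "{self.db_name}"')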

What's worse: in our development environment, tests run in a Docker container with a volume binding to the source code, and with Docker Desktop for Mac, filesystem performance is just awful!

An important thing to note: when I joined Partoo in December 2020, the project had a little more than 2,000 Python tests; by May, when I released my first improvements, they had grown by over 1,000. So the path to improvement had to target the core machinery and could not be a test-by-test effort.

The obvious but not so easy solutions

It would be obvious to start from the worst drawback I mentioned, i.e. the poor filesystem performance endured through Docker Desktop's filesystem bindings.

Here are the options:

  • Have the project's code directly inside the containers that run it. Unfortunately, the project is spread across several containers, so it's not convenient to mount it in a single one.
  • Have the project run outside of Docker, directly on the OS. But given its large number of dependencies, assembling the proper set of system dependencies and binaries would be a nightmare.
  • Get rid of Docker Desktop and go for another virtualisation system that offers better performance. This path is indeed under exploration, but the project's complexity raises a lot of points to address. I'll be glad to write another article about it as soon as we reach a final solution.

FAIL = First Attempt In Learning

In a repository, there's the code that you see and the code you don't, because it's hidden in files that nobody touches, since they're "working". At the beginning I was unaware that the Ugly was the main performance issue I was facing, because no one had spotted it; my teammates only had hints about the Bad, as it was more visible in the tests' structure.

My first exploration to improve the tests was to stop instantiating a DB for each test and instead use a single one that would be cleaned after each test.

In order to keep the parallelisation working, it would actually be one DB per Pytest worker.

For this I discovered the environment variable set by pytest-xdist to identify the worker a test is running in, which I could use as a suffix for the DB name:

os.getenv("PYTEST_XDIST_WORKER", "main")
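A per-worker database name could then be derived along these lines (a sketch; the base name is an assumption):

import os


def worker_db_name(base="partoo_test"):
    # pytest-xdist sets PYTEST_XDIST_WORKER to "gw0", "gw1", ... for each
    # worker; the variable is absent when tests run without parallelisation.
    worker = os.getenv("PYTEST_XDIST_WORKER", "main")
    return f"{base}_{worker}"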

Confident that it would work (at least it did on my sample files), I ran all the tests and found a couple of hundred failing: most of them because they were validating IDs generated by sequences that were no longer reset, and a few for reasons that are still unknown to me at the moment.

But to my greatest disappointment, I could not detect any significant change in the execution time. That's how I learned that our shared feeling about the culprit for the tests' slowness was wrong.

Time to make some measurements

When first instincts are proven wrong, there's just one thing to do (and if you can do it from the start, it will probably save you some time): measure. Instrument the code so it tells you where it hurts.

It's also important to have some sample files where you can spot the improvements; that's easier than running the complete test collection. I chose one I knew well enough, and which was very slow indeed: 90 tests running in 477s according to pytest.

One could use pytest-profile to get a full profiling report, but it tends to become unreadable quickly and doesn't point in the right direction. That's why I came up with a small profiler of my own that I could use as a method decorator or as a context manager to wrap blocks of code.
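A minimal sketch of such a profiler (not the exact code, just the idea), built on contextlib.ContextDecorator so the same class works as a decorator and as a context manager:

import time
from contextlib import ContextDecorator


class timed(ContextDecorator):
    # Usage: either @timed("build wsgi app") on a method, or
    # `with timed("insert dataset"): ...` around a block of code.

    def __init__(self, label):
        self.label = label

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        elapsed = time.perf_counter() - self.start
        print(f"[profile] {self.label}: {elapsed:.3f}s")
        return False  # never swallow exceptions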

And after instrumenting the code, I discovered the Ugly. Yes, building the WSGI app in setUp was actually taking 3s on average, and even though it happened less often than instantiating databases, it had to be dealt with!

Getting rid of useless coupling

The remediation was to untie the knot that bound, for no (longer valid) reason, the WSGI app constructor to the DB sessionmaker.

Now the request properly gets a clean DB session without even knowing it has been hijacked by the testing framework.
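A common Pyramid pattern achieves this kind of decoupling (a minimal sketch assuming SQLAlchemy and a sqlalchemy.url setting, not necessarily Partoo's exact code): the session factory lives outside the app constructor and sessions are attached to requests lazily, so the test framework can rebind the factory without rebuilding the WSGI app.

from pyramid.config import Configurator
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Module-level factory: tests can rebind it to a per-test database
# without touching the WSGI app at all.
session_factory = sessionmaker()


def main(global_config, **settings):
    config = Configurator(settings=settings)
    session_factory.configure(bind=create_engine(settings["sqlalchemy.url"]))

    # Each request lazily gets its own session; reify=True caches it for
    # the lifetime of the request.
    config.add_request_method(
        lambda request: session_factory(), "db_session", reify=True
    )

    config.scan("web")
    return config.make_wsgi_app()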

Faster WSGI bootstrap

Another small improvement along the way was to reduce the scope of the filesystem scan: there is no point in scanning the tests folders.

config.scan("web", ignore=re.compile("tests?$").search)

Time to make a small check on our sample file: the 90 tests now run in 184s!

That’s encouraging, but let’s dig even deeper.

Shared Datasets

By construction, many tests are grouped in classes (or Test Suites) that share the same dataset, which is, for now, inserted at every setUp. Measurements showed that inserting a medium-sized dataset was already quite time consuming, taking almost as long as instantiating a new DB from a template, i.e. about 1s. After my failed attempt to reduce the number of DB creations, maybe there was a card to play around the insertion of the datasets.

PostgreSQL is nice enough to copy the data inside tables when you create a DB from a template, with no significant overhead. Test Suites could therefore prepare their own template DB with data inside, and then at setUp time, they’d just need to instantiate from this preset template instead of the virgin template.

Before:

Template -> N times (New DB + Insert Data)

After:

Template -> New DB + Insert Data -> N times New DB

It generally starts being worthwhile when there are at least 3 tests in the suite (or even 2 if the dataset is large enough).
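In test code, this could look roughly like the following sketch, where create_database, drop_database, connect_to and insert_suite_dataset are hypothetical helpers standing in for the project's real ones:

import uuid
import unittest


class SuiteWithSharedDataset(unittest.TestCase):
    # create_database, drop_database, connect_to and insert_suite_dataset
    # are hypothetical helpers, named here only for illustration.

    @classmethod
    def setUpClass(cls):
        # Clone the virgin template once per suite and load the shared
        # dataset into it; this becomes the suite's preset template.
        cls.template_name = f"tpl_{cls.__name__.lower()}"
        create_database(cls.template_name, template="partoo_template")
        with connect_to(cls.template_name) as session:
            insert_suite_dataset(session)

    @classmethod
    def tearDownClass(cls):
        drop_database(cls.template_name)

    def setUp(self):
        # PostgreSQL copies the template's tables *and* their rows, so the
        # dataset comes for free with every per-test clone.
        self.db_name = f"test_{uuid.uuid4().hex}"
        create_database(self.db_name, template=self.template_name)

    def tearDown(self):
        drop_database(self.db_name)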

Now, let’s come back to my sample file: 90 tests running in 128s!

Still making good progress, yay!

Read-only tests

While presenting preliminary results to my coworkers, I was asked whether it was possible to detect read-only tests: those wouldn't need to instantiate their own DB, since they don't alter its contents, and their queries could be performed directly against the preset template DB.

Well, automatically guessing when tests are read-only would be close to impossible, but providing a decorator would do. But how to implement it, and how to make sure that a failure in a test isn't caused by a so-called read-only test elsewhere misbehaving?

That's another nice thing provided by PostgreSQL: permissions are also inherited when creating a database from a template.

Let’s create a user that has only read access to all tables of the DB:

CREATE USER partoo_test_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO partoo_test_ro;

All done, or almost!

Well, our API tests rely on API keys, which log the timestamp of their last usage. But we're not testing API keys, right? I mean, there's no assertion that a key has been used at a specific time... Indeed. So how can we make sure that API keys can still be used in read-only tests? There's a nice trick: you can grant UPDATE on specific columns of a table:

GRANT UPDATE (last_used_at) ON api_keys TO partoo_test_ro;
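The decorator itself could look like this sketch (hypothetical wiring: it only marks the test method, and the base class's setUp reads the flag to decide which connection to hand out, reusing the same hypothetical helpers as above):

import uuid
import unittest


def read_only(test_method):
    # Flag the test as read-only; setUp will skip the per-test clone and
    # connect to the suite's preset template as partoo_test_ro instead.
    test_method.read_only = True
    return test_method


class BaseTestCase(unittest.TestCase):

    def setUp(self):
        test = getattr(self, self._testMethodName)
        if getattr(test, "read_only", False):
            # connect_to / create_database are hypothetical helpers;
            # self.template_name would come from the suite's setUpClass.
            self.session = connect_to(self.template_name, user="partoo_test_ro")
        else:
            self.db_name = f"test_{uuid.uuid4().hex}"
            create_database(self.db_name, template=self.template_name)
            self.session = connect_to(self.db_name)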

Now we're all set! Time for a new measurement: out of the 90 tests in my sample file, 56 could be flagged as read-only, and the file now runs in 87s.

Some figures

Tests are really slow on my Mac, and I limit the Docker Desktop VM's resources, so I wondered whether my fellow coworkers on Linux workstations would benefit from my changes.

So I asked one of them to compare a full run, with the maximum parallelisation he could afford (12 workers), with and without my changes.

The whole test suite went from 1449.13s to 967.41s, a gain of around 33%, not to mention that not all tests have yet been reworked to be flagged as read-only.

Conclusion

Always take good care of your tests, as they are the best safety net for your production. That means being able to run them often, even in development environments, so their performance matters as much as that of your production code.
