Testing Postgres applications: one simple trick
In this article we provide a simple trick that we use at Geoblink
which makes it easier to test a Postgres application.
Testing and mocking
In general, a system is built out of a lot of components: some frontend that the users interact with, several backends, several stores… One could test such a system as a whole, i.e. an E2E test, but in general it is better to test the different parts in isolation and for that, one has to mock the interfaces.
Let’s focus on a Postgres application: a server written in any language (NodeJS, Python, Go…) which connects to a Postgres database. A first instinct would be to mock the answer from Postgres. However, it is far from optimal, queries can get complicated and we have hidden logic in them, which has to be tested. Another option would be to seek an in-memory version of Postgres, but that isn’t good either, there is no guarantee that results will be the same, and of course some functionality might be lacking.
What we need to achieve is that Postgres reads the data that we want from the tables that we query. That way we can modify the data as we wish with the knowledge of what the output should be. That is exactly what we explain in this article.
A simple trick
The strategy to mock tables consists of three steps:
- Make sure that there is exactly one connection from the application to the DB. If we are doing pooling at the application layer that will be easy, we just need to stub our pool to always return the same client. If we are pooling outside the application, for example with PgBouncer we need to ensure that PgBouncer has exactly one connection and that connection does not close.
- We create a temporary table with the same name as the table that we want
to test. For that we can do:
CREATE TEMPORARY TABLE table_to_test LIKE table_to_test;
3. Now we insert fake data to the table.
How it works
After we have built the temporary table, any query like:
SELECT * FROM table_to_test;
will return data from that fake table that we created. The reason is the search path. The search path is analogous to $PATH in Unix, it determines in what order the schemas have to be searched for the name of a table. By default, Postgres sets the temporary schema earlier in the search path, so any query will hit first our temporary table.
The Good Parts
- Optional constraints. When testing it might be beneficial not to have the same constraints as the original table. Consider the case that we are testing a table of users. It might be the case that we have constraints on the birth date so that age is not null or not negative. However, in a particular test we might be testing logic related to addresses. Having to set a proper birth date would be coupling independent parts. Note that, we can actually opt-in for constraints just by doing:
CREATE TEMPORARY TABLE table_to_test LIKE table_to_test WITH CONSTRAINTS;
and the same applies to FK or triggers.
2. It’s possible to avoid previous data. This specially important when data comes from an ETL. Suppose an application that has to count the number of restaurants inside postal codes. Most probably, there will be a table that contains the postal codes. Possibly a fraction of them, but definitely real data. If one was to test the number of restaurants by postal code he would require some knowledge on the actual location of the postal codes. However, with this trick it is possible to overlay the table and create postal codes that have very simple geometries, and where the count of the restaurants can be known beforehand.
3. It requires less permissions. This is partially related to the previous point. It could happen that the DB user has read-only access to some table. That would make it impossible to mock data in the real table. However, with this trick it only needs permission to create temporary tables.
4. Easier to parallelize. Temporary tables depend on sessions. If tests were ran with different sessions it would be possible for the tests to read different data without clashing. Writing data to the real table wouldn’t work because different sessions would be able to see the data.
5. Easier to clean-up. As we mentioned in the previous point, temporary tables depend on the session. That means that even if we screw up our cleaning scripts, the moment we close the connection the data will be cleaned.
The bad parts
1. Fully qualified queries. The reason the trick works is precedence in the search path. However, this is not applied if we use fully qualified names. That is, if we query:
SELECT * FROM real_schema.table_to_test;
Postgres will return the data from the real table, ignoring any other table with the same name.
2. It does not work with views. If there is a view created in the db as:
CREATE VIEW some_view_to_test AS SELECT * FROM table_to_test;
It will be linked to the actual table in the DB, so even if we create a temporary table the result will be returned from the real data.
Temporary tables allow to mock data so that one can check that queries fetch exactly the result that is expected.