Faking Redis/PostgreSQL connections while testing with Pytest in Python

Szymon Palucha
Analytics Vidhya
Published in
7 min readJan 2, 2020

I recently had to write some Python unit tests for functions that use Redis and PostgreSQL databases to store data. To achieve this I had to learn how to use Pytest and packages which would be able to fake a database connection. The latter proved particularly challenging due to the lack of good documentation and useful examples online. I spent a lot of time googling and figuring out how to make these fake connections in tests and setting up a test framework. As a result I wanted to share what I did to help anyone else looking to write similar tests save time. I’ll first do an example of a fake Redis connection and then I’ll look at a fake PostgreSQL connection example. I’ll assume some knowledge of Pytest (I got started with this useful article https://medium.com/testcult/intro-to-test-framework-pytest-5b1ce4d011ae) and Redis/PostgreSQL databases. For all the code including the test files please see https://github.com/paluchasz/Pytest-testing.

Redis example

For the Redis example let’s assume that we want to test a function which sums the values of certain Redis keys. A simple example of such a function could be

def sum_redis_key_values(rc, key_format):
total = 0
for redis_key in rc.scan_iter(key_format.format(i="*")):
total += int(rc.get(redis_key))
return total

In this example I chose the keys to be: key:0, key:1, key:2, key:3 (so key_format = “key:{i}") and their values to be integers.

Let’s say that we want to test this function for three different test cases and give a pass/fail for each test case. This can be achieved easily by using Pytest’s parametrize decorator with a function which returns a list of tuples. For instance, putting the below code in a file (which tests Python’s sum() method) and running $ python3 -m pytest filename.py -v

import pytestdef get_sum_test_data():
return [([3, -2, 6, -8], -1), ([20, 0, -30, 40], 30), ([2, 4, 1, 5], 12)]


@pytest.mark.parametrize('nums, result', get_sum_test_data())
def test_sum(nums, result):
assert sum(nums) == result

will result in three PASS statements, one for each test case. Changing the number 30 to some other number will result in PASS/FAIL/PASS instead.

To actually fake a Redis connection I used the birdisle package (02/2021 update: if you are using Mac OS I suggest using the redislite package instead, see the end of article for explanation). From the documentation

“Birdisle (an anagram of “lib redis”) is a modified version of redis that runs as a library inside another process. The primary aim is to simplify unit testing by providing a way to run tests against what appears to be a redis server, but without the hassle of starting a separate process and ensuring that it is torn down correctly.”

To use it one can simply do the following

from birdisle import redisrc = redis.StrictRedis(decode_responses=True)

Normally you would need to pass in host = "localhost” and port = 6379 if you were connecting to you local Redis database. If you compare this connection object rc to the one you get from the standard Redis package you can see that this one is a slightly different LocalSocketConnection. Now you can use any Redis command from the standard redis package. I like to keep a global rc(redis client) variable so the connection can be easily accessed anywhere in the file.

Now we require two further functions. One will create the data and one will clear the data in the fake Redis connection after each test case. You can check for yourself that the new keys are not actually added to your local Redis. Below is the full test file.

Full test file that uses a fake Redis connection

Let’s summarise what it does. When you run $ python3 -m pytest test_faking_redis.py -v :

  • Pytest first identifies which functions in the file have a name beginning or ending with the word “test”. In our example we only have one such function called test_sum_keys().
  • The load_data_and_connect() function is then called from Pytest’s parametrize decorator.
  • Inside it a fake Redis connection is made if one does not already exist.
  • All the data is loaded and turned into a list of tuples like in the Pytest example shown earlier. (As I only used three test cases I stored each test case in a separate json file but there might be better ways to do this).
  • Once this function is executed Pytest then runs the test function with each tuple in turn as input for each test case.
  • For each test case fake Redis data is created, the function that we are testing sum_redis_key_values() is executed, Redis data is cleared and we assert if the sum is equal to the expected result.
  • Pytest logs out a PASS/FAIL for the assertion and continues to the next test case.

PostgreSQL example

Faking a PostgreSQL connection turned out to be a bit more challenging as expected. Firstly, a little reminder of how to connect and use psql within Python. The package I use for that is PyGreSQL. To make a connection we could do the following

import pgdbCONNECTION = pgdb.connect(user=info['user'], host=info['host'], database=info['database'], port=info['port'])

where info is a dictionary to be specified, which could be loaded from a config file. (Note I think there is another pgdb Python package and so you need to install the correct one otherwise you might get an error like “pgdb has no connect method”)

To then query the database we could do something like

cursor = CONNECTION.cursor()    
records = cursor.execute("""SELECT age FROM students""").fetchall()
cursor.close()

Now to make a fake psql connection I used the testing.postgresql package (https://pypi.org/project/testing.postgresql/). Note that this package requires a local PostgresSQL database installed (as opposed to the birdisle package which I believe doesn’t require the Redis database). The package provides a fake database/ user/ host/ port which can be used to make a fake connection with the PyGreSQL package as follows

import pgdb
import testing.postgresql
psql = testing.postgresql.Postgresql()
info = psql.dsn()
CONNECTION = pgdb.connect(user=info['user'], host=info['host'], database=info['database'], port=info['port'])

As an example let’s say we want to test a function which sums the ages for all students from the “students” relation (table) in psql. The function could look like

def sum_ages():
records = query_database(operation="""SELECT age FROM students""")
ages = [r.age for r in records]
return sum(ages)

where query_database() is another function which does all the cursor stuff shown previously.

To create the students relation in the fake psql database the Pandas package turns out to be very useful. It has a handy to_sql() method which converts a Pandas dataframe into a psql relation. It requires an engine object to be passed in which tells Pandas which psql database to insert into. To get this object we need to use yet another package called sqlalchemy. It makes sense to save the data needed to make the students table as a csv file so it can be easily loaded into a Pandas dataframe with the read_csv() method. Putting this all together we need to do something like

import testing.postgresql
import pandas as pd
from sqlalchemy import create_engine
psql = testing.postgresql.Postgresql()
engine = create_engine(psql.url())
students_df = pd.read_csv(file)
students_df.to_sql('students', engine, if_exists='replace')

The if_exists = “replace" tells Pandas to replace an existing relation with the same name. There are other useful things that can be passed in to the to_sql() method like dtype which specifies the data type for the columns.

When dealing with psql connections we need to rollback the connection after each test case (rollback is used to undo transactions and it seems easier than trying to clear the database — I also ran into some problems without doing this) and close the connection after the last test case. As a result we also need to keep track of the total test cases and the test case ids. Below is the full test file

Full test file that uses a fake PostgreSQL connection

Note that we need to do things slightly differently to the Redis example and fake the connection in each test function rather than the load()function which gets called in the parametrize decorator. The old approach would still work here but the problem occurs if we had more than one test function in this file which used the same parametrisation. If there were multiple functions Pytest first locates all of them and executes each parametrize decorator in turn before running all the tests. Hence, our connection objects would be overwritten each time the load function gets called. (I found this problem when running Pytest with multiple test files at once; they were passing individually but not together!)

Debugging

There is some debugging in Pytest using the set_trace()method (https://qxf2.com/blog/debugging-in-python-using-pytest-set_trace/) but I haven’t found it particularly useful. However, if you use PyCharm you can set up a really useful run/debug Pytest configuration which let’s you run Pytest and step through the code line by line.

Making a test framework

Firstly, I moved all the faking database connections functions into a separate shared file. I then made a class decorator which would handle all connections for each test to avoid repeated code in the test functions. Since the decorator has to wrap the test function the assert statement became a problem, since if the assertion fails no other code gets executed and hence the connection wouldn’t be closed in the decorator. Fortunately, there is an extra additional package for Pytest called pytest_check which can be used as follows

import pytest_check as checkcheck.equal(a, b)

This is equivalent of asserting that a is equal to b but it allows the program to continue running. Thus you could even have multiple checks within each test function.

Updates

02/2021: Later I found that the birdisle package used to fake a Redis connection has a major drawback . If you are working on Mac OS it cannot be installed and the tests cannot be run. One painful way round is to have a Linux virtual machine and sync your files across. However, I later found that there is an alternative package called redislite which can be used just as easily. If you want to fake a connection you simply have to do

from redislite import StrictRedisrc = StrictRedis(decode_responses=True)

and all the commands of the redis package should be supported.

Summary

I hope you found this article useful and it will save you some time if you find yourself in the same position and need to use the same packages. If anyone has any questions or suggestions on what can be done better/easier please do leave a comment.

--

--