How to mock S3 services in Python tests

Lorenzo Peppoloni
5 min read · Apr 4, 2018


I already wrote about the importance of tests.
Very often we write a bit of code which interacts with services (AWS, databases, …) and we want to test this interaction. The problem is that interacting with the actual service during the test may have undesired side-effects, so you want to mock the service. For example, if we are writing the code for a new “post to a social network” button, we do not want to post something every time a test is run.

One case that often happens to me is the need to mock AWS services, in particular S3.
Let’s have a look at how you can do it in Python and at some possible best practices.

Use case

We will pretend we want to write a function that downloads all the .json files from a specific folder in an S3 bucket. We will assume we do not have to care about subdirectories.
CAVEAT
Note that this specification is quite imprecise, because there are no real folders on S3 (keys are flat strings), but I think the task is easier to understand this way.
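
To make the caveat concrete, here is a small sketch of what “listing a folder” actually means in boto3: filtering flat keys by a string prefix. The bucket name is a placeholder of mine, not something from this post’s setup.

import boto3

# S3 keys are flat strings; "/" is just a character inside the key.
# A "folder" is nothing more than a shared key prefix.
client = boto3.client("s3")
response = client.list_objects_v2(Bucket="my_bucket", Prefix="mock_folder/")
for obj in response.get("Contents", []):
    print(obj["Key"])  # e.g. mock_folder/foo.json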

To test our code we will need a fixture.
The best description of a fixture is: a known state of the world in which to run our tests. This ensures our tests are repeatable.
In our case the fixture will be represented by the following folder structure, which will mimic the S3 keys in the test:


├── mock_folder/
│ ├── foo.json
│ ├── bar.json
│ ├── not_json.notjson

We are going to test our bit of code by making sure that foo.json and bar.json are downloaded from the bucket but not_json.notjson is not.
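
The tests later expect these files in a fixtures directory next to the test file. If you prefer generating them over creating them by hand, a small throwaway script along these lines would do (the file contents are placeholders I made up):

import json
import os

# One-off helper to create the fixture files used by the tests.
fixtures_dir = os.path.join("fixtures", "mock_folder")
os.makedirs(fixtures_dir, exist_ok=True)
for name in ("foo.json", "bar.json"):
    with open(os.path.join(fixtures_dir, name), "w") as f:
        json.dump({"name": name}, f)
with open(os.path.join(fixtures_dir, "not_json.notjson"), "w") as f:
    f.write("this file should not be downloaded")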

Required modules and knowledge

  • Unit testing: we will use the unittest module to write our tests
  • AWS interaction: we will use the boto3 module to interact with AWS in Python
  • Mock S3: we will use the moto module to mock S3 services.

I will assume a basic knowledge of boto3 and unittest, although I will do my best to explain all the major features we will be using.

Installation of required modules

That’s the easy part:

$ pip install boto3 moto

Writing the code

Let’s fire up our favourite editor and write the code we want to test.

import os

import boto3


def download_json_files(bucket: str, prefix: str, local_dir: str) -> None:
    bucket = boto3.resource("s3").Bucket(bucket)
    # List every key under the prefix and keep only the .json ones.
    objects = bucket.objects.filter(Prefix=prefix)
    keys = [obj.key for obj in objects if obj.key.endswith(".json")]
    local_paths = [os.path.join(local_dir, key) for key in keys]
    for key, local_path in zip(keys, local_paths):
        # Recreate the key's path structure locally before downloading.
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        bucket.download_file(key, local_path)

The function is pretty easy to understand: we list the content of a specific folder in a specific bucket and we download all the keys ending with “.json”, keeping the same path structure locally.
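
For instance, against a bucket laid out like our fixture (and assuming credentials are configured), a call would look like this; the bucket name and local path are placeholders:

# Hypothetical call: "my_bucket" and /tmp/downloads are placeholders.
download_json_files("my_bucket", "mock_folder", "/tmp/downloads")
# Expected result:
#   /tmp/downloads/mock_folder/foo.json
#   /tmp/downloads/mock_folder/bar.json
# and no /tmp/downloads/mock_folder/not_json.notjson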

Writing the test

We will write the tests using unittest.

Unittest is a Python framework for running unit tests; for a reference on what a unit test is and why you should write them, you can have a look here.
The framework makes available to us the TestCase class, which can be used as a base class for specific tests.

When we are mocking services like S3, we often want to set up the right test “world” using the fixture. The TestCase class comes to our rescue exposing two important functions:

  • setUp: which is called to prepare the test fixture. Specifically, it is called immediately before calling the test methods.
  • tearDown: which is called immediately after the test method has been called and the result recorded. This function will take care of restoring the world to its original state after the tests are run.
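
Here is a minimal sketch of this lifecycle, independent of S3, just to show when the two methods run (the class and attribute names are mine):

import unittest


class LifecycleExample(unittest.TestCase):
    def setUp(self):
        # Runs before *each* test method.
        self.world = {"state": "known"}

    def tearDown(self):
        # Runs after each test method, even when the test fails.
        self.world = None

    def test_world_is_fresh(self):
        self.assertEqual(self.world, {"state": "known"})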

In our case, we are going to set up a mock S3 environment in the setUp function and we are going to clear it in the tearDown function.

An alternative (and legit) approach would be to set up and clear the fixture environment inside every test function, without using the setUp and tearDown functions. This keeps the test cases more atomic and independent, since there is no danger of finding the world in an unexpected state left behind by an earlier test case. I like setting up everything beforehand and using the same environment for every test case.
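
For completeness, here is a minimal sketch of that per-test style. It relies on the fact that mock_s3 also works as a context manager, so the mocked world exists only inside the with block; the names are placeholders of mine.

import unittest

import boto3
from moto import mock_s3


class PerTestFixtureExample(unittest.TestCase):
    def test_bucket_roundtrip(self):
        # The mocked S3 exists only inside this block.
        with mock_s3():
            client = boto3.client("s3", region_name="us-east-1")
            client.create_bucket(Bucket="scratch-bucket")
            client.put_object(Bucket="scratch-bucket", Key="k", Body=b"v")
            body = client.get_object(Bucket="scratch-bucket", Key="k")["Body"]
            self.assertEqual(body.read(), b"v")
        # Once the block exits the bucket is gone: nothing to tear down.
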
Let’s first write the TestCase class with the setUp and tearDown methods.

import os
import tempfile
import unittest

import boto3
import botocore
from moto import mock_s3

from download_json import download_json_files

MY_BUCKET = "my_bucket"
MY_PREFIX = "mock_folder"


@mock_s3
class TestDownloadJsonFiles(unittest.TestCase):
    def setUp(self):
        client = boto3.client(
            "s3",
            region_name="eu-west-1",
            aws_access_key_id="fake_access_key",
            aws_secret_access_key="fake_secret_key",
        )
        try:
            s3 = boto3.resource(
                "s3",
                region_name="eu-west-1",
                aws_access_key_id="fake_access_key",
                aws_secret_access_key="fake_secret_key",
            )
            s3.meta.client.head_bucket(Bucket=MY_BUCKET)
        except botocore.exceptions.ClientError:
            # The bucket does not exist: we are talking to the mock, as expected.
            pass
        else:
            err = "{bucket} should not exist.".format(bucket=MY_BUCKET)
            raise EnvironmentError(err)
        # Outside us-east-1, create_bucket needs an explicit LocationConstraint.
        client.create_bucket(
            Bucket=MY_BUCKET,
            CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
        )
        current_dir = os.path.dirname(__file__)
        fixtures_dir = os.path.join(current_dir, "fixtures")
        _upload_fixtures(MY_BUCKET, fixtures_dir)

    def tearDown(self):
        s3 = boto3.resource(
            "s3",
            region_name="eu-west-1",
            aws_access_key_id="fake_access_key",
            aws_secret_access_key="fake_secret_key",
        )
        bucket = s3.Bucket(MY_BUCKET)
        # Empty the bucket first: a non-empty bucket cannot be deleted.
        for key in bucket.objects.all():
            key.delete()
        bucket.delete()


def _upload_fixtures(bucket: str, fixtures_dir: str) -> None:
    # Walk the local fixtures directory and upload every file,
    # preserving the relative path as the S3 key.
    client = boto3.client("s3")
    fixtures_paths = [
        os.path.join(path, filename)
        for path, _, files in os.walk(fixtures_dir)
        for filename in files
    ]
    for path in fixtures_paths:
        key = os.path.relpath(path, fixtures_dir)
        client.upload_file(Filename=path, Bucket=bucket, Key=key)

The code is pretty simple: we are using the decorator @mock_s3 to specify that we want to mock out all the calls to S3. In the setUp function we also guard against the possibility that we got something wrong and are actually interfacing with the real service instead of the mocked one: head_bucket should fail, because the bucket must not exist yet. We also make sure that our test world is restored to a consistent (expected) state after each test. This is particularly important when running multiple tests one after the other, since moto keeps the state of the buckets and keys.

Lastly, let’s add a method to the TestDownloadJsonFiles class to test our function. Since we do not want to modify or manage the filesystem every time we run a test, we create a temporary directory into which we download the files.

def test_download_json_files(self):
    with tempfile.TemporaryDirectory() as tmpdir:
        download_json_files(MY_BUCKET, MY_PREFIX, tmpdir)
        mock_folder_local_path = os.path.join(tmpdir, MY_PREFIX)
        self.assertTrue(os.path.isdir(mock_folder_local_path))
        result = os.listdir(mock_folder_local_path)
        desired_result = ["foo.json", "bar.json"]
        self.assertCountEqual(result, desired_result)

To test the behaviour of our function, first we check (assertTrue(…)) that the folder structure was created correctly, and then (assertCountEqual(…)) we check that we downloaded the correct files; assertCountEqual compares the two lists regardless of order.
The tempfile module takes care of removing the temporary folder once all the code in the context has been executed.
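
If the cleanup behaviour is not familiar, this tiny standalone example shows it:

import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "scratch.txt")
    with open(path, "w") as f:
        f.write("temporary")
    assert os.path.exists(path)  # exists while the context is open
assert not os.path.exists(path)  # removed, with its directory, on exit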

To run the tests we can either add a main block to the test file

if __name__ == "__main__":
    unittest.main()

and simply run the script

$ python my_test.py

or we can use a more sophisticated module, such as nose, which extends unittest to make testing easier.

Either way, if you run the test you will see the following output:

.
----------------------------------------------------------------------
Ran 1 test in 0.212s

OK

Bonus: running the test with nose

The nose module makes running tests easier. By default, it runs all the tests in files or directories under a specified directory whose names include “test” or “Test” at the beginning or at the end (e.g., “test_xxx” or “TestClass” but not “sometest”).

Let’s install the module

$ pip install nose

After the installation, assuming we are in the tests folder, we can run

$ python -m nose .

If everything worked fine, you will see the same output as before.

And that is pretty much it!

Enjoy testing and mocking S3 services, and above all, enjoy testing!
