Unit Testing Your AWS Lambda Functions

Arjun Gupta
Slalom Technology
Nov 18, 2022

Testing is an important part of any development cycle and writing good unit tests makes it a lot easier to catch issues before they even occur. However, testing data pipelines might not always seem straightforward and if you don’t have the right tools it can be challenging to write unit tests for your code. This is especially true for cloud data pipelines, where a lot of cloud artifacts that you don’t necessarily want to test are intertwined with the code.

In this post I will outline some of the best practices of writing unit tests for AWS Lambda functions and will give you concrete examples that you can hopefully re-use in your projects.

Use correct encapsulation

The first step to correctly testing your Lambda is making sure that it is well encapsulated, that is to say, it uses Object-Oriented Programming principles to compartmentalize functionality. This will make it a lot easier to test one specific aspect of your Lambda while abstracting away the rest. A unit test should only verify that a particular piece of your system works in isolation.

For example, you typically don’t want to make I/O calls to AWS S3 while testing a function that does data transformation. You only want to verify that you are performing the correct transformation between reading from a bucket and writing the output to a bucket.

Concrete use case

To illustrate the concepts discussed in this post, we need a non-trivial use case. I have previously written an article that implements two Lambda functions, running on the Python 3.9 runtime, that are triggered to normalize data.

Let’s take a simplified version of the Women’s World Cup Normalizer, whose goal is to transform raw data from the 2019 Women’s World Cup to match a defined normalized schema. For more information on the implementation of the Lambda, check out the aforementioned article. Here we just removed the abstraction of the Template Method pattern so that our Normalizer class isn’t the child of an Abstract Class but rather directly contains all the components it needs to be a standalone class.

You can find the full code-base of this simplified implementation here.

As you can see, we followed the Single-Responsibility Principle for the methods of our Lambda which makes the code clear and maintainable, but importantly, will also allow us to test different parts in isolation (which is what unit testing is all about).

Pytest

We will use Pytest as our testing framework. This isn’t a built-in Python library so you’re going to have to install it if you haven’t already. It is recommended you use venv to create a virtual testing environment.

mkdir .venv
python3 -m venv .venv/test_env
source .venv/test_env/bin/activate
pip install pytest pandas pyyaml

Project file structure

Importing your module in your test suite isn’t very straightforward so your file structure matters.

Here is the structure we are using for this project:

|- aws_lambda_testing/
|  |- lambdas/
|  |  |- normalizer/
|  |  |  |- __init__.py
|  |  |  |- normalizer.py
|  |  |  |- config/
|  |  |  |  |- normalizer_config.yaml
|  |- tests/
|  |  |- __init__.py
|  |  |- conftest.py
|  |  |- lambdas/
|  |  |  |- __init__.py
|  |  |  |- normalizer/
|  |  |  |  |- __init__.py
|  |  |  |  |- test_normalizer.py
|  |  |  |  |- test_data/
|  |  |  |  |  |- sample_event.json
|  |  |  |  |  |- test_matches.json

To run our test suite, open a terminal at the aws_lambda_testing/ level and run the command:

pytest tests/

Notice that we have __init__.py files at every level of our tests/ folder. These matter because pytest first collects test_normalizer.py and then searches upwards for the last folder that still contains an __init__.py file in order to determine the package root (in this case aws_lambda_testing/). To load the module, pytest inserts aws_lambda_testing/ at the front of sys.path (if it isn’t there already), which lets test_normalizer.py import normalizer.py as the module lambdas.normalizer.normalizer. Without the __init__.py files, pytest will throw an Unable to import module error.

Test flag

In the AWS environment, the Lambda service makes a lambda_handler() call which should begin the desired execution. In our case, we instantiate our Normalizer class object in a lambda_handler variable at the global level (line 140) of the normalizer.py module, so the main execution resides in the __call__() function. This object instantiation involves initializing a boto3 client object and looking up environment variables which we do not want to do when running our unit tests, so we make the instantiation conditional on the environment variable TEST_FLAG. We need to make sure that this variable is set to True during testing and this is achieved by adding the following lines in the aws_lambda_testing/tests/__init__.py file:

import os

# Set TEST_FLAG env variable so lambdas know whether to call lambda_handler()
os.environ["TEST_FLAG"] = "True"

Every time the command pytest tests/ is executed, it will first execute the code in this __init__ file and therefore lines 140 to 149 in our normalizer.py module won’t be executed.
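The guard at the bottom of normalizer.py can be sketched as follows. This is a simplified stand-in: the real module builds the full Normalizer with a boto3 client and several environment variables, and the snippet sets TEST_FLAG at the top only so the sketch runs on its own (in the project it is set by tests/__init__.py before any lambda module loads):

```python
import os

# In the project, tests/__init__.py sets this before any lambda module loads;
# we set it here only so this sketch is runnable on its own.
os.environ["TEST_FLAG"] = "True"


class Normalizer:
    """Stand-in for the real class, whose constructor also builds AWS resources."""

    def __init__(self, s3_client=None):
        self.s3_client = s3_client


# Bottom of normalizer.py: skip the instantiation under test, since it
# builds a boto3 client and reads AWS environment variables.
if os.environ.get("TEST_FLAG") != "True":
    import boto3  # only needed outside the test environment

    lambda_handler = Normalizer(s3_client=boto3.client("s3"))
else:
    lambda_handler = None  # tests build their own instance with a Mock
```

With the flag set, the module can be imported in a test run without boto3 ever being touched.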

So what exactly should we test?

Going back to our use case, the question that now arises is: what parts should we unit test and what specific behavior are we testing?

The main logic of our function lives in the __call__() method of the Normalizer class, since AWS Lambda calls lambda_handler() under the hood. Therefore, to test the general flow of our Lambda we need to test __call__() and verify that nothing fails. However, the function contains several calls to the boto3 S3 client, which are calls we don’t actually want to execute. Unit tests should be environment-agnostic (apart from the required libraries) and deterministic: they need to produce the same result regardless of the AWS account and permissions used. This is where Mock objects come into play.

One of the reasons we accept s3_client in our Normalizer class constructor is so that we have full control over it during testing. We can pass a Mock object as the s3_client parameter during instantiation, which means all the S3 calls we make will be mocked.

Happy path testing

Let’s start by testing the expected happy path outcome of our Lambda. We ask ourselves the question “What is the expected behavior of my function when it successfully does what it’s supposed to do?”.

In our case, we expect the Lambda to fetch a file from S3 (the file to normalize), normalize the file, save the normalized data to S3, copy the original file to a raw S3 bucket, and finally delete the S3 file.

As we already discussed, we don’t actually want to make the S3 calls but we can still make sure that our function made those calls even if we didn’t execute them. Mock objects have special assertion functions for exactly this purpose:

# Assert: verify execution was completed successfully
mock_client.get_object.assert_called_once()
mock_client.put_object.assert_called_once()
mock_client.copy_object.assert_called_once()
mock_client.delete_object.assert_called_once()

Therefore, all we need to do is pass a Mock() object as a parameter when instantiating our Normalizer class.
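As a self-contained illustration of how these Mock assertions behave, independent of any Lambda code:

```python
from unittest.mock import Mock

mock_client = Mock()

# Any attribute access on a Mock returns another Mock, so these "S3 calls"
# succeed without touching AWS; the Mock simply records them.
mock_client.get_object(Bucket="matches", Key="sample.json")
mock_client.delete_object(Bucket="matches", Key="sample.json")

mock_client.get_object.assert_called_once()
mock_client.delete_object.assert_called_once()

# Asserting on a call that never happened raises AssertionError:
try:
    mock_client.put_object.assert_called_once()
except AssertionError:
    print("put_object was never called")
```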

Testing that the normalization process works properly can be its own separate unit test, so we don’t have to include that verification in our general happy path test. We can simply create a unit test that asserts that the DataFrame returned by the normalize_data() method equals the expected resulting DataFrame (which we can write manually in a CSV file). The pandas testing module has a method for this:

pd.testing.assert_frame_equal(actual, expected)

We can also create a separate test for our add_metadata() method.

To summarize, we create three unit tests: one asserting that the normalization process does what it’s supposed to do, one asserting that the correct metadata is added by add_metadata(), and one asserting that the execution of the lambda_handler follows the correct steps. The resulting test file looks like this:
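Here is an illustrative sketch of such a test file, not the real test suite. The Normalizer below is a tiny stand-in so the example is self-contained; in the actual project you would instead import it as lambdas.normalizer.normalizer, and the schema details here are assumptions:

```python
# tests/lambdas/normalizer/test_normalizer.py -- illustrative sketch
import io
import json
from unittest.mock import Mock

import pandas as pd
import pytest


class Normalizer:
    """Stand-in for the real class so this sketch runs on its own."""

    def __init__(self, s3_client):
        self.s3_client = s3_client

    def normalize_data(self, matches):
        return pd.DataFrame(matches).rename(columns={"home": "home_team"})

    def add_metadata(self, df):
        out = df.copy()
        out["source"] = "womens_world_cup_2019"
        return out

    def __call__(self, event, context):
        key = event["Records"][0]["s3"]["object"]["key"]
        body = self.s3_client.get_object(Bucket="matches", Key=key)["Body"]
        df = self.add_metadata(self.normalize_data(json.loads(body.read())))
        self.s3_client.put_object(Bucket="normalized", Key=key, Body=df.to_json())
        self.s3_client.copy_object(
            Bucket="raw", Key=key, CopySource={"Bucket": "matches", "Key": key})
        self.s3_client.delete_object(Bucket="matches", Key=key)


@pytest.fixture(scope="session")
def raw_matches():
    # Shared sample input, built once for the whole test session.
    return [{"home": "USA", "away": "NED"}]


def test_normalize_data(raw_matches):
    normalizer = Normalizer(s3_client=Mock())
    actual = normalizer.normalize_data(raw_matches)
    expected = pd.DataFrame([{"home_team": "USA", "away": "NED"}])
    pd.testing.assert_frame_equal(actual, expected)


def test_add_metadata():
    normalizer = Normalizer(s3_client=Mock())
    df = normalizer.add_metadata(pd.DataFrame([{"home_team": "USA"}]))
    assert list(df["source"]) == ["womens_world_cup_2019"]


def test_lambda_handler_happy_path(raw_matches):
    # Arrange: a Mock S3 client whose get_object returns a file-like body
    mock_client = Mock()
    mock_client.get_object.return_value = {
        "Body": io.BytesIO(json.dumps(raw_matches).encode())}
    normalizer = Normalizer(s3_client=mock_client)

    # Act
    normalizer({"Records": [{"s3": {"object": {"key": "sample.json"}}}]}, None)

    # Assert: verify execution was completed successfully
    mock_client.get_object.assert_called_once()
    mock_client.put_object.assert_called_once()
    mock_client.copy_object.assert_called_once()
    mock_client.delete_object.assert_called_once()
```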

Notes:

  • We leverage fixtures for many parts of the tests. Pytest has a great documentation page on fixtures if you’re interested. We use both the built-in fixtures provided by Pytest and our own custom fixtures, declared with the decorator @pytest.fixture(scope="session") to tell pytest that the function is a fixture and that its scope is "session", so it is kept in memory for the whole test session instead of being torn down after each unit test.
  • We also create a global fixture for a LambdaContext object in the conftest.py file under the tests/ folder, since that object would be needed to test any other lambdas we might add later. More information on conftest files can be found here.
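A session-scoped conftest fixture along those lines could look like this. The LambdaContext stand-in and its fields are illustrative assumptions, since tests never receive a real context object from AWS:

```python
# tests/conftest.py -- illustrative sketch of a shared, session-scoped fixture
from dataclasses import dataclass

import pytest


@dataclass
class LambdaContext:
    """Stand-in for the context object AWS passes to the handler (fields assumed)."""
    function_name: str = "normalizer"
    memory_limit_in_mb: int = 128
    aws_request_id: str = "test-request-id"


@pytest.fixture(scope="session")
def lambda_context():
    # Built once per test session and shared by every test that requests it.
    return LambdaContext()
```

Any test in the suite can then declare a lambda_context parameter and pytest will inject the shared instance.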

Error testing

The tests created above verify that our functions execute properly in an ideal scenario. Nonetheless, you often want to handle specific errors in a particular way. For instance, if the normalization failed because a field in the JSON file doesn’t exist, we might want to copy the file to an error bucket and log an error instead of copying it to the raw bucket.

You could technically write infinitely many unit tests, but remember that changes in the base code impact the unit tests as well, adding to the time spent developing. My personal rule is to start with one unit test for the happy path and one unit test per error-handling mechanism. In our case the function isn’t split into different if/else branches, so the happy path test is quite straightforward. However, if your Lambda grows a lot and accumulates many if/else statements, complete coverage of every happy path may become overkill. Unfortunately there is no general rule that applies all the time; the main idea is to strike a good balance between testing your main functionality and not spending an unreasonable amount of time developing test cases just to reach 100% test coverage.

When testing for errors, Pytest makes it easy to assert that a specific error has been raised using the with statement:

with pytest.raises(NormalizationError):
    # Act: call the normalize_data function
    lambda_handler.normalize_data(json_matches)
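Here is a runnable sketch of the pattern, using a hypothetical NormalizationError and a minimal normalize_data that raises it when a required field is missing (both are illustrations, not the project's real code):

```python
import pytest


class NormalizationError(Exception):
    """Hypothetical error raised when a required field is missing."""


def normalize_data(matches):
    # Minimal illustration: fail fast if any record lacks the required field.
    for match in matches:
        if "home_team" not in match:
            raise NormalizationError("missing field: home_team")
    return matches


def test_normalize_data_missing_field():
    # The test passes only if NormalizationError is raised inside the block.
    with pytest.raises(NormalizationError):
        normalize_data([{"away_team": "NED"}])
```

If the block finishes without raising NormalizationError, pytest.raises itself fails the test, so both the error path and its absence are covered.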

Final words

Unit tests are an essential part of any software engineering project, and should never be overlooked for data engineering either. Including a test suite in your CI/CD pipeline will catch a lot of errors before they occur and make development faster and cleaner in the long run.

Slalom is a global consulting firm that helps people and organizations dream bigger, move faster, and build better tomorrows for all. Learn more and reach out today.
