AWS Lambda — serverless — Python — DEVOPS

No Time for Tests? — 12 Recommendations on Unit-Testing AWS Lambdas in Python

This is the second of two articles about writing and testing DevOps-y Lambdas. It focuses on implementing readable and meaningful unit tests that don’t require complicated setups.

Jan Groth
The Startup

--

Much like software applications, infrastructure provisioning has moved away from monolithic solutions, shifting focus towards decoupled components and utilisation of public service APIs. Tasks that traditionally would have required a great deal of orchestration and heavy tooling have transformed into lightweight event-driven services. Frameworks like the AWS Serverless Application Model (AWS SAM) have come a long way and make it easy to implement complex applications in a “microservice-style”, often with little more than a few Lambdas as building blocks.

In the previous article, I shared my thoughts on implementing readable, testable and maintainable AWS Lambdas. In this article, I’ll write about unit-testing that code.

From my own experience, I know that unit tests can be tricky for many DevOps projects — which seems to be due partly to the mechanics of writing good tests, and partly to people’s creativity when it comes to explaining why it’s just not the right time for them :)

This is not the place to argue against ‘there is no time for tests’ (for the record: I wholeheartedly disagree), so let me move on to demonstrate that writing meaningful tests for AWS Lambda is not difficult at all.

What to expect

I will be using the same example problem statement as before (“Restricting inbound and outbound traffic for default security groups”), but will now focus on the testing side of things.

The complete working example can be found on GitHub:

github.com/jangroth/writing-aws-lambdas-in-python

All source code examples are taken from there. You are welcome to follow along by setting up the project yourself.

Writing tests that are easy to read and to maintain

Let’s start by looking at the setup and structure of tests. To me, the benchmark for a good test is how well it covers the source code and how easy it is to read.

I’m using pytest for the examples, mostly because I like the modern syntax and because fixtures are a great technique for keeping tests readable. However, using unittest from the Python standard library is completely acceptable and only requires minor changes.

1. Make tests easy to run — and hard to ignore

Here is something that I think the Java ecosystem does a little better than Python: Projects usually follow the same directory layout, and by default building a project will already include running all tests, simply because build tools follow conventions and know where to find everything.

In Python, the tooling landscape is different, and there is no single directory layout that most projects implement. Often, this means that setting up a project and figuring out how to run the tests depends on the quality of the documentation.

If you are not already settled on a build system, I recommend implementing a Makefile with an explicit test target. While there are many different build systems out there, in my experience Makefiles are widely understood and make itself is easy to install.

check: ## Run linters
	...

test: check ## Run tests
	PYTHONPATH=./src pytest --cov=src --cov-report term-missing
	@echo '*** all tests passing ***'

deploy: test [...] ## Deploy project
	...
[→ source]

If you already have a build system in place, by all means, use that. Just make sure it’s running your tests both explicitly on a test target and implicitly on the build and deploy targets.

2. Mock out external dependencies

In the first article, I mentioned that using classes and moving the initialisation of external dependencies into the constructor would simplify the creation of test objects. Here is how it works:

In Python, the typical way of instantiating an object is by invoking its constructor:

obj = RevokeDefaultSg()

class RevokeDefaultSg:
    def __init__(self, region="ap-southeast-2"):
        self.logger = logging.getLogger(...)
        self.ec2_client = boto3.client(...)
        self.ec2_resource = boto3.resource(...)
        ...
[→ source]

This code will initialize the logger and the boto helpers.

For tests, this is problematic, because we do not want to call out to AWS from unit tests. Instead, we can use an alternative way to instantiate objects: in Python, the __new__ method creates an instance of a class without invoking the constructor (documentation). In tests, this allows us to create a ‘skeleton’ instance whose instance attributes are then replaced by mocks:

def obj():
    obj = RevokeDefaultSg.__new__(RevokeDefaultSg)
    obj.logger = MagicMock()
    obj.ec2_client = MagicMock()
    obj.ec2_resource = MagicMock()
    return obj
[→ source]

A concrete test can then use this skeleton object (more on fixtures in the next section) and set up the mocks according to the test case:

@fixture
def obj():
    ...

def test_should_tag_if_ingress_was_revoked(obj):
    mock_sg = MagicMock()
    mock_sg.ip_permissions = "ingress"
    mock_sg.ip_permissions_egress = None
    obj.ec2_resource.SecurityGroup.return_value = mock_sg

    obj._revoke_and_tag(TEST_SG)
    ...
[→ source]

Seen side by side, the original invocation and the test invocation of _revoke_and_tag() run through exactly the same code path; the only difference is whether real boto clients or MagicMocks are attached to the instance.

Looping back to the initial statement — the fundamental idea of this strategy is identifying things that are hard to set up in unit tests, bundling them in a single location, bypassing their invocation for testing and injecting mocks where required.

A variant of this approach is patching objects or modules with @patch rather than bypassing code execution with __new__. While the results are similar, I prefer the constructor-based approach. That’s probably down to my Java development background, as well as my shaky understanding of Python’s import namespacing, which usually leaves me clueless about how to debug patches when they don’t do what they should. For me, bypassing initialization just works.
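For reference, this is roughly what the @patch variant could look like. It’s only a sketch, not taken from the example project, and it assumes the Lambda module lives at revokedefaultsg.app (the path shown in the coverage report further down):

from unittest.mock import patch

# Patching boto3 inside the Lambda's module means __init__ runs as usual,
# but .client() and .resource() now hand back MagicMocks instead of real clients.
@patch('revokedefaultsg.app.boto3')
def test_should_find_default_sg(mock_boto3):
    obj = RevokeDefaultSg()
    obj.ec2_client.describe_security_groups.return_value = {"SecurityGroups": [{"GroupName": "default"}]}

    assert obj._is_default_sg(TEST_SG)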

3. Create test data in re-usable helper methods

Almost always, tests need data to process.

For example, if we want to test process_event(), we must provide it with a sample Lambda event that it can run against:

def process_event(self, event):
    sg_id = self._extract_sg_id(event)
    if self._is_default_sg(sg_id):
        self.logger.info(...)
        self._revoke_and_tag(sg_id)
    else:
        self.logger.info(...)
    return 'SUCCESS'
[→ source]

A sample event isn’t complicated and could easily be created in the test itself:

"id": "12345678-b00a-ede7-937b-b4da1faf5b81",
"detail": {
"eventVersion": "1.05",
"eventTime": "2020-02-05T12:34:56Z",
"eventSource": "ec2.amazonaws.com",
"eventName": "AuthorizeSecurityGroupIngress",
"eventID": "12345678-6466-4720-955e-e342e782d405",
"eventType": "AwsApiCall",
"requestParameters": {
"groupId": "sg-123"
}
}

However, events aren’t exactly small and carry a lot more data than we are interested in, so creating them on a per-test basis would add noise and repetition. A better strategy is implementing dedicated helper methods for test data. For more complex scenarios, different helpers can be combined or build on each other.

Conveniently, pytest has a handy feature called ‘fixtures’, which allows you to annotate a function and have its return value injected into test cases. This example uses two fixtures — one for the test event and one for the skeleton object (see the previous tip):

@fixture
def good_event():
    return {
        "id": "3469fd4b-b00a-ede7-937b-b4da1faf5b81",
        "detail": {
            ...
        }
    }

@fixture
def obj():
    obj = RevokeDefaultSg.__new__(RevokeDefaultSg)
    ...

def test_that_uses_good_event(obj, good_event):
    ...
    obj.process_event(good_event)
    ...
[→ source]

Whether you use fixtures or prefer to call methods explicitly, having dedicated helpers for test data creation is a clean technique to keep noise out of the test itself and to allow re-use across multiple tests.
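If you prefer plain helper functions over fixtures, the same idea might look like this. This is only a minimal sketch with a simplified event shape, not the code from the repository:

def make_good_event(group_id=TEST_SG):
    # an ordinary helper instead of a fixture; each test calls it explicitly
    return {
        "id": "3469fd4b-b00a-ede7-937b-b4da1faf5b81",
        "detail": {
            "requestParameters": {"groupId": group_id}
        }
    }

def test_that_uses_good_event(obj):
    obj.process_event(make_good_event())
    ...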

4. Test one thing at a time and mock out complexity

It can be tempting to squeeze multiple aspects into one test case and to fire off a whole series of assertions at the end — but usually, that’s not a good idea: complex tests can be a nightmare to understand and to maintain.

When it comes to tests, readability counts at least as much as it does for production code, maybe even more so, as tests are often written in a ‘fire and forget’ style. Focusing on a single condition per test case makes it much easier to understand the test and to grasp the problem when it fails.

For example, let’s say we want to test the path where a security group is not the default security group and nothing should be revoked. That’s the else branch in the snippet below:

def process_event(self, event):
    sg_id = self._extract_sg_id(event)
    if self._is_default_sg(sg_id):
        self.logger.info(...)
        self._revoke_and_tag(sg_id)
    else:
        self.logger.info(...)
    return 'SUCCESS'
[→ source]

To create a test case, let’s look at what is simple and straightforward to accomplish:

  • _extract_sg_id() only extracts data from a dict — this can easily be called, provided that the incoming event contains the right data (taken care of by a fixture).
  • _is_default_sg() needs to call out to the EC2 API — not so easy to test, and even more importantly, not relevant for this test case. Replacing the method with a mock lets us focus on what we want to test. All we have to do is set up the mock to return False when called.
  • The else branch is tricky, as we are testing for ‘nothing happening’. However, we can invert the logic and make sure that the if branch was not taken. For that, we replace _revoke_and_tag() with a mock and assert that it was never invoked. This is easy with MagicMock.assert_not_called().

Putting everything together, the test looks like this:

def test_should_process_event_and_do_nothing_if_non_default_sg(...):
    obj._is_default_sg = MagicMock(return_value=False)
    obj._revoke_and_tag = MagicMock()

    obj.process_event(good_event)

    obj._is_default_sg.assert_called_once_with(TEST_SG)
    obj._revoke_and_tag.assert_not_called()
[→ source]

That’s five simple lines of code for a complete test case!

By the way, testing the alternative path through the method (‘is default security group and should be revoked’) is arguably even easier, as we can assert directly on the _revoke_and_tag() mock and don’t have to invert any logic:

def test_should_process_event_and_revoke_if_default_sg(obj, good_event):
    obj._is_default_sg = MagicMock(return_value=True)
    obj._revoke_and_tag = MagicMock()

    obj.process_event(good_event)

    obj._is_default_sg.assert_called_once_with(TEST_SG)
    obj._revoke_and_tag.assert_called_once_with(TEST_SG)
[→ source]

5. Use ridiculously obvious variable and parameter names

Unlike in production code, in tests it doesn’t really matter how many variables you use or how long their names get. The only goal is to make the test as readable as possible.

For example, we could have called the fixtures something like event1, event2 and event3. But how much more obvious are names like good_event, bad_event and unknown_event? This test case doesn’t leave a shadow of a doubt about what’s going on:

def test_should_raise_exception_if_wrong_event(obj, bad_event):
    with pytest.raises(UnknownEventException):
        obj._extract_sg_id(bad_event)
[→ source]
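The bad_event fixture behind this test isn’t shown above. As a hypothetical sketch, it could simply be an event that lacks the data _extract_sg_id() is looking for:

@fixture
def bad_event():
    # structurally an event, but without the requestParameters
    # that _extract_sg_id() needs, so the method raises UnknownEventException
    return {
        "id": "3469fd4b-b00a-ede7-937b-b4da1faf5b81",
        "detail": {}
    }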

Similarly, it can be a good idea to extract constants only for the purpose of giving them a name that’s self-explanatory:

DEFAULT_GROUP = {"GroupName": "default"}
NOT_A_DEFAULT_GROUP = {"GroupName": "not default"}
def test_should_find_default_sg(obj):
obj.ec2_client.describe_security_groups.return_value = {"SecurityGroups": [DEFAULT_GROUP]}

assert obj._is_default_sg(TEST_SG)


def test_should_find_non_default_sg(obj):
obj.ec2_client.describe_security_groups.return_value = {"SecurityGroups": [NOT_A_DEFAULT_GROUP]}

assert not obj._is_default_sg(TEST_SG)
[→ source]

Okay, one would probably have guessed the meaning of the dictionaries without the explicitly named constants. But with the constants, it’s just impossible to miss their purpose, and even more so when the data structures are more complex than in this example.

Saving yourself time and effort

Finally, let’s look at a few ways that can save you time when writing tests. This is a collection of tips that I find useful, from using the best possible tooling to focusing on the right things to test:

6. Did you know about debuggers?

This is most likely a no-brainer if you have used other modern programming languages before. If not, my advice is to set yourself up with an environment that provides sophisticated support for Python.

A text editor will get you through hello_world(), but will leave you on your own when it comes to language-specific support like automated refactorings and — even more importantly — testing and debugging of complex code.

Combining good tests with a debugger gives you a powerful tool, where you can step through your code in an almost real-world scenario, completely in control of all aspects of the execution. Gone are the days when ‘debugging’ described the activity of gradually adding more and more print statements to the code.

PyCharm and Visual Studio Code are both excellent IDEs that are available for free. If you can’t decide between the two, simply go with what the majority of your peers are using.

7. Make sure you’ve seen every test case fail

It might sound trivial, but it isn’t: people make mistakes when writing code, and they also make mistakes when writing tests.

From my own experience, I can tell how frustrating it is to sink hours into debugging a problem, only to eventually find out that the baseline assumption ‘the tests are passing’ doesn’t mean the code is doing what it’s supposed to do. Tests that always succeed are the worst kind of friends, and they are pretty easy to produce — a tiny mistake in the assertion logic can be all it takes.

A good safeguard against this is to make sure that each test really fails when the condition it checks isn’t met. This can be achieved either by writing tests beforehand (also see the last tip), or by temporarily commenting out the code that implements the feature under test.
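To see how easily an always-passing test sneaks in, consider this hypothetical slip: the mock is set up and the method is called, but the assert keyword is missing, so the test cannot fail no matter what _is_default_sg() returns:

def test_should_find_default_sg(obj):
    obj.ec2_client.describe_security_groups.return_value = {"SecurityGroups": [DEFAULT_GROUP]}

    obj._is_default_sg(TEST_SG)  # missing 'assert': this test always passes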

8. Order test cases by relevance

This is tiny, but valuable when looking at a screen full of tests:

It’s easier for future readers — including yourself — to get the high-level picture before getting to the details, and to read about ‘happy’ cases before diving into error scenarios.

You don’t have to be religious about it, but it’s a real help to structure the test suite in a loose top-down style from ‘very important for how the Lambda works’ to ‘edge cases it also covers’:

test_should_process_event_and_revoke_if_default_sg(...)
test_should_process_event_and_do_nothing_if_non_default_sg(...)
...
test_should_find_default_sg(...)
test_should_not_tag_if_nothing_was_revoked(...)
...
test_should_raise_exception_if_unknown_event(...)
test_should_raise_exception_if_bad_event(...)

9. Hate documenting your code? — Write a test instead

This is so much better than documenting code in comments or — even worse — Confluence pages: Tests are the perfect tool to describe the behaviour of your code unambiguously. And it’s pretty hard to ignore a failing test, so unlike their traditional counterparts, they don’t quietly run out of date.

Using tests for documentation requires two things:

  • An explicit test case for the aspects that matter
  • A name that describes the intention as clearly as possible

I find that the loose pattern test_should_[something that's expected]_if_[something that happened] works well for me:

test_should_process_event_and_revoke_if_default_sg
test_should_process_event_and_do_nothing_if_non_default_sg
test_should_raise_exception_if_wrong_event

When a test fails, the name alone already tells you what went wrong. Want to understand what a certain Lambda does? Just read the test cases (ideally ordered by relevance, as per the previous tip).

10. Seriously, don’t test boto

(Or any other framework — I’m just using boto as an example here.)

What I mean by this: when making requests through boto, the response will usually be a deeply nested, JSON-like dictionary that requires filtering to get the relevant data out:

{
    "SecurityGroups": [
        {
            "Description": "default VPC security group",
            "GroupName": "default",
            "IpPermissions": [],
            "OwnerId": "123456789012",
            "GroupId": "sg-12345678",
            "IpPermissionsEgress": [],
            "Tags": [],
            "VpcId": "vpc-12345678"
        }
    ],
    "ResponseMetadata": {
        "RequestId": "12345678-3233-4eff-969f-7c6b43ff8f60",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "x-amzn-requestid": "12345678-3133-4eff-969f-7c6b43ff8f60",
            "content-type": "text/xml;charset=UTF-8",
            "content-length": "857",
            "date": "Sat, 02 May 2020 12:34:56 GMT",
            "server": "AmazonEC2"
        },
        "RetryAttempts": 0
    }
}

Writing tests that cover the parsing of such dictionaries can be tempting — but chances are it adds limited value. Those tests mostly create noise and make it harder to focus on the important bits. Also, if boto ever changes the format of its responses, there’s nothing we can do about it in the first place — we’ll most likely end up adapting our code to their changes. So there’s no point in covering boto’s requests or responses in our tests!

A good pattern I find is isolating the boto call and the filtering operation in a single method:

def _is_default_sg(self, sg_id):
    sec_groups = self.ec2_client.describe_security_groups(GroupIds=[sg_id])['SecurityGroups']
    return sec_groups[0]['GroupName'] == 'default'
[→ source]

For unit tests, all that’s required is setting up the expectation for the complete method:

obj._is_default_sg = MagicMock(return_value=True)
[→ source]

That’s a single line of test code to set up even complex invocations — whilst being completely decoupled from the actual interaction with the framework.

11. Don’t look too hard at the test coverage

With pytest and pytest-cov, it’s very easy to measure test coverage — simply add some arguments to the pytest invocation and produce a coverage report:

pytest --cov=src --cov-branch --cov-report term-missing
[...]
Name                         Stmts   Miss Branch BrPart  Cover
--------------------------------------------------------------
src/revokedefaultsg/app.py      59     11     12      1    83%
--------------------------------------------------------------
TOTAL                           59     11     12      1    83%

While it does make sense to look at test coverage to identify untested code, don’t get too hung up on the metric itself.

First of all, aiming for 100% — or near 100% — test coverage is not realistic, and closing the last 20% of the gap will cost a lot of extra effort. Second, it’s far more important to focus on the important bits and on readability than on keeping test coverage above a certain threshold.

I remember writing tests for Java getters and setters only to keep the test coverage high and the numbers looking good on paper. What a waste of time and energy in hindsight.

Similarly, great test coverage doesn’t necessarily imply great tests. I’ve seen 90% coverage achieved by a single test case, comparing 1,000 lines of input with 10,000 lines of output. While the result looked impressive on paper, the failing test produced a single assertion failure that was 20,000 lines long, impossible to understand without installing a whole range of diff tools on your computer.

My recommendation: Write tests around features, not around coverage.

12. Consider giving test-driven development a try

As a last tip, I want to ask you to look at when you write your tests, because implementing unit tests doesn’t have to be the last step in the process.

When thinking about a new feature, it’s perfectly feasible to write a new test case first, see it fail and then implement the code to make it pass. When fixing a bug, why not start with a test that surfaces the problem before fixing the code itself?
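As a hypothetical example of the bug-first approach: suppose events without a ‘detail’ section crash the Lambda. The fix would start with a test that reproduces exactly that, fails, and only then gets the code changed until it passes:

def test_should_raise_exception_if_event_has_no_detail(obj):
    # written before the fix: run it, watch it fail, then adjust _extract_sg_id()
    with pytest.raises(UnknownEventException):
        obj._extract_sg_id({"id": "12345678"})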

I can almost guarantee that your developer experience will be more rewarding and that oftentimes the quality of the outcome will be higher.

This is because tests allow you to change perspective: They put you in the shoes of the consumer and make you look at your code from a feature perspective.

Also, they give you the peace of mind that your code does what it’s supposed to do. This makes it much easier to continuously refactor to cater for new features — as opposed to ‘surgically injecting’ more and more functionality into a code base that has become too complex to mess with.

And last but not least it’s pretty hard to make the same mistake twice if you already have a test case that proves the code is working.

Résumé

Many of the tips and recommendations only scratch the surface of much deeper stories that go well beyond the scope of writing a single Lambda. If something has sparked your interest, I would encourage you to keep reading about the topic. If I had to recommend a single book on writing better code, it would be “The Pragmatic Programmer” by Andrew Hunt and David Thomas.

In this article, I made the call to assume Python knowledge as a baseline and not to explain any of the language features I’m using. Doing so would have taken the focus away from the main topic and made it harder to follow along. However, I have tried my best to make the code as readable as possible.

Also, a bit of a disclaimer about the example code I’m using: it’s a challenge to strike a balance between something simple enough to fit into the format of this article and complex enough to illustrate the problems well. If some recommendations seem like overkill for a Lambda with fewer than 100 lines of code, the picture looks completely different for bigger Lambdas in a more complex context.

And last but not least, I want you to take away the one message behind all of this: writing good code and good tests is a lot easier than people might think.

Please leave a comment if you have a question or want to share your thoughts!

Happy Coding :)

--


Jan Groth
The Startup

DevOps Engineer at Versent in Sydney. Loves writing code.