AWS Lambda Event Validation — from Zero to Hero

Ran Isenberg
CyberArk Engineering
5 min read · Aug 9, 2020

This blog has an updated version at my new website, Ran The Builder.

https://www.ranthebuilder.cloud/

So, you’ve started your Serverless journey. It’s new and exciting and there’s lots to learn. You begin with your first AWS Lambda function. Everything looks fine and it just works: your Lambda gets an input event and produces output.

However, problems tend to arise when unhandled exceptions and failures are encountered. When not dealt with properly, these prove to be rather expensive, as they can cause unexpected bugs, security issues and costly Lambda retries.

In this blog, we’ll discuss how to parse event schemas correctly and how to handle event validation exceptions. I’ll focus on Python, but these guidelines & tips are applicable to any other programming language.

Problems? What problems?

Let’s observe the Lambda handler below.

The Lambda receives the event parameter, which is a Python dictionary.

If at this stage you access the dictionary without checking its validity, you will be fine for the majority of Lambda invocations. However, in some cases, the event dictionary might not have the ‘input’ key, its value might not be a list, or the list might have fewer than 2 items (the handler reads index #1). In any of these cases an exception will be raised, and it won’t be caught.
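A minimal sketch of the kind of handler described above. The ‘input’ key and the index #1 access come from the text; the return value and the describe_failure helper are assumptions added for illustration.

```python
def lambda_handler(event, context):
    # No validation: assumes event["input"] exists and is a list with at least 2 items.
    second_item = event["input"][1]
    return {"result": second_item}


def describe_failure(event):
    """Hypothetical helper: call the handler and name the exception that escapes it."""
    try:
        lambda_handler(event, None)
        return "ok"
    except (KeyError, IndexError, TypeError) as exc:
        return type(exc).__name__


print(describe_failure({"input": ["a", "b"]}))  # the happy path
print(describe_failure({}))                     # 'input' key is missing
print(describe_failure({"input": ["a"]}))       # list is too short
print(describe_failure({"input": 5}))           # 'input' is not a list
```

In a real Lambda none of these exceptions would be caught, so each bad event would surface as a failed invocation.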

The first problem is that when an exception isn’t caught in a Lambda that was invoked asynchronously, AWS retries the invocation by default (two more times by default), and each retry will fail in the same way. Since you pay for execution time, this can really add up.

The second problem is in cases where an exception isn’t thrown, but the values are invalid. Your program could suffer from “minor” side effects, like broken program integrity, undefined behavior bugs and even security issues.

The third problem is that events are updated or changed by services regularly (especially AWS services) and the event dictionary can contain values which your Lambda didn’t expect.

Your code will fail, and that’s ok, but it should fail in the “right” way.

What if I told you that you can solve all three problems by combining validation and input constraint checks with one simple library?

Introducing Pydantic

Why Pydantic?

1. Performance. According to Pydantic’s benchmarks, it performs at least 1.4x better than any other JSON schema validation library.

2. Once validated, the parsed object is used as a regular data class container.

3. Produces easy-to-read code and hides the data class implementation. All you need to do is inherit from the BaseModel class.

4. Validation errors are comprehensive and let you know exactly what has failed and where.
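To illustrate point 4, here is a minimal sketch. The Customer and Order models and the failed_locations helper are hypothetical names made up for this example. It relies on ValidationError.errors(), which returns one entry per failure, each with a "loc" path to the offending field.

```python
from pydantic import BaseModel, ValidationError


class Customer(BaseModel):  # hypothetical model, for illustration only
    name: str
    age: int


class Order(BaseModel):  # hypothetical model, for illustration only
    order_id: str
    customer: Customer


def failed_locations(payload: dict) -> list:
    """Return the path of every field that failed validation."""
    try:
        Order(**payload)
    except ValidationError as exc:
        # Each error entry carries a "loc" tuple pointing at the bad field.
        return sorted(error["loc"] for error in exc.errors())
    return []


# order_id is missing and customer.age is not a number.
print(failed_locations({"customer": {"name": "Dana", "age": "unknown"}}))
```

Both problems are reported in a single exception, including the nested one.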

Let’s see Pydantic in action.

Let’s assume you are writing a Lambda that receives a music album’s metadata and pushes it to a DynamoDB table. This is an event your new Lambda is expected to receive:

A music record has a title, genre, release date and an artist. An artist has a name and an age. Your Lambda will push this entry into a table, but first, it needs to validate the data. The Pydantic schema file will look like this:

Let’s go over the basics:

· Each Python dictionary is modeled as a class that extends BaseModel

· If a field is written in a schema, it’s mandatory and has to appear in the input (unless you define it as Optional)

· Pydantic supports many built-in types, such as everything in Python’s typing module, uuid, enum, IPv4/6, dates, secrets, http urls and more

In this case, most parameters are string or int. I also used datetime.date (for release_date), Optional (and a default value in case it’s missing) and typing’s Literal class (genre can only be one of three strings Metal/Rock/Lame)

· Notice the validator decorator? It’s a great tool for adding constraint checks on variable data. In this case, I want to validate that the age is at least 18. If the input age is smaller than 18, a Pydantic ValidationError will be raised and this will be printed: “artist age is invalid (type=value_error)”. Pydantic’s ValidationError exception is very specific and you will know right away what went wrong

· There’s also a powerful utility called the root validator. It runs after all class parameters are verified and allows defining strong relationship checks between parameters. You can read more about it in the Pydantic documentation
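The points above can be sketched as follows. This is an illustration under assumptions, not the original schema file: the class names Artist and MusicRecord, the sample values, and the choice of release_date as the Optional field are mine. It uses the validator decorator described above; Pydantic 2.x still accepts it but deprecates it in favor of field_validator.

```python
from datetime import date
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError, validator


class Artist(BaseModel):
    name: str
    age: int

    @validator("age")
    def check_age(cls, value):
        # Constraint check: reject artists younger than 18.
        if value < 18:
            raise ValueError("artist age is invalid")
        return value


class MusicRecord(BaseModel):
    title: str
    genre: Literal["Metal", "Rock", "Lame"]
    release_date: Optional[date] = None  # assumption: this is the optional field
    artist: Artist  # nested dictionary, modeled as its own class


# Made-up sample event, shaped like the record described in the text.
sample_event = {
    "title": "Example Album",
    "genre": "Metal",
    "release_date": "1986-03-03",
    "artist": {"name": "Example Band", "age": 40},
}

record = MusicRecord(**sample_event)
print(record.title, record.release_date, record.artist.name)

try:
    MusicRecord(**{**sample_event, "artist": {"name": "Kid", "age": 10}})
except ValidationError as exc:
    print(exc)  # reports that the artist age is invalid
```

Note that the release_date string is converted to a datetime.date object during parsing, so the rest of the code works with typed values.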

This is what your improved Lambda will look like:

You can access class members after the parsed_event is created (because it passed validation). All exceptions are handled in order to avoid automatic retries.
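A minimal sketch of such a handler, under a few assumptions: the schema is a trimmed version with hypothetical class names, the statusCode/body return shape is my choice, and the DynamoDB write is left as a comment so the example runs without AWS access.

```python
import json
from typing import Literal

from pydantic import BaseModel, ValidationError


class Artist(BaseModel):
    name: str
    age: int


class MusicRecord(BaseModel):
    title: str
    genre: Literal["Metal", "Rock", "Lame"]
    artist: Artist


def lambda_handler(event, context):
    try:
        parsed_event = MusicRecord(**event)
    except (ValidationError, TypeError) as exc:
        # TypeError covers events that are not dictionaries at all.
        # Returning instead of raising means the invocation is not retried.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}

    # From here on, parsed_event is a typed object that passed validation.
    # Assumption: the write to the DynamoDB table would happen here.
    body = {"title": parsed_event.title, "artist": parsed_event.artist.name}
    return {"statusCode": 200, "body": json.dumps(body)}
```

Calling lambda_handler with a valid dictionary returns the 200 response, while a missing field or a non-dictionary event returns the 400 response without raising.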

This example is the gold standard for input validation of every Lambda.

Validation? Not just for input

Schema validation can also be applied to boto3 API calls (they return a dictionary).

Some events can be very complex to map.

The beauty of Pydantic is that you don’t have to map ALL the schema you receive, but only what you actually *use* and care about.

Look at the following boto3 EventBridge call:

And the matching schema:
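As an illustration, assume the call is put_events on the boto3 events client. To keep the sketch runnable without AWS access, the response below is a hard-coded dictionary with made-up IDs that follows the shape put_events returns; the model class names are hypothetical.

```python
from typing import Dict, List, Optional

from pydantic import BaseModel

# Stand-in for the dictionary returned by
# boto3.client("events").put_events(Entries=[...]); all values are made up.
sample_response = {
    "FailedEntryCount": 0,
    "Entries": [{"EventId": "11111111-2222-3333-4444-555555555555"}],
    "ResponseMetadata": {
        "RequestId": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {"content-type": "application/x-amz-json-1.1"},
        "RetryAttempts": 0,
    },
}


class Metadata(BaseModel):
    RequestId: str
    HTTPStatusCode: int
    HTTPHeaders: Dict[str, str]
    RetryAttempts: int


class Entry(BaseModel):
    # A successful entry has an EventId; a failed one has an error code and message.
    EventId: Optional[str] = None
    ErrorCode: Optional[str] = None
    ErrorMessage: Optional[str] = None


class PutEventsResponse(BaseModel):
    FailedEntryCount: int
    Entries: List[Entry]
    ResponseMetadata: Metadata


parsed = PutEventsResponse(**sample_response)
print(parsed.FailedEntryCount, parsed.ResponseMetadata.HTTPStatusCode)
```

This version maps every field, so a change to any of them can fail validation.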

Let’s assume you just require the status code from the response. In that case, you can clean up the schema. Pydantic will ignore fields that are not mapped and will only raise a validation error regarding the defined fields.

The improved version will look like this:
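A sketch of a trimmed schema that maps only the status code. As before, the class names are hypothetical and the response is a hard-coded stand-in for a put_events result. It relies on Pydantic's default behavior of ignoring fields that are not declared on the model.

```python
from pydantic import BaseModel, ValidationError


class Metadata(BaseModel):
    HTTPStatusCode: int  # the only field this Lambda cares about


class PutEventsResponse(BaseModel):
    ResponseMetadata: Metadata


def status_code(response: dict) -> int:
    """Validate just the mapped fields and return the HTTP status code."""
    return PutEventsResponse(**response).ResponseMetadata.HTTPStatusCode


# Made-up response; unmapped keys (even ones added later) are simply ignored.
response = {
    "FailedEntryCount": 0,
    "Entries": [{"EventId": "11111111-2222-3333-4444-555555555555"}],
    "SomeFutureField": "ignored",
    "ResponseMetadata": {"RequestId": "abc", "HTTPStatusCode": 200, "RetryAttempts": 0},
}
print(status_code(response))

try:
    status_code({"ResponseMetadata": {"RequestId": "abc"}})
except ValidationError as exc:
    print(exc)  # the one field that is mapped is missing
```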

Your code is now more robust to changes. It will only fail validation if the fields you care about are changed, and won’t be affected by changes in the other fields.

Let’s sum it up

Become a validation hero by following these steps:

· Use schema validation wherever possible: on API inputs and responses

· Add custom validators to validate value constraints

· Catch ValidationError exceptions and handle them. Unhandled exceptions trigger retries that will fail again, so don’t let them escape the handler

From my experience, it is very practical to share schema files and definitions between services and teams as a way to publish Lambda APIs.

You can also reduce code duplication by writing libraries of validator functions, which other services and teams can import. Good candidates are very common checks such as phone number, AWS region or ARN string validations.
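One possible way to structure such a shared library, as a sketch: check_aws_region stands in for a function that would live in a common module, and DeployRequest for a schema in a service that imports it. Both names are hypothetical, and the region pattern is a simplified assumption rather than an official AWS rule.

```python
import re

from pydantic import BaseModel, ValidationError, validator

# --- would live in a shared module, e.g. a hypothetical common_validators.py ---
# Simplified assumption of the region format: us-east-1, eu-central-1, us-gov-west-1.
_REGION_PATTERN = re.compile(r"^[a-z]{2}(-gov)?-[a-z]+-\d$")


def check_aws_region(value: str) -> str:
    """Plain function, so any schema (or non-Pydantic code) can reuse it."""
    if not _REGION_PATTERN.match(value):
        raise ValueError(f"invalid AWS region: {value!r}")
    return value


# --- a service that imports the shared check ---
class DeployRequest(BaseModel):
    region: str

    @validator("region")
    def region_is_valid(cls, value):
        return check_aws_region(value)


def is_valid_request(payload: dict) -> bool:
    try:
        DeployRequest(**payload)
    except ValidationError:
        return False
    return True


print(is_valid_request({"region": "eu-central-1"}))
print(is_valid_request({"region": "mars-1"}))
```

Keeping the check as a plain function means each schema only needs a thin validator that delegates to it.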

About Me

Hi, I’m Ran Isenberg, an AWS Community Builder (Serverless focus), a Cloud System Architect and a public speaker based in Israel.

I see myself as a passionate Serverless advocate with a love for innovation, AWS and smart home applications.

Connect with me at https://www.linkedin.com/in/ranisenberg/
