Many many many many projects include one or more RESTful API services, typically exchanging JSON data over HTTPS. This works great, and JSON has some really nice properties that make it lovely to work with:
- Readable: each JSON message is just text, which can be inspected and interpreted easily by humans.
- Dynamic: there are no strongly enforced types, which makes it easy to add fields and change values on the fly.
- Well supported: basically every mainstream language has JSON support.
However, JSON’s dynamic, unstructured nature can also make it a bit of a pain to work with. You get no guarantees that a piece of JSON you accept will have the fields you expect, or that the values of those fields make sense in the context of your API. That is why we should always validate requests before using them in our application logic.
I’ve been grappling with this problem at Citymapper and in several of my personal projects. Over time I’ve found myself gravitating towards a specific pattern which has been working well for me, and now I want to share it with you 💕
Let’s say we have an application where users can create events.
We define a public-facing API endpoint, POST /events, which creates an event for the requesting user. Now, how should we implement request validation for this? One way is to add the validation inline, right in the request handler’s body.
Code examples are in Python, so it should be fairly clear what’s happening even if you’re not familiar with the syntax.
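As a rough sketch of the inline approach, assume a framework-agnostic handler that receives the authenticated user's id and the parsed JSON body and returns a (status, payload) tuple. The fields title, start_time and end_time (Unix timestamps), the 100-character title limit and the handler name are all hypothetical, chosen only for illustration:

```python
from typing import Any, Dict, Mapping, Tuple

MAX_TITLE_LENGTH = 100  # hypothetical limit, for illustration only


def create_event_handler(user_id: int, body: Mapping[str, Any]) -> Tuple[int, Dict[str, Any]]:
    # Validation, written inline in the handler.
    title = body.get("title")
    if not isinstance(title, str) or not title.strip():
        return 400, {"error": "title must be a non-empty string"}
    if len(title) > MAX_TITLE_LENGTH:
        return 400, {"error": f"title must be at most {MAX_TITLE_LENGTH} characters"}
    start_time = body.get("start_time")
    end_time = body.get("end_time")
    # type(...) is int rejects bools, which are a subclass of int in Python.
    if type(start_time) is not int or type(end_time) is not int:
        return 400, {"error": "start_time and end_time must be integers"}
    if end_time <= start_time:
        return 400, {"error": "end_time must be after start_time"}

    # Application logic: the part the handler actually exists for.
    # Hypothetical stand-in for persisting the event.
    event = {"owner_id": user_id, "title": title, "start_time": start_time, "end_time": end_time}
    return 201, event
```

Roughly two thirds of the function is validation, and the one line of real work sits at the very end.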
This is fine… we check everything we need to… but just look at how much of our handler is taken up by validation logic! This doesn’t sit well: we’re mixing two things that feel very different, validation and application logic.
A nicer pattern I’ve found is to create an API model to represent a request to our API. This model will be responsible for parsing incoming data and checking if that data passes our custom validation.
Let’s define an API model for the events endpoint.
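A minimal sketch of such a model, using plain Python and the same hypothetical fields (title, start_time, end_time) and hypothetical 100-character title limit. The method and property names from_dict, errors and is_valid are assumptions for illustration:

```python
from typing import Any, List, Mapping

MAX_TITLE_LENGTH = 100  # hypothetical limit, for illustration only


class EventRequest:
    """API model for a POST /events request body (fields are illustrative)."""

    def __init__(self, title: Any, start_time: Any, end_time: Any) -> None:
        self.title = title
        self.start_time = start_time
        self.end_time = end_time

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "EventRequest":
        # Missing keys become None, so validation can report them
        # instead of the parser raising KeyError.
        return cls(
            title=data.get("title"),
            start_time=data.get("start_time"),
            end_time=data.get("end_time"),
        )

    @property
    def errors(self) -> List[str]:
        # Collect every problem rather than stopping at the first one.
        errors = []
        if not isinstance(self.title, str) or not self.title.strip():
            errors.append("title must be a non-empty string")
        elif len(self.title) > MAX_TITLE_LENGTH:
            errors.append(f"title must be at most {MAX_TITLE_LENGTH} characters")
        # type(...) is int rejects bools, which are a subclass of int.
        if type(self.start_time) is not int or type(self.end_time) is not int:
            errors.append("start_time and end_time must be integers")
        elif self.end_time <= self.start_time:
            errors.append("end_time must be after start_time")
        return errors

    @property
    def is_valid(self) -> bool:
        return not self.errors
```

Returning a list of errors (rather than failing fast) is a design choice: it lets the API report every problem with a request in one response.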
Once we instantiate our EventRequest class, we have access to the request’s fields and validation state as needed.
We can now write our handler like this.
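A sketch of the refactored handler, under the same assumptions as before (framework-agnostic handler returning a (status, payload) tuple, hypothetical fields). A compact, hypothetical EventRequest is included so the block runs on its own, and create_event is a hypothetical helper standing in for the real application logic:

```python
from typing import Any, Dict, List, Mapping, Tuple


class EventRequest:
    """Compact, hypothetical API model for a POST /events body."""

    def __init__(self, title: Any, start_time: Any, end_time: Any) -> None:
        self.title = title
        self.start_time = start_time
        self.end_time = end_time

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "EventRequest":
        return cls(data.get("title"), data.get("start_time"), data.get("end_time"))

    @property
    def errors(self) -> List[str]:
        errors = []
        if not isinstance(self.title, str) or not self.title.strip():
            errors.append("title must be a non-empty string")
        if type(self.start_time) is not int or type(self.end_time) is not int:
            errors.append("start_time and end_time must be integers")
        elif self.end_time <= self.start_time:
            errors.append("end_time must be after start_time")
        return errors

    @property
    def is_valid(self) -> bool:
        return not self.errors


def create_event(user_id: int, request: EventRequest) -> Dict[str, Any]:
    # Hypothetical application logic; a real app would persist the event.
    return {
        "owner_id": user_id,
        "title": request.title,
        "start_time": request.start_time,
        "end_time": request.end_time,
    }


def create_event_handler(user_id: int, body: Mapping[str, Any]) -> Tuple[int, Dict[str, Any]]:
    # The handler now reads at a high level: parse, check, act.
    request = EventRequest.from_dict(body)
    if not request.is_valid:
        return 400, {"errors": request.errors}
    return 201, create_event(user_id, request)
```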
This is much better! We’ve encapsulated our validation logic, and our handler is far easier to read. IMO, handlers should read well at a high level, invoking clearly named functions which handle the nitty-gritty stuff.
Of course, this is just a simple example, and we can definitely improve it. Some ideas:
- Define a decorator which implements common functions like from_dict, so we don’t have to define them repeatedly for each endpoint. This helps keep us DRY. ☔️
- In Python, we can make use of the excellent attrs package to reduce the class boilerplate.
- We can generalise our class to serialise the response too. We’d just have to add the public_id attribute and a …
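The first two ideas can be sketched together. This is not the author's decorator: with_from_dict is a hypothetical name, and it uses the standard library's dataclasses module as a stand-in for attrs so the example has no third-party dependency. It turns a class into a dataclass and attaches a generic from_dict, so each model only declares its fields and its own validation:

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


def with_from_dict(cls):
    """Hypothetical decorator: make cls a dataclass and add a from_dict classmethod."""
    cls = dataclass(cls)

    @classmethod
    def from_dict(klass, data: Mapping[str, Any]):
        # Read only the declared fields: unknown keys are ignored and
        # missing keys become None, leaving validation to the model.
        return klass(**{f.name: data.get(f.name) for f in fields(klass)})

    cls.from_dict = from_dict
    return cls


@with_from_dict
class EventRequest:
    title: Optional[str]
    start_time: Optional[int]
    end_time: Optional[int]

    @property
    def is_valid(self) -> bool:
        return (
            isinstance(self.title, str)
            and bool(self.title.strip())
            and type(self.start_time) is int
            and type(self.end_time) is int
            and self.start_time < self.end_time
        )
```

Every model decorated this way gets the same from_dict for free; only the field declarations and validation differ per endpoint.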
I’ve personally found this approach to be a logical and scalable way to tackle the tricky problem of clean request validation. After applying some of the improvements listed above, I often find myself writing API models as minimally as this:
Fifteen lines and validation for this endpoint is done, and cleanly separated from our application logic!
The api_model decorator just abstracts away the common methods mentioned before, applies attrs, and does a few other performance hacks, like caching the validation result so repeated checks are done in constant time. You can check out my implementation of api_model here: https://gist.github.com/benmoose/4fc2434b9a70fc8a8a08c59a8dc95c5b.
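The gist is the author's implementation and isn't reproduced here. As an independent, minimal sketch of just the caching idea, Python's functools.cached_property (3.8+) can memoise the validation result on first access, so later checks are a plain attribute lookup. The model and its validation_runs counter are hypothetical; the counter exists only to show that validation runs once:

```python
from functools import cached_property
from typing import Any, List, Mapping


class EventRequest:
    """Hypothetical model whose validation result is computed once and cached."""

    def __init__(self, title: Any, start_time: Any, end_time: Any) -> None:
        self.title = title
        self.start_time = start_time
        self.end_time = end_time
        self.validation_runs = 0  # only here to demonstrate the caching

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "EventRequest":
        return cls(data.get("title"), data.get("start_time"), data.get("end_time"))

    @cached_property
    def errors(self) -> List[str]:
        # Runs on first access only; cached_property then stores the list
        # in the instance __dict__ and returns it on later accesses.
        self.validation_runs += 1
        errors = []
        if not isinstance(self.title, str) or not self.title.strip():
            errors.append("title must be a non-empty string")
        if type(self.start_time) is not int or type(self.end_time) is not int:
            errors.append("start_time and end_time must be integers")
        elif self.end_time <= self.start_time:
            errors.append("end_time must be after start_time")
        return errors

    @property
    def is_valid(self) -> bool:
        return not self.errors
```

One trade-off: the cached result goes stale if a field is changed after the first check, so this works best when request models are treated as read-only once built (a frozen dataclass is one way to enforce that).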
I hope this helps someone ✌️
P.S. Wish there was an alternative to JSON which actually enforced types? Check out Google’s Protocol Buffers: they’re pretty awesome, and come with a ton of other performance benefits!