Validate json models with swagger and bravado

If you design and implement APIs for a living then you’re probably already familiar with swagger. Swagger is a specification for defining API endpoints and the model objects they transact. Once you have a swagger definition of your API written in either json or yaml you can do quite a few useful things with it: generate HTML documentation on the fly; generate client and server code; generate a postman collection for endpoint testing; and more to the point of this piece, use the spec to validate incoming objects at request time, resulting in meaningful error responses to clients.

To demonstrate the json validation capabilities we will create a very simple API for managing my fleet of dream cars, and then implement it using flask in python 2. Validation will be performed against our swagger spec using the bravado-core package from Yelp. The spec itself will be written in yaml, since it’s a little less verbose than json and will make for easier reading. To begin we can add some boilerplate and then define the objects that our API will make use of:

The boilerplate at the top is pretty self-explanatory. After it comes the “definitions” section, which will contain the object model against which we’ll be validating in the flask app. Let’s take a closer look at the first object, Registration:

In swagger, model objects are defined by a schema object that follows a defined subset of the JSON schema core specification. The Registration object is intended to hold the state registration info for an automobile. It is of type “object” and has two fields, state and plate_number, both of which are required. The fields are fully defined in the “properties” field of the schema object. The “state” field is of type string and uses a regex pattern to enforce its format. The “plate_number” field is also of type string, and simply specifies minimum and maximum length. Note also the textual descriptions included. All of this information is included in generated API documentation, so it is useful for consumers of your API even if you don’t do schema validation.
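To make those constraints concrete, here is a rough sketch of how the Registration schema might look once loaded into a python dict, along with a hand-rolled check that mimics what a validator does with it. The regex, the length limits, the description strings and the check_registration() helper are all assumptions made for illustration rather than values taken from the real spec, and the snippet is written for python 3:

```python
import re

# Hypothetical reconstruction of the Registration schema as a python dict.
REGISTRATION = {
    "type": "object",
    "required": ["state", "plate_number"],
    "properties": {
        "state": {
            "type": "string",
            "description": "Two-letter state abbreviation",
            "pattern": "^[A-Z]{2}$",
        },
        "plate_number": {
            "type": "string",
            "description": "License plate number",
            "minLength": 1,
            "maxLength": 8,
        },
    },
}


def check_registration(value, schema=REGISTRATION):
    """Return a list of problems; an empty list means the value passes."""
    problems = []
    # "required" lists the property names that must be present.
    for name in schema["required"]:
        if name not in value:
            problems.append("'%s' is a required property" % name)
    for name, rules in schema["properties"].items():
        if name not in value:
            continue
        field = value[name]
        if not isinstance(field, str):
            problems.append("'%s' must be a string" % name)
            continue
        # JSON schema patterns are searched for, not implicitly anchored.
        if "pattern" in rules and not re.search(rules["pattern"], field):
            problems.append("'%s' does not match %s" % (name, rules["pattern"]))
        if len(field) < rules.get("minLength", 0):
            problems.append("'%s' is too short" % name)
        if "maxLength" in rules and len(field) > rules["maxLength"]:
            problems.append("'%s' is too long" % name)
    return problems
```

A real validator covers far more of JSON schema than this, which is exactly why it makes sense to hand the job to a library.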

The Car object comes next, and defines what one of my dream cars will look like in json:

I’ve left out most of the fields for brevity, since they are no different in form from the fields of the Registration object which we just discussed. The main difference is the inclusion of a reference to the Registration object that defines the Car.registration property. This is how object aggregation is done in json schema/swagger. Note also that “registration” is not included in the list of required properties. After all, not all of my dream cars are necessarily registered and roadworthy.
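As a sketch of how that reference works, the snippet below models a trimmed-down “definitions” section as a python dict and follows the $ref by hand. The make, model and year fields and the resolve_ref() helper are assumptions for illustration; a real validator resolves references for you.

```python
# Hypothetical, trimmed-down spec; field names other than "registration"
# are assumptions made for illustration.
SPEC = {
    "definitions": {
        "Registration": {
            "type": "object",
            "required": ["state", "plate_number"],
            "properties": {
                "state": {"type": "string"},
                "plate_number": {"type": "string"},
            },
        },
        "Car": {
            "type": "object",
            # "registration" is deliberately absent from the required list.
            "required": ["make", "model", "year"],
            "properties": {
                "make": {"type": "string"},
                "model": {"type": "string"},
                "year": {"type": "integer"},
                "registration": {"$ref": "#/definitions/Registration"},
            },
        },
    }
}


def resolve_ref(document, ref):
    """Follow a local reference such as '#/definitions/Registration'.

    Only same-document references are handled, and JSON pointer escape
    sequences (~0, ~1) are ignored to keep the sketch short.
    """
    if not ref.startswith("#/"):
        raise ValueError("only local references are supported: %r" % ref)
    node = document
    for part in ref[2:].split("/"):
        node = node[part]
    return node


car = SPEC["definitions"]["Car"]
registration = resolve_ref(SPEC, car["properties"]["registration"]["$ref"])
print(sorted(registration["properties"]))  # ['plate_number', 'state']
```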

The remaining two objects in the “definitions” section are a list of cars, and an error object containing a status code and message to be returned when something goes wrong, such as validation failing. With our object model defined we can add the definitions of our endpoints:

Swagger endpoint definitions appear in the “paths” section of the spec document. There are three paths supported by the My Cars API: /swagger, /postman, and /cars. The first two are a standard practice for me when creating an API using my typical workflow. The /swagger endpoint returns the json representation of the current swagger specification, while the /postman endpoint returns a json representation of a postman endpoints collection. Including these endpoints has two big benefits: first the API specification can be loaded into the online swagger editor using the url of the /swagger endpoint; and second the postman collection can be loaded into postman the same way. This makes your API comprehensively self-documenting and easy to explore.

The /cars path is the one which will actually interact with the data. A GET to this endpoint will list all the cars in the database, while a POST will add a new car. Note that each endpoint defines the type of object, if any, that it will return along with different status codes, and the POST method definition includes a reference to a body parameter containing the new car data to be added. This is the mechanism by which models and endpoints are associated in the spec. And with that we have a complete swagger specification of the My Cars API. You can view the whole file on github. The next thing to do is create a simple flask implementation we can test validation in:
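For a feel of how the model and the endpoint are tied together, here is a rough python-dict sketch of what the /cars path might contain. The summaries, the parameter name, the CarList and Error definition names, and the body_schema_ref() helper are all assumptions, and the body parameter is written inline here rather than referenced from a shared parameters section.

```python
# Hypothetical sketch of the "/cars" entry in the "paths" section.
PATHS = {
    "/cars": {
        "get": {
            "summary": "List all cars",
            "responses": {
                "200": {
                    "description": "The list of cars",
                    "schema": {"$ref": "#/definitions/CarList"},
                },
            },
        },
        "post": {
            "summary": "Add a new car",
            "parameters": [
                {
                    "name": "car",
                    "in": "body",
                    "required": True,
                    # This reference is what links the endpoint to the model.
                    "schema": {"$ref": "#/definitions/Car"},
                },
            ],
            "responses": {
                "200": {
                    "description": "The car that was added",
                    "schema": {"$ref": "#/definitions/Car"},
                },
                "400": {
                    "description": "Validation failed",
                    "schema": {"$ref": "#/definitions/Error"},
                },
            },
        },
    },
}


def body_schema_ref(paths, path, method):
    """Return the $ref of the body parameter's schema, or None if absent."""
    operation = paths[path][method]
    for param in operation.get("parameters", []):
        if param.get("in") == "body":
            return param["schema"].get("$ref")
    return None
```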

I’m not going to dive into this implementation in any great detail, since it’s just a framework for demonstrating validation. If you’re familiar with flask then it’s pretty straightforward. Although simple, this file follows the main pattern I adhere to when developing APIs in any framework: the handlers receive requests, send responses, and deal with errors that bubble up; all data access and other logic is delegated to called functions. In this case the “data layer” is folded in for clarity, and consists of just the “_cars” list which will hold any POSTed data, and the “_add_car(car)” function which receives json data from the POST handler. This function validates the new car by calling “validator.validate()” so let’s take a look at that module now:
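A minimal sketch of that handler pattern might look like the following. To keep it self-contained, the InvalidCar exception and the _validate() placeholder stand in for the real validator module, and the shape of the error body is an assumption; only the “_cars” list and “_add_car(car)” names come from the description above. It is written for python 3 and a reasonably recent flask.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# The in-memory "data layer": just a list of POSTed cars.
_cars = []


class InvalidCar(Exception):
    """Hypothetical stand-in for the error raised by the validator module."""


def _validate(car):
    # Placeholder check only; the real app delegates to validator.validate().
    if not isinstance(car, dict) or "make" not in car:
        raise InvalidCar("'make' is a required property")


def _add_car(car):
    """Validate the car and store it; validation errors bubble up."""
    _validate(car)
    _cars.append(car)
    return car


@app.route("/cars", methods=["GET"])
def list_cars():
    return jsonify(_cars)


@app.route("/cars", methods=["POST"])
def add_car():
    # The handler only receives the request, sends the response, and
    # turns errors from the data layer into an error object.
    try:
        car = _add_car(request.get_json())
    except InvalidCar as exc:
        return jsonify({"status": 400, "message": str(exc)}), 400
    return jsonify(car), 200
```

Flask’s built-in test client (app.test_client()) is a handy way to exercise handlers like these without starting a server.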

There are a lot of tools that work with swagger specifications, but one of the most useful is the bravado package from Yelp. Bravado makes it easy to implement client-side code to access swagger-defined APIs. That’s a subject for another time. The validation behavior we’re interested in is implemented in a separate package called bravado-core, and our validator module imports what it needs from that package’s “spec” and “validate” modules. At import time the validator module determines the local path to the spec document, uses pyYAML to read it into a dict, and then passes that dict to the Spec.from_dict() method to create an instance of the bravado_core.spec.Spec class. It also grabs the definition of Car from the spec and saves it for later reference.

With this in place all that is necessary is to call bravado_core.validate.validate_object(), passing the Spec instance we created, the Car definition, and the deserialized python dict that contains the data sent to us by the API user. If this function returns successfully then all is well: the object passed validation and we can go ahead and add it to the database. If the object does not pass muster then a jsonschema.exceptions.ValidationError exception is raised, containing the validation error message and the field that caused the problem. It is then trivial to return this information to the calling client, which gets a clear and detailed message about why their attempt to add a new car to their dream garage resulted in a 400 Bad Request.

Let’s see how it actually works. First we start up the server:

$ python
* Running on http://localhost:8080/ (Press CTRL+C to quit)
* Restarting with stat
* Debugger is active!
* Debugger pin code: 169-026-555

Now we can pop over to postman and add a 1965 Jaguar XKE to our dream collection, because why not?

Everything worked fine and we got a 200 back, so we can hit the /cars endpoint with a GET and see that we now own a beautiful Racing Green XKE:

Now let’s go back and play the role of a client developer who hasn’t read the excellent API documentation that we spent tons of time on:

The client has sent the registration information but the plate_number is not included. In response they get back a 400 Bad Request with an error object that clearly states what the problem is. Let’s look at a value that is out of range rather than missing:

Why there are no 12-door BMW 650csi’s I don’t know, but at least the developer of this client now knows what to do to get the API call to succeed.

How many times have you made an API call that failed and left you to go digging through documentation and comparing your data to sample dumps to figure out what went wrong? Using the swagger specification and bravado-core you can easily create self-documenting, testable and explorable APIs that not only reveal themselves to client developers, but respond to bad data with clear and unambiguous messages that enable the problem to be fixed quickly and with a minimum of wasted effort. If you want to play around with this sample project you can get the whole thing on github. Thanks for reading!