Handling complex JSON Schemas in Python

In a previous post we looked at how to test your Python API app with JSON Schema. If you have to deal with complex, nested JSON data, schema definitions can get long and confusing. However, you can clean up and reuse your schemas by using references. Let’s see what they are and how you can leverage them in your testing setup.

Internal References

Let’s take an example endpoint GET /recordings/:id which returns a JSON response for a recording of a classical music piece:

We have a work object with a composer, and there is a list of recording artists, which includes the musicians and technical personnel. All three of them (the composer, the pianist, and the engineer/producer) share the same artist data structure.
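To make that shape concrete, here is a rough sketch of such a response written as a Python dict. Every field name and value in it (id, title, name, functions, recording_artists) is an assumption made up for illustration, not the actual payload of the endpoint.

```python
# Hypothetical response body for GET /recordings/:id.
# All field names and values are assumptions for illustration only.
recording = {
    "id": 1,
    "work": {
        "id": 10,
        "title": "Example Piano Sonata",
        "composer": {"id": 100, "name": "Example Composer", "functions": ["composer"]},
    },
    "recording_artists": [
        {"id": 101, "name": "Example Pianist", "functions": ["pianist"]},
        {"id": 102, "name": "Example Engineer", "functions": ["engineer", "producer"]},
    ],
}

# The composer and both recording artists are "artists".
artists = [recording["work"]["composer"], *recording["recording_artists"]]

# Collect the distinct sets of keys; a single set means one shared structure.
shared_keys = {frozenset(artist) for artist in artists}
```

Because all three artists expose the same keys, shared_keys ends up holding a single key set, which is exactly the repetition a schema has to describe.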

Using the above JSON response, the corresponding JSON Schema would look like this:

We repeated the schema for artists twice. There is a cleaner way of defining this structure, though. JSON Schema allows you to extract the parts of your schema that you want to reuse and put them under a "definitions" property. Then you can use a reference path to your definitions instead of repeating the full schema over and over again. Using this knowledge about internal references, the cleaned-up schema looks like this:

We just moved our artist part into a "definitions" object and linked to it using the "$ref" key and the relative path to our structure. The #/ part points to the root of the schema we’re in.
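Here is a minimal sketch of that idea in Python. The schema below is an assumed example (the property names are hypothetical), and resolve_internal_ref is a hand-written helper that only illustrates what a "#/definitions/artist" path means. It is not how jsonschema resolves references internally, and it ignores JSON Pointer escaping.

```python
# Assumed example schema; property names are hypothetical.
schema = {
    "type": "object",
    "properties": {
        "work": {
            "type": "object",
            "properties": {"composer": {"$ref": "#/definitions/artist"}},
        },
        "recording_artists": {
            "type": "array",
            "items": {"$ref": "#/definitions/artist"},
        },
    },
    "definitions": {
        "artist": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
                "functions": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["id", "name"],
        }
    },
}


def resolve_internal_ref(root, ref):
    """Follow a '#/a/b' style reference, starting at the schema root."""
    if not ref.startswith("#/"):
        raise ValueError(f"not an internal reference: {ref!r}")
    node = root
    # '#/' is the root; each following path segment is a key to descend into.
    for key in ref[2:].split("/"):
        node = node[key]
    return node


composer_ref = schema["properties"]["work"]["properties"]["composer"]["$ref"]
items_ref = schema["properties"]["recording_artists"]["items"]["$ref"]

composer_schema = resolve_internal_ref(schema, composer_ref)
items_schema = resolve_internal_ref(schema, items_ref)
```

Both references land on the very same definition object, so the artist structure is written down only once.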

So far, so good. It’s a bit less messy already.

File References

What if we had another endpoint for artist details, e.g. GET /artists/:id? In order to test this endpoint we would define the same artist schema we already defined in recording.json, this time in another schema file, artist.json:

So, now we have two places where we define the same schema. This is not very DRY and would force us to change both files if the JSON response for an artist changed. Luckily, JSON Schema lets us reference schemas in other files by using file:/ instead of #/definitions/ in the reference path.

Since we learned how to reference a file, we could just use the file reference in our recording.json:
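As a rough sketch of an absolute file reference, the snippet below writes a stand-in artist.json to a temporary directory and points a recording schema at it. The schema contents and the temporary location are assumptions for illustration. Note that pathlib produces the file:///... form of the URI, and the reference is dereferenced by hand with the standard library just to show where it points.

```python
import json
import tempfile
from pathlib import Path
from urllib.request import urlopen

# Assumed artist schema; the property names are hypothetical.
artist_schema = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id", "name"],
}

with tempfile.TemporaryDirectory() as tmp:
    artist_path = Path(tmp).resolve() / "artist.json"
    artist_path.write_text(json.dumps(artist_schema))

    # An absolute file URI for the schema file.
    artist_uri = artist_path.as_uri()

    recording_schema = {
        "type": "object",
        "properties": {
            "recording_artists": {"type": "array", "items": {"$ref": artist_uri}},
        },
    }

    # Follow the reference manually to show that it leads to artist.json.
    ref = recording_schema["properties"]["recording_artists"]["items"]["$ref"]
    with urlopen(ref) as response:
        loaded = json.load(response)
```

The loaded document is the artist schema, so recording.json no longer needs its own copy of it.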

That’s not too bad! Our schema definition is shrinking further.

However, providing the absolute path over and over again is not how we would like to reference our schema files. Let’s try with a relative path: "file:/artist.json" …

Bummer. It won’t work out of the box using the jsonschema package. The schema validator does not know where the referenced files are located.

We need to tell our schema loader what the base path for our schemas is. There is an issue in the jsonschema GitHub project that discusses exactly this problem. We could either set up a custom RefResolver or use the jsonref package, which one of the collaborators wrote. We’ll pick the latter.

The only steps to make it work are:

  1. Adding the latest version of the jsonref package to our requirements.txt file and installing it
  2. Loading our schema file with jsonref.loads()

The second step only needs a small modification in our custom assertion helper that we used in the previous post:

Instead of loading our schema files with json we use jsonref and pass it a base path. jsonref then replaces our schema’s relative file references with absolute references to files in our tests/support/schema directory.
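To illustrate what a base path buys us, here is a small standard-library sketch that joins relative "$ref" values onto a base URI. It is not jsonref’s implementation, only the idea behind it. The function name absolutize_refs and the /app/... location are hypothetical.

```python
from urllib.parse import urljoin


def absolutize_refs(node, base_uri):
    """Return a copy of node with relative "$ref" values joined onto base_uri."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            # Leave internal references such as "#/definitions/artist" alone.
            if key == "$ref" and isinstance(value, str) and not value.startswith("#"):
                result[key] = urljoin(base_uri, value)
            else:
                result[key] = absolutize_refs(value, base_uri)
        return result
    if isinstance(node, list):
        return [absolutize_refs(item, base_uri) for item in node]
    return node


# Hypothetical schema directory. The trailing slash matters: without it,
# urljoin treats "schema" as a file name and replaces it.
base_uri = "file:///app/tests/support/schema/"

schema = {
    "properties": {
        "composer": {"$ref": "artist.json"},
        "label": {"$ref": "#/definitions/label"},
    }
}

resolved = absolutize_refs(schema, base_uri)
```

After this step the composer reference points at an absolute location inside the schema directory, while the internal reference is left untouched.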

With these fixes in place we can even skip the file:/ prefix in our schemas, and jsonref will still compile them to include the schema definitions from the respective files. So, here’s the final cleaned-up JSON schema definition:
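As a sketch of what "compiling" a schema with bare file names amounts to, the snippet below inlines every "$ref" that names a file in the schema directory. This is a simplified stand-in for what jsonref does, not its actual code. The helper inline_file_refs, the schema contents, and the temporary directory are all assumptions, and circular references are not handled.

```python
import json
import tempfile
from pathlib import Path


def inline_file_refs(node, schema_dir):
    """Replace {"$ref": "<name>.json"} nodes with the contents of that file."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and not ref.startswith("#"):
            referenced = json.loads((schema_dir / ref).read_text())
            # The referenced file may contain file references of its own.
            return inline_file_refs(referenced, schema_dir)
        return {key: inline_file_refs(value, schema_dir) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_file_refs(item, schema_dir) for item in node]
    return node


# Assumed schemas; the property names are hypothetical.
artist_schema = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
}
recording_schema = {
    "type": "object",
    "properties": {
        "work": {"type": "object", "properties": {"composer": {"$ref": "artist.json"}}},
        "recording_artists": {"type": "array", "items": {"$ref": "artist.json"}},
    },
}

with tempfile.TemporaryDirectory() as tmp:
    schema_dir = Path(tmp)
    (schema_dir / "artist.json").write_text(json.dumps(artist_schema))
    compiled = inline_file_refs(recording_schema, schema_dir)
```

The compiled schema contains the full artist definition in both places, while the schema files themselves stay short.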

That’s it for this post. With a couple of lines of code we prepared our testing environment to handle complex JSON schemas and made it possible to build clean, concise, and reusable schemas.

If you haven’t used JSON Schema for testing API endpoints yet, you should give it a try. Also have a look at the previous post for setting up the basic testing environment for your Python app.

Happy coding, happy testing!