Implementing Schemathesis at PayLead

By Jérémy · Published in PayLead · May 29, 2024

PayLead’s core business is our API, so we need tools to ensure its consistency and reliability.

Of course, we use unit testing and similar testing methods, but they won’t be the focus of this article. Today, we will talk about Schemathesis, and more generally property-based testing, and how it helps you have an API that functions exactly how it is described in your documentation.

What is property-based testing?

Schemathesis is a tool that runs property-based tests against your API. Property-based testing is, in short, a way to write tests that state a property (the described behavior) that must always hold, and I do mean always, without any exception. That is, whatever input you feed into the system under test, however weird, the property must still be satisfied.

In the case of an API, Schemathesis will send all kinds of input to your API, in the called URL, query parameters or request body, and verify that the API answers fit the documentation you provided (more on property based testing here: https://techbeacon.com/app-dev-testing/how-make-your-code-bulletproof-property-testing).
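To make the idea concrete, here is a minimal sketch of a property-based test written with the standard library only. It is not how Schemathesis works internally: the handler list_items, its limit parameter, the documented status codes and the deliberate bug are all hypothetical, made up for illustration.

```python
import random
import string

DOCUMENTED_STATUSES = {200, 400}  # what our imaginary documentation promises


def list_items(limit: str) -> int:
    """Toy handler: return the HTTP status for GET /items?limit=<limit>."""
    if not limit.isdigit():
        return 400  # reject anything that is not a plain non-negative integer
    if int(limit) > 100:
        return 500  # deliberate bug: large limits crash instead of returning 400
    return 200


def random_input(rng: random.Random) -> str:
    """Generate a short random string mixing digits, letters and punctuation."""
    alphabet = string.digits * 10 + string.ascii_letters + string.punctuation
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 6)))


def find_counterexamples(handler, runs: int = 2000, seed: int = 0) -> list:
    """Property: for ANY input, the status is one of the documented ones."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        value = random_input(rng)
        if handler(value) not in DOCUMENTED_STATUSES:
            failures.append(value)
    return failures
```

Calling find_counterexamples(list_items) returns the inputs that broke the property, here the limits above 100 that produce an undocumented 500. Schemathesis applies the same idea to a real API, deriving both the inputs and the expected responses from your OpenAPI documentation.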

To give you an idea of the scale, in our case, it sends around 100 requests per documented API path on each run. It can generate more than 100 test cases on a well-documented API with many varying query or path parameters. I would be impressed if it didn’t find any bug or documentation conformance issue in your API the first time you run it.

How to install Schemathesis

First, it is very easy to install. A simple pip install schemathesis is enough to have it in your Python environment. They also have an application (that can be found at https://docs.schemathesis.io/, but some of its features are not free to use), so feel free to use whichever you are more comfortable with.

Schemathesis customization

For this first setup, you don’t need anything other than this install and a .json or .yml documentation file following the OpenAPI specification (which you can find here: https://swagger.io/specification/, and here is an example of a petstore swagger: https://petstore3.swagger.io/).

Then you can run the st run command with the path to the documentation file. You will probably need a few options to be able to run it, like -H to specify headers and -b to specify the base URL of your API.

Full example:

st run -H "Authorization: Bearer <your_token>" -b <your_api_link> <your_documentation>

You can find the documentation about the command line interface here: https://schemathesis.readthedocs.io/en/stable/cli.html. I suggest you read it, or at least the built-in help (st run --help), because Schemathesis really has a huge number of customization options.
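If you would rather launch the st run call shown above from a Python script, a minimal sketch could look like this. It only uses the -H and -b options already mentioned; the helper names build_st_command and run_schemathesis, and the example values, are hypothetical.

```python
import shlex
import subprocess


def build_st_command(schema_path: str, base_url: str, token: str) -> list:
    """Build the argument list for the `st run` call shown above."""
    return [
        "st", "run",
        "-H", f"Authorization: Bearer {token}",  # extra request header
        "-b", base_url,                          # base URL of the API under test
        schema_path,                             # OpenAPI file (.json or .yml)
    ]


def run_schemathesis(schema_path: str, base_url: str, token: str) -> int:
    """Run the command and return its exit code (requires `st` on the PATH)."""
    return subprocess.run(build_st_command(schema_path, base_url, token)).returncode


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = build_st_command("openapi.yml", "https://api.example.com", "secret-token")
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with the header value, which contains spaces.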

Now, we will run Schemathesis on the PetStore API, which is publicly available on the swagger site.

Output of Schemathesis run

We can see there are quite a few errors. We will quickly look at some of them to see what Schemathesis can help us find!

Error 500 on Schemathesis output

Here, we can see Schemathesis found a 500 on /user/0. It could be because no check is done on the format of the input, and 0 is not a valid UUID.

Undocumented return code on Schemathesis output

We deleted an order via a DELETE /store/order/0, but the response code we received (200) isn’t mentioned in the documentation!

Response violates schema on Schemathesis output

We can see a “response violates schema” because it did not return a photoUrls field even though it is a required property!
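The three kinds of findings above (server error, undocumented status code, schema violation) can be summed up in a small sketch. This is not Schemathesis code: the DOCUMENTED dictionary is a hypothetical, heavily simplified stand-in for what an OpenAPI file describes, and check_response is a made-up helper.

```python
# Hypothetical, simplified description of one documented operation.
DOCUMENTED = {
    "statuses": {400, 404},                    # status codes listed in the docs
    "required_fields": {"name", "photoUrls"},  # required properties of a 2XX body
}


def check_response(status: int, body: dict, documented: dict = DOCUMENTED) -> list:
    """Return a list of human-readable conformance problems (empty if none)."""
    problems = []
    if status >= 500:
        problems.append(f"server error: {status}")
    if status not in documented["statuses"]:
        problems.append(f"undocumented status code: {status}")
    if 200 <= status < 300:
        missing = sorted(documented["required_fields"] - set(body))
        if missing:
            problems.append(f"response violates schema, missing: {missing}")
    return problems
```

For instance, check_response(200, {"name": "doggie"}) reports both an undocumented status code and a missing photoUrls field, while a documented 404 yields no problem at all.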

We got lucky here because the IDs tested by Schemathesis are 0 and a bunch of random UUIDs, and since a pet had the ID 0, we could get information about it. Of course, in real life, we usually don’t use linear IDs; instead, we prefer UUIDs because of the layer of security they provide. But then, how would you know Schemathesis found valid answers to validate your schemas? Wouldn’t Schemathesis only get 4XX responses on randomly generated UUIDs?

It absolutely would! And as long as every response fits the documentation, its output will be green. Fortunately, Schemathesis has a great feature that addresses this issue: hooks.

Schemathesis could receive only 404 responses and tell you everything is fine, but only because the 404 use case is documented. This seems positive initially, but it means that not all of your return schemas are tested. You need to see some 2XX responses to be sure Schemathesis actually tests all of your schemas. This is what the hooks are for (documentation right here: https://schemathesis.readthedocs.io/en/stable/extending.html#hooks).

If you want to have some customized hooks, you can write them in a Python file and export its path in SCHEMATHESIS_HOOKS. Here is our hooks file to ensure each call has at least one 2XX response:

from typing import List

import click
import schemathesis
from schemathesis import Case, GenericResponse
from schemathesis.cli.context import ExecutionContext
from schemathesis.cli.handlers import EventHandler
from schemathesis.hooks import HookContext
from schemathesis.runner import events

# Create a cache for storing endpoints that have been tested and the count of 2XX responses
_cached_2xx_responses = dict()


@schemathesis.hooks.register
def after_call(context, case: Case, response: GenericResponse):
    """
    For every endpoint tested, store an entry in the 2XX responses cache.
    If the response is 2XX, increment the count for this endpoint.
    """
    endpoint = f"{case.endpoint.method.upper()} {case.endpoint.full_path}"
    # For every endpoint tested, ensure there is an entry in the cache
    if endpoint not in _cached_2xx_responses:
        _cached_2xx_responses[endpoint] = 0
    # If this response is 2XX
    if 200 <= response.status_code < 300:
        _cached_2xx_responses[endpoint] += 1


class CheckFor2XXResponseHandler(EventHandler):
    def handle_event(self, context: ExecutionContext, event: events.ExecutionEvent) -> None:
        """
        When all tests are complete, check through the 2XX response cache and emit a failure if
        any endpoint has no matching 2XX response.
        """
        if not isinstance(event, events.Finished):
            return
        if event.has_failures:
            return
        schemathesis.cli.output.default.display_section_name("2XX RESPONSES")
        click.echo()
        click.secho("Endpoints tested:", bold=True)
        for endpoint, count in _cached_2xx_responses.items():
            verdict = "." if count else "F"
            colour = "green" if count else "red"
            click.echo(f" {endpoint} {click.style(verdict, fg=colour, bold=True)}")
        failed_endpoints = [e for e, v in _cached_2xx_responses.items() if v == 0]
        if len(failed_endpoints):
            event.has_failures = True
            event.failed_count += len(failed_endpoints)


@schemathesis.hooks.register
def after_init_cli_run_handlers(
    context: HookContext,
    handlers: List[EventHandler],
    execution_context: ExecutionContext,
) -> None:
    # Insert into the beginning of the handlers list
    handlers.insert(0, CheckFor2XXResponseHandler())

Of course, you can do many things with it, and I invite you to explore all possibilities!
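The bookkeeping behind this hook does not depend on Schemathesis, so it can be unit-tested on its own. Here is a minimal sketch of the same counting logic; the TwoXXTracker class and its method names are hypothetical and are not part of the Schemathesis API.

```python
class TwoXXTracker:
    """Count 2XX responses per endpoint, independently of Schemathesis."""

    def __init__(self) -> None:
        self.counts = {}

    def record(self, method: str, path: str, status_code: int) -> None:
        endpoint = f"{method.upper()} {path}"
        # Every tested endpoint gets an entry, even if it never returns a 2XX
        self.counts.setdefault(endpoint, 0)
        if 200 <= status_code < 300:
            self.counts[endpoint] += 1

    def endpoints_without_2xx(self) -> list:
        """Endpoints that were called but never answered with a 2XX."""
        return [endpoint for endpoint, count in self.counts.items() if count == 0]
```

An endpoint that only ever answered 404 ends up in endpoints_without_2xx(), which is exactly the situation the handler above turns into a failure.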

Last step: the links

So now that your Schemathesis works and ensures you have 2XX responses, what’s left? Are we all done? Well, not exactly. Sometimes, endpoints need input variables that are coming from the responses of other endpoints: this is what links are for (https://swagger.io/docs/specification/links/).

Links are meant to describe how different endpoints are, guess what? Linked together!

Let’s say you have an endpoint /users that lists your users. You could use links to say you want to use the content of the 200 response of this first endpoint to then call /users/{user_id}. Things are getting a little complicated (just now?), but bear with me. To do that:

  • the /users/{user_id} path in your documentation needs an operationId,
  • the 200 response of the /users path needs a links field,
  • this links field needs the operationId from the /users/{user_id} path as a linkObject, and any other parameters you would like to pass.

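It’s one of the hardest concepts to master in OpenAPI, but this simplified snippet should make it easier to understand (the link name and the $response.body#/0/id expression are illustrative, and assume /users returns an array of objects with an id field):

paths:
  /users/{user_id}:
    get:
      operationId: someRandomOperationId
  /users:
    get:
      responses:
        "200":
          links:
            someLinkName:
              # the linkObject: target operation, plus any parameters to pass
              operationId: someRandomOperationId
              parameters:
                user_id: "$response.body#/0/id"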
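To show what following a link means in practice, here is a rough sketch of how a value from the /users response could be injected into the /users/{user_id} path. It is a simplified illustration, not Schemathesis' implementation: it only handles expressions of the form $response.body#<JSON pointer>, and it assumes (hypothetically) that /users returns an array of objects with an id field.

```python
def resolve_pointer(document, pointer: str):
    """Follow a JSON Pointer such as "/0/id" through nested lists and dicts."""
    current = document
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 escapes
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


def resolve_link_parameter(expression: str, response_body):
    """Evaluate a "$response.body#<pointer>" runtime expression."""
    prefix = "$response.body#"
    if not expression.startswith(prefix):
        raise ValueError(f"unsupported expression: {expression}")
    return resolve_pointer(response_body, expression[len(prefix):])


def build_linked_path(template: str, link_parameters: dict, response_body) -> str:
    """Fill a path template using values taken from a previous response."""
    values = {
        name: resolve_link_parameter(expression, response_body)
        for name, expression in link_parameters.items()
    }
    return template.format(**values)
```

With a response body of [{"id": "abc"}], build_linked_path("/users/{user_id}", {"user_id": "$response.body#/0/id"}, body) produces the path of the follow-up call on the first user of the list.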

I suggest you look at the documentation on the swagger.io site; their description is very clear. If you plan to use this feature of Schemathesis, you’ll need to use the --stateful option of the st command (https://schemathesis.readthedocs.io/en/stable/cli.html#cmdoption-schemathesis-run-stateful).

There are no links on the PetStore API (https://petstore3.swagger.io/). But we could easily imagine a workflow, such as:

  • Create a User on the POST /user endpoint
  • Try to log them in on the GET /user/login endpoint
  • Log them out with GET /user/logout endpoint
  • Delete the user with DELETE /user/{username} endpoint

What the output would look like in Schemathesis:

POST /user .
    -> GET /user/login .
        -> GET /user/logout .
            -> DELETE /user/{username} .

The -> marks linked operations. The only problem I found is that you cannot build request bodies from the responses of the previous operation, so you would have to hard-code the requestBody for the POST /user and the GET /user/login.
(NB: This is not true anymore. Following this article, Schemathesis updated this feature and it should be far more complete! I have not tried it yet, but it should solve this problem! They also announced some other cool features in development that I will let you check out!)

Although all of this is quite complicated, the documentation is well written, so you should be able to make it work with all the examples from swagger.io!

Additional steps

Now, your Schemathesis will work on your environment, but… what if we want automatic testing of our APIs, for example in a CI?

Well, that’s also possible! At PayLead, we use GitLab to run our test pipelines. Let’s look at some interesting features. For example, you could set up a scheduled pipeline that runs every night at midnight and use environment variables to target an environment you set up in advance (usually for testing purposes).

Because of the large number of 4XX responses Schemathesis generates, you could get rate limited. This is where Schemathesis options get even more useful: you can use its rate limiter option to avoid getting blocked!

We have implemented rate limiting in our development environment, so we use the --rate-limit parameter to slow Schemathesis down and avoid getting into trouble!
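To pick a value for --rate-limit, it helps to translate it into a delay between requests. The sketch below assumes the <limit>/<duration> format described in the Schemathesis CLI documentation (for example 100/m for 100 requests per minute) and the unit letters s, m, h and d; please check st run --help for the exact format accepted by your version. The helper name min_delay_seconds is hypothetical.

```python
# Assumed duration units for the "<limit>/<duration>" format
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def min_delay_seconds(rate_limit: str) -> float:
    """Convert e.g. "100/m" into the average delay between two requests."""
    limit, unit = rate_limit.split("/")
    return SECONDS_PER_UNIT[unit] / int(limit)
```

If your API allows 100 requests per minute, min_delay_seconds("100/m") shows that Schemathesis has to wait a little over half a second between calls on average, which gives a feel for how long a full run will take.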

For example, our gitlab-ci.yml looks something like this:

.schemathesis_template:
  stage: test
  rules:
    # If it's a scheduled pipeline, we start the job automatically
    # to get the results every day
    - if: $CI_PIPELINE_SOURCE == "schedule"
      when: always
      allow_failure: false
    # Otherwise, it's a manual, non-mandatory step in our CI
    - when: manual
      allow_failure: true
  before_script:
    # This is where we log into our Docker, because we run
    # our CI in Docker.
  script:
    # This is where we start the run, with the commands and exports
    # it needs to work. You could just put your command here if you
    # do not need any extra complexity.
    - cd quality/schemathesis
    # Options, in order: our authorization bearer, all checks are run,
    # base URL of the API under test, two report files (cassette and
    # JUnit), stateful testing (useful for links), rate limiting, and
    # the OpenAPI documentation being tested.
    # Comments cannot sit after a line continuation, so they live here.
    - >
      st run
      -H "Authorization: Bearer $API_TOKEN"
      -c all
      -b $API_BASE_URL
      --cassette-path "/tmp/schemathesis_cassette.yml"
      --junit-xml "/tmp/schemathesis_junit.xml"
      --stateful links
      --rate-limit <your_rate_limit>
      ref_schemathesis.yml
  after_script:
    # We copy the output files to be able to export the artifacts with
    # GitLab CI
    - cp /tmp/schemathesis_* .
  artifacts:
    # This is where we export the artifacts to be able to get them from the CI
    reports:
      junit: schemathesis_junit*.xml
    when: always
    paths:
      - schemathesis_*
    expire_in: 1 week

test-api:
  # We export different variables because we have different documentation
  # for different types of users. This is not mandatory if you have only
  # one documentation.
  extends:
    - .schemathesis_template
  variables:
    API_TOKEN: "$API_TOKEN"
    API_TYPE: "admin"

This should give you an idea of how to run Schemathesis in your CI.

How did it help?

Now everything is set up and you can start testing, but before you do we’ll give you an idea of how Schemathesis can benefit you!

We had a recurring issue on some queries that led to 500 errors. It turns out we had a special implementation on a query parameter, which led us to not check its format strictly enough. The API crashed because the SQL queries were not getting the UUID they were expecting.

Schemathesis sent many inputs, including normal strings, integers, non-printable chars, etc., which showed us right away what we needed to fix. It also helped us change some schemas in our documentation to fit the returned data (e.g., a resource was returning an array, but our documentation said it was returning a dict).
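The kind of fix this calls for can be sketched in a few lines: validate the parameter strictly before it gets anywhere near a SQL query, and answer with a 4XX instead of crashing. This is not our production code; parse_uuid_param and status_for are hypothetical helpers, and the choice of 400 as the error code is an assumption.

```python
import uuid
from typing import Optional


def parse_uuid_param(raw) -> Optional[uuid.UUID]:
    """Return the parsed UUID, or None if the input is not a canonical UUID."""
    try:
        value = uuid.UUID(raw)
    except (ValueError, AttributeError, TypeError):
        return None  # plain strings, integers, non-printable chars, None...
    if str(value) != str(raw).lower():
        return None  # reject non-canonical spellings such as missing hyphens
    return value


def status_for(raw) -> int:
    """Toy handler: 400 on malformed input, 200 once the UUID is safe to use."""
    return 400 if parse_uuid_param(raw) is None else 200
```

With this check in place, inputs such as "0", an integer or a string of non-printable characters lead to a documented 400 rather than a 500 from the database layer.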

Of course, it found other problems, but it’s just a small preview of what problems it could solve in your application!

Any trade-offs?

Schemathesis might seem like the perfect tool to test your API conformance (and it is), but there are, of course, some trade-offs. Setting up perfect documentation can take a lot of time, even if libraries exist to help you with this.

Fixing the bugs is the most important part but, for us, simply having exhaustive documentation consistent with our API was not that easy, and setting up links can be quite tiresome. Using the tool to its fullest was not simple and took a lot of time, but we now have a surefire way to know our documentation is accurate, and our API behaves as expected!

On another topic, Schemathesis is very good at generating many test cases based on varying input parameters. However, this could easily slow down your test suite, so you must adapt the coverage (in terms of parameter variations) to fit your performance requirements.
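A quick back-of-the-envelope calculation helps when choosing that coverage. The sketch below assumes the rate limit is the bottleneck and ignores response times; the function name and the numbers in the usage note are hypothetical. To my knowledge, the number of generated cases per operation is controlled in the 3.x CLI by the --hypothesis-max-examples option, but check st run --help for your version.

```python
def estimate_run_minutes(operations: int, examples_per_operation: int,
                         requests_per_minute: int) -> float:
    """Rough lower bound on run time when a rate limit is the bottleneck."""
    total_requests = operations * examples_per_operation
    return total_requests / requests_per_minute
```

For example, 50 operations with 100 generated cases each, limited to 100 requests per minute, already take at least estimate_run_minutes(50, 100, 100) minutes, which is why a nightly schedule is often a better fit than running on every commit.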

You’re ready to test!

Now that you’re all set, you can ask Schemathesis to poke at your API and find bugs quite efficiently. It will probably find many 500 errors, and errors in your documentation, but when it all turns green, you will finally be able to confidently say that your documentation describes precisely how your API works (and that your API functions exactly the way you want it to)!

May 28th 2024, Jérémy Pelletier, Backend Developer

Paylead: Fintech Seamlessly Embedding Loyalty into Financial Services.
We leverage bank transaction data to power a SaaS platform for banks and retailers across Europe that delivers seamless loyalty and engaging reward experiences when people bank, shop and pay.
Join us !

Disclaimer : The content of this article is for general informational purposes exclusively. All information is provided in good faith; however, PayLead makes no representation or warranty of any kind, express or implied, regarding the accuracy, adequacy, validity, reliability or completeness of such article. PayLead excludes any responsibility arising from this article, especially for the content and proper functioning of external links which are under the control of third parties.

Intellectual property rights held by PayLead protect all information in this article. Consequently, none of this information may be reproduced, modified, redistributed, translated, commercially exploited, or reused in any way whatsoever without the prior written consent of PayLead.
