Testing Machine Learning powered Data Pipelines

Kai Sahling
3 min read · Jul 14, 2020

In my previous post, I showed how to build a data pipeline that incorporates a machine learning model. However, I left out one very important aspect, namely testing. I would like to make up for that in the following. As I mentioned in the previous article, I used dependency injection (DI) mainly to be able to test easily. DI allows me to inject mock classes into the data pipeline instead of the classes that handle the connection to the databases and retrieve or write data. So let’s jump into the actual code. You can also clone the repo from my GitHub.

The incoming and outgoing data and the trigger are mocked to test the key component of the data pipeline, which is the machine learning model.

The unit test is supposed to test the API endpoint that serves as the data pipeline’s trigger. First, I set up the tests, including the mocks for the input and output, in a configuration file called conftest.py. In order to be able to test the endpoints, I also have to create a Flask test application. For this reason, I import the create_app method that is used to create the actual data pipeline. Since I want to mock the retriever and writer classes, which handle the input and output of the pipeline, I import only the abstract classes RetrieverAbs and WriterAbs and not the actual classes that are used in the data pipeline. The abstract classes are required for the bindings of the Flask injector. For the same reason, I import both the abstract model class ModelAbs and the actual implementation of the machine learning model. In my example, this is the DummyClassifier.
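The abstract classes the binder relies on could look roughly like this. This is a minimal sketch: the class names follow the article, but the method names and bodies are assumptions, since the real definitions live in the repo.

```python
from abc import ABC, abstractmethod

# Sketches of the abstract classes imported into conftest.py; the real
# definitions live in the pipeline package (method names are assumptions).
class RetrieverAbs(ABC):
    @abstractmethod
    def retrieve(self):
        """Return a DataFrame with the pipeline's input data."""

class WriterAbs(ABC):
    @abstractmethod
    def write(self, result):
        """Persist the model's result."""

class ModelAbs(ABC):
    @abstractmethod
    def predict(self, data):
        """Apply the model to the input data."""
```

Because the mocks subclass these, the injector can bind either the real classes or the mocks against the same interface.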

In the next step, I mock the input and output. The actual retriever class simply returns a DataFrame containing the data from the input database. Therefore, the MockInput class only has to return some made-up DataFrame. Make sure that the column names correspond to the names the machine learning model class expects; in this case, it is data. The same holds for the output. It receives some result and moves it to a database. Consequently, I must make sure that the write method of MockOutput accepts a result. Since my actual writer class that writes the result to the database returns nothing, my mocked write method can simply do nothing. If your actual write method returns something, for instance a success or failure message, you can easily mock that by returning the expected message.
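A sketch of the two mocks, assuming the method names retrieve and write; the sample values are made up, but the column name data must match what the model class expects. In the project, these would subclass RetrieverAbs and WriterAbs.

```python
import pandas as pd

# Mocked input and output for the test configuration (values are made up;
# the column name "data" must match what the model class expects).
class MockInput:
    def retrieve(self):
        return pd.DataFrame({"data": [0.1, 0.5, 0.9]})

class MockOutput:
    def write(self, result):
        # The real writer returns nothing, so the mock does nothing; if your
        # writer returns a status message, return that expected message here.
        pass
```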

Next, I set up the classes that are supposed to be injected via the binder. These are MockInput, DummyClassifier and MockOutput. It works exactly the same as in the actual data pipeline, but instead of injecting the classes that connect to the targeted databases, I bind the two mocks.
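The binding function could look like the sketch below. In the real pipeline, the binder is supplied by flask_injector; the RecordingBinder here is only a stand-in so the sketch runs on its own, and the empty classes stand in for the pipeline's real ones.

```python
# Stand-ins for the pipeline's abstract classes and the test doubles,
# so this sketch is self-contained (the real ones come from the project).
class RetrieverAbs: ...
class WriterAbs: ...
class ModelAbs: ...
class MockInput: ...
class MockOutput: ...
class DummyClassifier: ...

class RecordingBinder:
    """Minimal stand-in for flask_injector's binder; it only records calls."""
    def __init__(self):
        self.bindings = {}

    def bind(self, interface, to):
        self.bindings[interface] = to

def configure_test(binder):
    # Bind the abstract classes to the mocks instead of the real
    # database-facing classes; the model binding stays the same.
    binder.bind(RetrieverAbs, to=MockInput)
    binder.bind(WriterAbs, to=MockOutput)
    binder.bind(ModelAbs, to=DummyClassifier)
```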

To complete the configuration, all we have to do is set up the Flask test app and make everything available via Pytest’s fixtures. If you read my previous post, you might recall that the app factory create_app requires four arguments. Since I want to inject MockInput and MockOutput and change the bindings with my test configuration, I pass exactly those to create_app. The rest is some Pytest setup to provide a Flask test application to the later tests. In case you are not yet familiar with Pytest, this guide does a good job of explaining the most important features of Pytest.
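The fixture could be sketched as follows. The stubs at the top only make the sketch self-contained; in the real suite, the mocks come from conftest.py and create_app is the actual factory from the previous post, so the argument order here is an assumption.

```python
import pytest
from flask import Flask

# Self-contained stand-ins; in the real suite these come from the project.
class MockInput: ...
class MockOutput: ...
class DummyClassifier: ...
def configure_test(binder): ...

def create_app(retriever, model, writer, bindings):
    # Placeholder for the real app factory, which also registers the
    # flask_injector bindings passed in via `bindings`.
    return Flask(__name__)

@pytest.fixture
def flask_test_client():
    # Pass the mocks and the test bindings instead of the real classes.
    app = create_app(MockInput, DummyClassifier, MockOutput, configure_test)
    app.config["TESTING"] = True
    with app.test_client() as client:
        yield client
```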

Finally, after setting up the test environment, I can implement the test cases. For the sake of simplicity, I will only test the happy path of my application. But you can easily extend the setup and test other scenarios. You might be interested in what happens if an empty DataFrame is passed to your model. Does the application break, or does it return the error message you expect to see? For each test case, you can create other mocked inputs and outputs. These must be passed to another fixture that, like flask_test_client, runs the create_app method.
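For the empty-DataFrame scenario mentioned above, an alternative mock might look like this (the scenario and class name are illustrative, not from the repo):

```python
import pandas as pd

# Alternative mock for a failure-path test: the retriever returns an empty
# frame, but the column name "data" still matches what the model expects.
class MockEmptyInput:
    def retrieve(self):
        return pd.DataFrame(columns=["data"])
```

You would then bind MockEmptyInput instead of MockInput in a second test fixture and assert on the error response you expect.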

Let’s get back to testing the happy path. I call the /model endpoint and pass a user id. The app generating the endpoint is flask_test_client, which is passed to the test as an argument; thanks to the Pytest fixture, it is available to the test. If the pipeline runs successfully, I expect “Model applied” to be returned. I also want to assert that the status code is 200.
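The happy-path test could be sketched as below. The tiny app at the top is a hypothetical stand-in that mirrors the /model endpoint so the sketch runs on its own; in the real suite, the client is the flask_test_client fixture and the user id in the URL is just an example.

```python
from flask import Flask

# Hypothetical minimal app mirroring the /model endpoint; in the real suite
# the client comes from the flask_test_client fixture instead.
app = Flask(__name__)

@app.route("/model/<user_id>")
def model(user_id):
    return "Model applied"

def test_model_happy_path():
    client = app.test_client()
    response = client.get("/model/1")  # pass some user id
    assert response.status_code == 200
    assert b"Model applied" in response.data
```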

I hope this guide helps you to successfully test your data pipeline and to maintain high quality. Thanks for reading!
