Deploying a scikit-learn model with ONNX and FastAPI

Nico · Aug 7, 2019

Disclaimer: In this article, I use the Boston Housing dataset. I consider choosing this dataset a massive mistake, since some of its features should not be used in any kind of decision-making system. I am against the use of features in models that discriminate against people. My intention in writing this article was to focus only on the technology that supports bringing models into production, not to hurt anybody.

Introduction: ONNX is an open-source model standard that allows exchanging models between different frameworks. It lets you train a model with any framework, as long as that framework supports ONNX. In my opinion, this is a game-changer because data scientists can use their favourite tools to train a model, while machine learning engineers only have to set up one production environment that can run it. In this post, I cover how to serve a scikit-learn model with ONNX Runtime and FastAPI. We train a simple linear regression on the Boston Housing dataset. The goal of this article is to give you an introduction to ONNX Runtime and FastAPI.

Overview

Overview of the steps

In the first step, we train a linear regression with scikit-learn and convert the model to ONNX.

In the second step, we combine ONNX Runtime with FastAPI to serve the model in a Docker container. ONNX Runtime is a high-performance inference engine for ONNX models. FastAPI is a modern Python framework for developing APIs in a very efficient way.

Model training

We use the Boston Housing dataset to train a simple linear regression. The dataset is split into a training and a testing set.
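A minimal sketch of what this training step could look like (the split ratio and random seed are my assumptions; note that load_boston shipped with scikit-learn when this article was written but has since been removed in scikit-learn 1.2):

```python
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the dataset and split it into training and testing sets
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a simple linear regression
model = LinearRegression()
model.fit(X_train, y_train)
print("R^2 on the test set:", model.score(X_test, y_test))
```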

The conversion of the scikit-learn model to ONNX is done with the sklearn-onnx package.
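A sketch of the conversion, assuming the trained model from above (the input name "float_input" is an arbitrary choice; 13 is the number of features in the dataset):

```python
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Describe the model input: a float tensor with 13 features per sample
initial_type = [("float_input", FloatTensorType([None, 13]))]
onnx_model = convert_sklearn(model, initial_types=initial_type)

# Persist the ONNX model to disk
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```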

ONNX Runtime

ONNX Runtime is an open-source inference engine published by Microsoft. Its purpose is to run ONNX models with high performance.

I personally really like the idea behind ONNX Runtime because it saves you some work. Before ONNX Runtime was published, I needed to convert my ONNX models to TensorFlow to run inference. On the way to productionizing a model, this is a disadvantage because you need to convert the model from ONNX to yet another format. With ONNX Runtime, this step is obsolete because models run directly in the engine.

You just need to install the onnxruntime package to use it.
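Loading the converted model and running a prediction could then look like this (a minimal sketch; the providers argument is required by recent onnxruntime releases and may not exist in very old ones):

```python
import numpy as np
import onnxruntime as rt

# Load the exported model and look up its input name
sess = rt.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

# Run inference on one random sample with 13 features
sample = np.random.rand(1, 13).astype(np.float32)
prediction = sess.run(None, {input_name: sample})[0]
print(prediction)
```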

Rest API with FastAPI

Introducing FastAPI is an excellent introductory article about FastAPI, written by the main developer of the framework, and I can highly recommend reading it.

FastAPI is a Python-based web framework built on Python type hints. Type hints are a language feature introduced with Python 3.5 and extended in Python 3.6. They allow you to declare the type of a variable with a dedicated syntax, which makes your code easier to understand. From these type hints, FastAPI automatically generates OpenAPI documentation.
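For illustration, a plain type-hinted function (a made-up example, not from the article's code):

```python
# The annotations declare the parameter and return types
def scale_price(price: float, factor: float = 1.0) -> float:
    return price * factor
```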

One of the main advantages of the framework is that you can define the body of a request directly as a Python class. Through the type hints, the framework automatically validates the types when it receives a request. To use this feature, a class needs to inherit from pydantic's BaseModel.

For our application, we define two classes, HousingFeatures and PredictionResult. The HousingFeatures class contains all the features that are required to perform a prediction.
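A sketch of how these classes could look (the field names are my assumption, following the 13 Boston Housing feature names; the original repository may name them differently):

```python
from pydantic import BaseModel

class HousingFeatures(BaseModel):
    # The 13 Boston Housing features, named after the original dataset columns
    CRIM: float
    ZN: float
    INDUS: float
    CHAS: float
    NOX: float
    RM: float
    AGE: float
    DIS: float
    RAD: float
    TAX: float
    PTRATIO: float
    B: float
    LSTAT: float

class PredictionResult(BaseModel):
    prediction: float
```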

Next, we can continue with the implementation of the /predict REST endpoint, which is called via a POST request. In addition, a token needs to be placed in the request header as a simple security check. To reduce the complexity of this post, we use a fixed token that is set before the application starts. This security check is invoked on every incoming request via FastAPI's dependency injection.
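Putting the pieces together, the endpoint could look roughly like this (a sketch under my own assumptions: the header name token, the API_TOKEN environment variable, and the feature ordering are illustrative, not taken from the repository; HousingFeatures and PredictionResult are the pydantic models sketched above):

```python
import os

import numpy as np
import onnxruntime as rt
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

# Load the ONNX model once at startup
sess = rt.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

# Fixed token, set before the application is started (illustrative name)
API_TOKEN = os.environ.get("API_TOKEN", "secret-token")

async def verify_token(token: str = Header(...)):
    # Simple security check, injected into the endpoint below
    if token != API_TOKEN:
        raise HTTPException(status_code=401, detail="Invalid token")

@app.post("/predict", response_model=PredictionResult)
def predict(features: HousingFeatures, _: None = Depends(verify_token)):
    # Arrange the features in the order the model was trained on
    data = np.array([[features.CRIM, features.ZN, features.INDUS, features.CHAS,
                      features.NOX, features.RM, features.AGE, features.DIS,
                      features.RAD, features.TAX, features.PTRATIO, features.B,
                      features.LSTAT]], dtype=np.float32)
    result = sess.run(None, {input_name: data})[0]
    return PredictionResult(prediction=float(result.ravel()[0]))
```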

Afterwards, we put everything into one Docker container. You can find the code in the scikit-onnx-fastapi-example repository. The application can be started by running docker-compose up, and you will find it at http://localhost.
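For orientation, a containerization sketch (my assumption of what such a setup could look like; the actual files in the repository may differ). The tiangolo/uvicorn-gunicorn-fastapi base image serves the app on port 80, which matches the http://localhost URL above:

```dockerfile
# Dockerfile — a possible setup, not necessarily the repository's exact file
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

COPY ./requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

# Application code and the converted model
COPY ./app /app
COPY ./model.onnx /app/model.onnx
```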

The script test.sh shows how to perform a prediction against the /predict endpoint. The OpenAPI documentation is available at http://localhost/docs.
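An equivalent request in Python (the field values are the first row of the Boston Housing data; the header name and token are the assumptions from above):

```python
import requests

payload = {
    "CRIM": 0.00632, "ZN": 18.0, "INDUS": 2.31, "CHAS": 0.0, "NOX": 0.538,
    "RM": 6.575, "AGE": 65.2, "DIS": 4.09, "RAD": 1.0, "TAX": 296.0,
    "PTRATIO": 15.3, "B": 396.9, "LSTAT": 4.98,
}
response = requests.post(
    "http://localhost/predict",
    json=payload,
    headers={"token": "secret-token"},
)
print(response.json())
```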

Overview of a request in the application

Summary

ONNX Runtime is a straightforward tool for running ONNX models. Together with FastAPI, machine learning models can be put into a production environment very quickly. Another advantage of using FastAPI is that the API is automatically documented, which enables companies to integrate machine learning applications directly into their existing environments.

Join the Machine Learning in Production LinkedIn group to learn how to put your models into production. Feel free to add me on LinkedIn. I am always open to discussing machine learning topics and giving advice on your data science projects and business implementations!

_________

If you have any feedback or questions, please feel free to contact me, Nico Axtmann.
