Introduction: ONNX is an open-source model format that lets you exchange models between different frameworks. You can train a model with any framework, as long as that framework supports ONNX. In my opinion, this is a game-changer because data scientists can use their favourite tool to train a model, while machine learning engineers only have to set up one production environment to run it. In this post, I cover how to serve a scikit-learn model with ONNX Runtime and FastAPI. We train a simple linear regression on the Boston Housing dataset. The goal of this article is to give you an introduction to ONNX Runtime and FastAPI.
In the first step, we train a linear regression with scikit-learn and convert the model to ONNX.
In the second step, we combine ONNX Runtime with FastAPI to serve the model in a Docker container. ONNX Runtime is a high-performance inference engine for ONNX models. FastAPI is a modern Python framework for developing APIs efficiently.
We use the Boston Housing dataset to train a simple linear regression. The dataset is split into a training set and a test set.
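A minimal sketch of this training step could look like the following. It assumes a recent scikit-learn release, where `load_boston` is no longer available (it was removed in version 1.2), so a synthetic dataset with the same shape as Boston Housing (506 rows, 13 features) stands in for the real data. The 80/20 split and the variable names are illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in data with the Boston Housing shape (506 samples, 13 features).
# Assumption: swap this for the real dataset if you have it available.
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

# Hold out 20% of the rows for testing (the ratio is an illustrative choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out rows gives a quick sanity check of the fit.
r2 = model.score(X_test, y_test)
print(f"R^2 on test set: {r2:.3f}")
```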
The conversion of the scikit-learn model to ONNX is done with the sklearn-onnx package.
ONNX Runtime is an open-source inference engine published by Microsoft. Its purpose is to run ONNX models with high performance.
I personally really like the idea behind ONNX Runtime because it saves you some work. Before ONNX Runtime was published, I needed to convert my ONNX models to TensorFlow to run inference. On the way to production, this is a disadvantage because it adds one more conversion step. With ONNX Runtime, this step is obsolete because models run directly in the engine.
You just need to install the onnxruntime package to use it.
REST API with FastAPI
"Introducing FastAPI" is an excellent introductory article about FastAPI, written by the main developer of the framework, and I highly recommend reading it.
FastAPI is a Python web framework built on Python type hints. Type hints were introduced in Python 3.5, and the variable annotation syntax followed in Python 3.6. They let you declare the type of a variable, parameter, or return value with a dedicated syntax, which makes your code easier to understand. From the type hints, FastAPI automatically generates OpenAPI documentation.
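To make the syntax concrete, here is a small standalone example of type hints using only the standard library. The function and variable names are made up for illustration.

```python
from typing import List


def average_price(prices: List[float]) -> float:
    """Return the mean of a list of house prices."""
    return sum(prices) / len(prices)


# Variable annotations use the same colon syntax.
rooms: int = 6

# Hints are stored as metadata on the function; frameworks such as FastAPI
# read them to validate input and generate documentation.
print(average_price.__annotations__)
```

Python itself does not enforce these hints at runtime, so the checking is left to tools and frameworks that inspect them.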
One of the main advantages of the framework is that you can define the body of a request directly with a Python class. Using the type hints, the framework checks types automatically when it receives a request. To use this feature, the class needs to inherit from pydantic's BaseModel.
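A short sketch of such a class is shown below. The class name and its three fields are hypothetical and only illustrate the idea; a real request model would list whatever inputs your model expects.

```python
from pydantic import BaseModel, ValidationError


class HouseFeatures(BaseModel):
    # Hypothetical subset of input columns, used only for illustration.
    rooms: float
    age: float
    tax: float


# Valid input is parsed into a typed object.
house = HouseFeatures(rooms=6.5, age=65.2, tax=296.0)

# Input that cannot be read as a float is rejected with a ValidationError.
try:
    HouseFeatures(rooms="many", age=65.2, tax=296.0)
    rejected = False
except ValidationError:
    rejected = True
```

When such a class is used as an endpoint parameter, FastAPI runs the same validation on the incoming JSON body and answers with an error response if it fails.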
Next, we implement the /predict REST endpoint, which accepts POST requests. In addition, a token must be sent in the request header as a simple security check. To reduce the complexity of this post, we use a fixed token that is set before the application starts. Through dependency injection, this check runs every time we receive a request.
Afterwards, we put everything into one Docker container. You can find the code in the scikit-onnx-fastapi-example repository. Start the application by running docker-compose up. You will then find it at http://localhost.
ONNX Runtime is a straightforward tool for running ONNX models. Together with FastAPI, it lets you put machine learning models into production quickly. Another advantage of FastAPI is that the API is documented automatically. This makes it easier for companies to integrate machine learning applications into their existing environment.
Join the Machine Learning in Production LinkedIn group to learn how to put your models into production. Feel free to add me on LinkedIn. I am always open to discussing machine learning topics and giving advice on the business implementation of your data science projects!
If you have any feedback or questions, please feel free to contact me, Nico Axtmann.