Serve Models Fast with Flask

Estevão Uyrá Pardillos Vieira
DataLab Log
5 min read · Aug 9, 2019


In many companies, Data Analysts spend a lot of time moving models around. To do so, they have to make sure all dependencies are compatible and that their environment does not mix library versions. Sometimes people even move models’ weights around in Excel spreadsheets to use inside other software, opening a huge gap between the training process and the final deployed model.

One of the reasons these practices exist is that there is no widespread protocol for serving models practically and without too much overhead. We wanted to use our models directly from Sklearn (or similar) in a way that any app could consume, and the natural choice was a REST API. Just so we are on the same page: a REST API is a service that responds to HTTP requests, a language-independent protocol.

There are many reasons why one might want to use a REST API to handle the model as a service. The single most important one for us is that it completely conceals the mechanisms underlying the model, abstracting them away behind a straightforward interface.

The interface is usually guided by a contract that defines what kinds of requests are possible, what kind of data can be sent, and how the responses are structured. As long as the contract is honored (e.g. a POST request is made sending a list of arrays), the request can be made in any language, be it Python, Java, Ruby, Bash, etc.

When building products, teams usually have to juggle many distinct programming languages. Encapsulating models in REST APIs lets all of them integrate seamlessly.

At the DataLab we made some attempts to use Seldon to deploy models. Seldon makes it possible to deploy any model from a pickle file as a REST API (and inside Kubernetes!), and in some cases it worked well, making the deployment process a lot simpler. Nevertheless, we really missed flexibility and found that the documentation fell short of our needs. The last straw was that sending data took too long: although HTTP supports compressed requests, Seldon has no way (at least none documented) of dealing with compressed payloads.

In sum, using Seldon gave us some guidelines on how to deploy models in a simpler way, but it was not enough for our needs. We decided to build our own prediction API from scratch using Flask, a simple web-development library for Python. In this post, I will present a very minimal solution for deploying a model in a REST API.

REST APIs with Flask

In addition to base Flask, we use flask-restful, “an extension for Flask that adds support for quickly building REST APIs”.

The framework is very straightforward, as the skeleton below shows. After importing the necessary libraries, we instantiate the application and make it run. The application listens on port 5000, but it is a dummy: we haven’t added any functionality, so it does not do anything yet.
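A minimal sketch of that skeleton (the packages are flask and flask-restful, as published on PyPI):

```python
from flask import Flask
from flask_restful import Api

app = Flask(__name__)
api = Api(app)  # wraps the app so we can register Resources on it

if __name__ == '__main__':
    # Listen on all interfaces, on port 5000
    app.run(host='0.0.0.0', port=5000)
```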

Suppose we want some way to test whether the API is working. We can add a simple Resource. Resources are written as Python classes and can implement GET, POST and PUT methods; each resource is registered at a specific path after the host.
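A sketch of such a test resource, extending the skeleton above (the class name and path are our own choices, matched to the curl call below):

```python
from flask import Flask
from flask_restful import Api, Resource

app = Flask(__name__)
api = Api(app)

class MyCustomTest(Resource):
    def get(self):
        # Answer GET requests so we can check that the API is alive
        return 'test successful!!!!!'

# Every resource lives at a specific path after the host
api.add_resource(MyCustomTest, '/my_custom_test')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```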

In the example above, we can request our app using curl --request GET http://0.0.0.0:5000/my_custom_test. It will answer with “test successful!!!!!”.

Making models in a flexible manner

We follow a convention created by Seldon to make it easier to put new models into the app: the model class lives in a separate file that can be overwritten later on. You can save the model inside a pickle file, for example, and load it like the code below.
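A minimal sketch of that file, here called Model.py (the class name and predict signature are illustrative, not Seldon’s exact convention):

```python
# Model.py
import pickle

class Model:
    def __init__(self, path='model.pkl'):
        # Load a previously trained scikit-learn estimator from disk
        with open(path, 'rb') as f:
            self._clf = pickle.load(f)

    def predict(self, features):
        # features: a list of feature arrays, one per sample
        return self._clf.predict(features)
```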

Going back to our Flask API, we have to add a new resource that receives the features and replies with the predictions. We could extend our previous resource with a new method like POST or PUT, but that would be bad practice, since the resource’s name would no longer be meaningful. Instead, we create a new resource on another path.
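A sketch of that resource, assuming the Model class above and a JSON payload carrying a "features" list (both the path and the payload shape are our own contract, not a standard):

```python
from flask import request
from flask_restful import Resource

from Model import Model

model = Model()  # load the pickled model once, at startup

class Prediction(Resource):
    def post(self):
        # Contract: a JSON body like {"features": [[...], [...], ...]}
        payload = request.get_json(force=True)
        predictions = model.predict(payload['features'])
        return {'predictions': predictions.tolist()}

# Registered on the Api instance, on its own path:
# api.add_resource(Prediction, '/prediction')
```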

The complete API

Let’s wrap the whole RESTful app in a single file, so that it is clear how straightforward the process is:
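A sketch of the complete app.py (resource names and paths are, again, illustrative):

```python
# app.py
from flask import Flask, request
from flask_restful import Api, Resource

from Model import Model  # the model wrapper from the previous section

app = Flask(__name__)
api = Api(app)
model = Model()

class MyCustomTest(Resource):
    def get(self):
        return 'test successful!!!!!'

class Prediction(Resource):
    def post(self):
        payload = request.get_json(force=True)
        predictions = model.predict(payload['features'])
        return {'predictions': predictions.tolist()}

api.add_resource(MyCustomTest, '/my_custom_test')
api.add_resource(Prediction, '/prediction')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

With the server running, a prediction request could look like curl --request POST --header "Content-Type: application/json" --data '{"features": [[1, 2, 3]]}' http://0.0.0.0:5000/prediction.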

In principle, we can add as many Resources as we like to make our API do multiple things. Moreover, there are many improvements to be made before going to production with this API, like protecting the service from attacks and dealing gracefully with errors. Nevertheless, a working prediction API is at your disposal for prototyping.

Maybe in the future we will open-source our library, but in the meantime you can try out this minimal case! You can also use the skeleton to build other kinds of REST APIs. For example, we wrote very similar code to expose Speech-To-Text via HTTP. You can use it to run any kind of script that you want to make usable from other languages and platforms, without being troubled by dependencies.

Bonus: Dockerizing your API

We are inside the project folder, with the pickled model, the Model.py file specifying the model class, and the app.py file containing the Flask app. Now we write a Dockerfile based on a very small Python image, completing a minimal and completely reproducible prediction API. Without the model requirements, the image weighs approximately 160 MB.
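A sketch of such a Dockerfile (the base image tag and the installed packages are assumptions; any slim Python image will do):

```
FROM python:3.7-slim

WORKDIR /app

# Install the API's dependencies (add your model's requirements here)
RUN pip install --no-cache-dir flask flask-restful scikit-learn

# Copy the pickled model, the model class, and the Flask app
COPY model.pkl Model.py app.py ./

EXPOSE 5000
CMD ["python", "app.py"]
```

Building with docker build -t prediction-api . and running with docker run -p 5000:5000 prediction-api brings up the same API inside the container.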
