Deploy your Machine Learning model as an API in 5 minutes (with Docker and Flask)
As a data scientist, do you want people to use your model? Do you want to deploy the result of weeks or months of hard work? Please read below.
Let’s put your work into production in a quick and reliable way by exposing it as a web API.
There is a git repository associated with this tutorial:
https://github.com/Dataswati/garage/tree/master/docker_flask_api_5mn
You will need to install:
docker: https://docs.docker.com/
docker-compose: https://docs.docker.com/compose/
Model
Let’s quickly find an operational model to deploy. Thanks to my google-fu, I found this nice tutorial on predicting house prices using open data. Here are the links to the tutorial and corresponding git repository.
In the tutorial, the gradient boosting implementation from scikit-learn is used to predict house prices. To make the deployment easier, I reduced the number of input features (stay tuned for a more advanced article on robust feature selection, hit that follow button now!).
On the internet, we often deal with the json format. Its counterpart in python is our beloved dict. In this basic example, we will handle the python dict directly in the prediction function, but please note that this isn’t the neatest way to structure the code (one should separate the data munging from the prediction).
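To give a rough idea, the prediction function can look like this minimal sketch (the model file name and path are assumptions, not the exact code from the repository):

import pickle

import numpy as np

# The six input features kept after feature selection (order matters).
FEATURES = ["grade", "lat", "long", "sqft_living", "waterfront", "yr_built"]

# Hypothetical path: adjust to wherever your trained model is pickled.
with open("/model/gb_house_prices.pkl", "rb") as f:
    model = pickle.load(f)

def predict(data):
    """Turn the incoming dict into a feature row and return the predicted price."""
    row = np.array([[data[name] for name in FEATURES]])
    return float(model.predict(row)[0])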
Flask
Flask is a great python module for creating an API server; it uses decorators to associate functions with URL routes.
The documentation is nice and it is easy to get started with flask: http://flask.pocoo.org/
In the script below, we run the flask server. I’ve created a route named predict_price that receives json data, applies the predict function, and returns the prediction (also in json format).
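Here is a minimal sketch of server.py (importing predict from a separate module is an assumption about how the code is organized; the route and the output key match the example at the end of this article):

from flask import Flask, jsonify, request

from predict import predict  # hypothetical module holding the prediction function

app = Flask(__name__)

@app.route("/predict_price", methods=["POST"])
def predict_price():
    # Parse the json body into a python dict and feed it to the model.
    data = request.get_json()
    return jsonify({"predict cost": predict(data)})

if __name__ == "__main__":
    # Listen on 0.0.0.0 so the server is reachable from outside the container.
    app.run(host="0.0.0.0", port=5000, debug=True)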
We could stop here, because the command python3 server.py is enough to start the flask server. However, we prefer to embed it in a docker container to maintain portability wherever we may roam.
Docker
Below is the Dockerfile, a magical instruction file that defines all that is needed to run the code.
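Something along these lines (the base image tag and the paths are assumptions; the actual file may differ):

# Base image: Ubuntu (the tag is an assumption).
FROM ubuntu:18.04

# Install python3 and pip with apt.
RUN apt-get update && apt-get install -y python3 python3-pip

# Install the python dependencies.
COPY python_requirements.txt /
RUN pip3 install -r /python_requirements.txt

# Launch the flask server at container start.
ENTRYPOINT ["bash", "/scripts/start_flask.sh"]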
As we can see, it is based on Ubuntu, which has our beloved apt command for package management.
The file python_requirements.txt contains all the python packages. You can generate your own requirements file using pip3 freeze if you have additional dependencies (and maybe filter it down to the useful packages). In this example, the file is very short:
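Something like this (versions left unpinned here; the file in the repository may pin exact releases):

# python_requirements.txt (contents reconstructed; versions unpinned)
flask
numpy
scipy
scikit-learn
pandas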
ENTRYPOINT tells what command to launch at the start of the container. Here, the command is bash /scripts/start_flask.sh.
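The script itself can be as simple as this sketch (the /code mount point is an assumption about how the volumes are set up):

#!/bin/bash
# start_flask.sh -- launch the API server (the /code path is an assumption).
cd /code
python3 server.py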
And now, you are ready to launch this docker and get rich. To make it look more polished, we prefer to use a docker-compose.yml file, which is used to set up the different dockers:
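A sketch of the compose file (the service name matches the log output below; the volume paths are assumptions):

version: "3"
services:
  flask:
    build: .
    volumes:
      - ../code:/code
      - ../scripts:/scripts
      - ../model:/model
    ports:
      - "5000:5000"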
build: builds the Dockerfile in the current folder.
volumes: links folders on your machine with folders inside the docker container.
ports: exposes the docker network through the corresponding port on the host machine.
Start everything
To start everything, you just need to be in the docker folder and execute docker-compose up. After the installation, which can take a few minutes, you will see:
Starting docker_flask_1 ... done
Attaching to docker_flask_1
flask_1 | * Serving Flask app "server" (lazy loading)
flask_1 | * Environment: production
flask_1 | WARNING: Do not use the development server in a production environment.
flask_1 | Use a production WSGI server instead.
flask_1 | from numpy.core.umath_tests import inner1d
flask_1 | * Debugger is active!
flask_1 | * Debugger PIN: 161-492-662
This means that everything is fine and that the flask server is listening.
You can use curl to connect to the server and verify that the service is available:
curl -X POST -H "Content-Type: application/json" -d @to_predict_json.json http://localhost:5000/predict_price
where to_predict_json.json contains:
{"grade":10.0,"lat":47.5396,"long":-122.073,"sqft_living":4490.0,"waterfront":0.0,"yr_built":2006.0}
this should return:
{
"predict cost": 982545.2768445385
}
Here, you will likely see a slightly different number because of the randomness of the gradient boosting model (you can also fix the random seed to get deterministic results).
In the next episode, we’ll go into more detail on topics that were only scratched here: setting up a robust server with nginx and gunicorn, professional pipelines and optimized computations (to justify a higher salary), and different approaches to feature selection and hyperparameter tuning (for when your client doesn’t want to pay for a decent machine and your resources are scarce).