Data science in production with Docker

A story on how to get your data science models into production and generate business value with Docker.

Feel like skipping the story? The full code of this post is available on GitHub.

So you’ve built some amazing data science model that can predict how many M&Ms it takes to make you nauseous, that can classify the quality of coffee beans based on their shape, or that can identify needles in pictures of haystacks. Well done! Awesome! But how are such models helping you earn money?

The biggest challenge for data scientists is not building a model; that’s what we’re good at. The challenge is getting those models into a production environment where they generate business value, and that’s where Docker comes in.

What this story is NOT about

Since I’m pretty sure you’ve heard about Docker and containers, I won’t try to explain what it is and how it works. If you need more info, there are plenty of decent introductions out there; Google and YouTube are your friends.

Furthermore, I’ll assume that you have some basic Python skills and that Docker and Python are installed on your system.


Something about architecture

Data science in production means your models need a proper place within your company’s architecture. Decide, together with your software developers, how you’ll interact with the data science models. Are you using REST? A RabbitMQ or Kafka message bus? It’s up to you, but don’t forget to sync with the software guys, as they will be your users.

There is no single best approach, and therefore I’ll use both REST and RabbitMQ in this post.

What we’ll be making

Just for demo purposes, we’ll be making a Python model that lives inside a Docker container: one version that accepts calls over a REST API, and one that receives RPC calls over RabbitMQ. This way, other applications can use your service in production to get predictions on actual data.

Before we start…

Since this story is about using containers to run data science models in production, we’ll need a model. For demonstration purposes, let’s assume we have the following model (in Python):
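The original snippet lives in the GitHub repository; here is a minimal sketch of such a model, assuming scikit-learn’s built-in iris dataset and joblib for persistence (the file names are illustrative):

# train.py - a minimal sketch, assuming scikit-learn and joblib
import os
from joblib import dump
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load the classic iris dataset and train a simple classifier
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Store the trained model in ./models so the Docker container can pick it up later
os.makedirs("models", exist_ok=True)
dump(model, "models/iris_classifier.joblib")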

As you can see, the trained model is stored in a separate folder called ./models so it can be used by our Docker container.

Now that we’ve trained our model (please feel free to use your own model), we have everything we need to get started.

Creating some wrapper code

First we’ll wrap our model in some code that will handle incoming requests. Often Flask (a Python library) is used for this purpose, but since we want to expose our model over a message bus AND a REST API, we’ll wrap it in Nameko (another Python library).

We’ll start by making a class IrisClassifierService, which is a regular Python class. The class needs two properties: a name to identify the service, and the model we trained before (read it from disk here!). Then we’ll define a predict method that takes in a list of new data points and returns a list of predictions. This method is kept separate so we can use it for both REST and the message bus. Finally, we’ll define the method that Nameko will use. This method (classify) has a decorator to show Nameko that this is the method it should expose.

The full example is below:
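A sketch of what this wrapper could look like, assuming the joblib model file from the training step above (the full version is in the GitHub repository):

# wrapper.py - a sketch of the Nameko service
import json
from joblib import load
from nameko.web.handlers import http

class IrisClassifierService:
    # The name Nameko uses to identify this service
    name = "iris_classifier_service"

    # Read the trained model from disk (this path is an assumption)
    model = load("models/iris_classifier.joblib")

    def predict(self, data):
        # Shared prediction logic for both the REST and message bus entrypoints
        return [int(p) for p in self.model.predict(data)]

    @http("POST", "/classify")
    def classify(self, request):
        try:
            data = json.loads(request.get_data(as_text=True))
            predictions = self.predict(data)
            return json.dumps({
                "status": "success",
                "message": "Classified new input data",
                "predictions": predictions,
            })
        except Exception as err:
            # We don't know what the endpoint will receive, so fail gracefully
            return 400, json.dumps({"status": "error", "message": str(err)})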

Note that this is quite a lot of code, but most of it is there to handle exceptions and special cases. Since you don’t know what the endpoint will receive, it’s best to assume the worst ;).

Let’s test our wrapper by running it with Nameko:

$ nameko run wrapper

… and then make a request to classify new data from another console (or use a tool like Postman):

$ curl -i -d "[[5.1, 3.5, 1.4, 0.2]]" localhost:8000/classify

And the response should look like this:

{"status": "success", "message": "Classified new input data", "predictions": [0]}

Bring out the containers

So now our model responds to requests over HTTP (we’ll do the message bus later on), nice. Let’s put this into a Docker container. The first thing we’ll need to do is create a Dockerfile that contains the definition of our container image. It should look something like this:
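(A sketch; the base image, and a requirements.txt listing nameko, scikit-learn, and joblib, are assumptions here.)

# Start with a basic Python installation
FROM python:3.9-slim

# Install the Python requirements first, so Docker can cache this layer
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Make sure all code and models end up in the same folder
COPY . .

# Run the service with the same command as we did before
EXPOSE 8000
CMD ["nameko", "run", "wrapper"]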

I’ve put in some comments to explain what happens, but basically it’s the same as running the service on your laptop: start with a Python installation, install the requirements, make sure all code and models are in the same folder, and run the service with the same command as we did before. Simple, right?

Now let’s use this definition to build a container image. Run the following command from the command line (in the same folder as your Dockerfile, and don’t forget the dot at the end):

$ docker build -t iris_classifier_service .

This will execute the definition file and store the resulting image with the tag “iris_classifier_service” so we can find it later. Now we can start the service with the following command:

$ docker run -d -p 8000:8000 iris_classifier_service

This will run our service with port 8000 of the container mapped to port 8000 of our Docker host, so we can reach it from outside the container. Let’s give it a try:

$ curl -i -d "[[5.1, 3.5, 1.4, 0.2]]" localhost:8000/classify

Note: it is possible that you need to change localhost to the IP address of your Docker host.


So that’s it! We’ve put our data science model in a Docker container, ready for deployment in a production environment, accessible to our software colleagues. Check out my GitHub repository for the full code and an example of how to convert this code to run with a message bus (RabbitMQ).


Bonus: docker-compose

Docker Compose is a neat tool that lets you define multiple services and their dependencies in a single file, and start them all at once. Let’s give it a shot here.

First, we’ll need a RabbitMQ message bus. Someone has already gone through the effort of putting it in a Docker image, so we can just reuse that image; no need to do anything at all. The only thing we’ll define is the ports we want to access.

Then we’ll put in our REST service (just for fun), exposing port 8000 like before. Finally, we’ll put in the slightly modified version of the service that makes use of the message bus, and define a dependency on the message bus.
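A sketch of what the docker-compose.yml could look like (the service names, the rpc_wrapper module, and the RabbitMQ image tag are assumptions; the full file is in the GitHub repository):

version: "3"
services:
  rabbitmq:
    # Reuse the official RabbitMQ image; the only thing we define is the ports
    image: rabbitmq:3-management
    ports:
      - "5672:5672"
      - "15672:15672"

  rest_service:
    # Our REST service from before, built from the Dockerfile in this folder
    build: .
    ports:
      - "8000:8000"

  rpc_service:
    # The slightly modified, message bus version of the service
    build: .
    command: nameko run rpc_wrapper --config config.yml
    depends_on:
      - rabbitmq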

Note that the compose file also contains the information needed to build the images, so running this file takes care of everything we need to deploy. (Please note that this is a very basic example; I would not recommend running it directly in production. Do consider security, high availability, etc. when deploying your final product.)

Run the following command to get it all running:

$ docker-compose up

You’ll see a lot of output while the images are built and the containers are started, but after a few minutes all your services should be available. You should now be able to access the REST endpoint like before, and get predictions over the message bus. For testing the message bus interface, I recommend using the Nameko shell:

$ nameko shell --config config.yml
In [1]: n.rpc.iris_classifier_service.classify("[[5.1, 3.5, 1.4, 0.2]]")
Out[1]: '{"status": "success", "message": "Classified input data", "predictions": [0]}'

Note: You will have to update the config.yml file to point to your message bus.
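A minimal config.yml could look like this (a sketch, assuming RabbitMQ’s default guest credentials; use the rabbitmq hostname from the compose file when connecting from inside the compose network):

# config.yml - a minimal sketch
AMQP_URI: "pyamqp://guest:guest@localhost"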