Understanding Docker: Containerizing ML Projects

Kartik Chauhan
May 26, 2020 · 9 min read

My name is Kartik Chauhan and I am a member of the Artificial Intelligence and Machine Learning club of KIET Group of Institutions, Ghaziabad. I am pursuing a B.Tech in Computer Science and Engineering. This article explains the fundamental concepts of Docker and how we can build and run a containerized machine learning application from scratch.

What is Docker and why is it so important?

Let us understand what Docker really is without getting into the fancy definitions already available all over the internet.

Docker is an open-source project for building, shipping and running software programs. It is a tool that solves common software problems and simplifies the experience of installing, running, publishing and removing software. To do so, it uses an operating-system technology that is part of the Linux kernel: containers.


Simply put, containers are isolated runtime environments that contain all the resources an application program needs to run properly, including libraries, databases and all the other required dependencies.

Containers are used for process isolation and platform-independent computing.

Containers have been a part of the Linux kernel for a long time, yet they remained out of reach for most people because of the difficulty of building one manually. Building and running programs in a container is a complex task and easy to do incorrectly.

Docker changed the container development process: containers are now readily available to every user through a few simple Docker commands.

Virtual machines use hardware virtualization, i.e. hardware abstraction. To create a process-isolation environment, we first have to install a complete guest operating system on top of our host OS.

Docker containers, on the other hand, use the container technology already built into the kernel of Linux-based operating systems. Containers are just an operating-system feature, a powerful technology that avoids the need for hardware-based virtualization.
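You can verify this yourself: a container reports the kernel of the machine running the Docker daemon rather than a kernel of its own (under Docker Toolbox this is the Linux VM's kernel, not Windows):

docker run --rm alpine uname -r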

So now we know how Docker provides process isolation and platform-independent computing. But what about the distribution of software?

The easy shipping and distribution of Docker containers is achieved by using Docker images. A Docker image is a single file that contains all the binaries and files required for creating a container; a container is a running instance of an image, and we can create as many containers as we want from a single image. Docker images are distributed using registries. Docker Hub is the default public registry: all public images are available there, and you can also build your own images and upload them to Docker Hub. A repository on Docker Hub is similar to a repository on GitHub.
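For example, the everyday distribution workflow looks something like this (the repository and account names here are placeholders, not from the article):

docker pull nginx:latest                   # download an image from Docker Hub
docker tag nginx:latest myuser/myimage     # re-tag it under your own repository
docker push myuser/myimage                 # upload it to Docker Hub (requires docker login)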


Working as a software developer is a complex task: you are not only responsible for building complex software, but also for its platform-independent and easy distribution. Such software often comes with a lot of system dependencies, like the host OS, libraries and databases, and in order to install it on a server or a local user's computer, all the required dependencies need to be fulfilled.

So, instead of manually handling these dependencies every single time an installation is needed, we as software developers can use Docker containers to ship the software along with all its dependencies as a containerized application, preventing buggy installations and server crashes.


Now that we understand the fundamentals of Docker, it's time to review some basic Docker commands, and then we will build a fully containerized machine learning application. I will assume that you already have Docker installed on your machine, as installing Docker is not the focus of this article.

On Linux-based machines you can use the terminal; however, if you are running Docker Toolbox on Windows (like I am), you will use the Docker Quickstart Terminal for all the commands.

Docker: Hello World!

In this example we will download the nginx server image from Docker Hub and run a container from that image on our local machine, to get an understanding of the basic Docker commands.

[Screenshot: Docker Quickstart Terminal, pulling the nginx image and starting a container]
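The exact command is only visible in the screenshot; based on the points below, it was likely something like:

docker container run --detach --name nginx -p 80:80 nginx

Here --detach runs the container in the background, --name gives it a friendly name, and -p 80:80 publishes container port 80 on host port 80.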

In the above command :

  • We downloaded the nginx image from Docker Hub.
  • We started a container named nginx based on the nginx image.
  • Our container is listening on port 80 (-p 80:80).
  • We can go to http://localhost:80/ and see our server up in a Docker container (on Docker Toolbox, replace localhost with the VM's IP address, which docker-machine ip prints).
[Screenshot: the default nginx welcome page at http://localhost:80/]
command: docker container ls

We can easily run multiple containers from the same image, or from different images, and we can list the running ones with this command.

[Screenshot: docker container ls output showing one running nginx container]
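The listing has a fixed set of columns; it looks roughly like this (the ID and timestamps below are illustrative, not from the screenshot):

CONTAINER ID   IMAGE   COMMAND                  CREATED         STATUS         PORTS                NAMES
3f4ab6c1d2e9   nginx   "nginx -g 'daemon of…"   2 minutes ago   Up 2 minutes   0.0.0.0:80->80/tcp   nginx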

We can see that a single container is running from the image nginx and listening on port 80.

command: docker image ls
[Screenshot: docker image ls output showing three nginx tags with the same IMAGE ID]
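An illustrative sketch of that listing (the tags and ID here are examples; only the 127 MB size is from the article):

REPOSITORY   TAG        IMAGE ID       CREATED       SIZE
nginx        latest     602e111c06b6   2 weeks ago   127MB
nginx        mainline   602e111c06b6   2 weeks ago   127MB
nginx        1          602e111c06b6   2 weeks ago   127MB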

The above command lists all the images that are already downloaded/cached on your local machine. It is a good point to note that the three different nginx images have the same IMAGE ID. It means these images are only cached once: they are the same image under different tags. Together they occupy a total of 127 MB of storage and are not stored three times. This is one way Docker saves disk space on your system. Only the changes you make on top of an existing image are stacked onto it under the new tag name; the whole image is not stored again. So a Docker image is a stack of multiple snapshots, called layers. Docker is awesome!

Files in a Docker image are stored in a union file system: when you read a file, it is read from the topmost layer where it exists. The read falls through the layers until it reaches a layer where that file does exist.

[Diagram: reading a file from layer 0 through the image's layer stack]
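You can inspect these layers yourself; the image history command prints one row per layer of the nginx image (sizes will vary by version):

command: docker image history nginx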
command: docker container stop <container name/ID>
[Screenshot: stopping the nginx container and listing containers again]

We can see that we no longer have any running container. If we go back to localhost, we can see that our nginx server has stopped:


We can start this container again using the command:

docker container start nginx

We can permanently delete a container using the command:

docker container rm <name/id>

Awesome! Now we know the difference between images and containers, how to download images from Docker Hub, and how to run containers from images. Let's take a moment to go through this diagram, which illustrates the whole process very clearly, before we start building our own Docker images.

[Diagram: the relationship between Docker Hub, images and containers]

Building Our Own Images With Dockerfile:

In this last section of the article, we will see how we can create custom Docker images for our applications, and how we can use them to build containers, using a Dockerfile and Docker Compose.

A Dockerfile is a text file that contains the instructions/recipe for building an image. Docker builds an image by executing the instructions in a Dockerfile one by one. Docker Compose is a tool for defining and running multi-container Docker applications.


We will use a text-similarity web API developed in Flask. With the help of that API we will demonstrate how to create a Docker image and how to run the API inside a Docker container. We will be using the nltk library for our natural language processing work.

In your local machine, create a new project directory with the following structure:

flask_project
|__ webapp
|   |__ Dockerfile
|   |__ app.py
|   |__ requirements.txt
|   |__ templates
|       |__ form.html
|__ docker-compose.yml

1- Building Project Structure Using Docker Quickstart Terminal:

An example of how to create the project structure from the terminal:

[Screenshot: creating the directories and files from the Docker Quickstart Terminal]
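The screenshot's commands are not preserved as text; since the Quickstart Terminal is a bash shell, a sketch of equivalent commands would be:

mkdir -p flask_project/webapp/templates
cd flask_project
touch docker-compose.yml
touch webapp/Dockerfile webapp/app.py webapp/requirements.txt webapp/templates/form.html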

2- Now you may use any text editor of your choice to build the API. In my case, I will be using Atom:

[Screenshot: launching Atom from the terminal]

3- Writing Dockerfile (webapp directory):

[Screenshot: the Dockerfile]
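The original Dockerfile is only visible as an image; based on the description below, it would have looked roughly like this (the exact Python tag and the nltk download step are my assumptions):

# base Python image from Docker Hub (the tag in the article is not visible; 3.7 is an assumption)
FROM python:3.7
# set the default working directory inside the image
WORKDIR /usr/src/app
# copy and install the dependency list first, so this layer can be cached
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# download the nltk stopwords corpus used by app.py (assumed step)
RUN python -m nltk.downloader stopwords
# copy the application code and run it when the container starts
COPY . .
CMD ["python", "app.py"]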

The FROM instruction gives the address of the Python base image on Docker Hub; when we build this image, Docker downloads that Python image for us. The WORKDIR instruction changes the default working directory. Next we copy the requirements.txt file, which lists all the required dependencies, and the pip install command installs all the dependencies/libraries mentioned in requirements.txt. Then we install our NLP model in the container, and finally we run the app.py file in our container.

4- Writing requirements.txt file (webapp directory):

[Screenshot: the requirements.txt file]
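The file contents are only visible in the screenshot; judging from the imports in app.py, it would contain at least these packages (pinned versions, if any, are not recoverable):

flask
scikit-learn
nltk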

5- Writing Docker-Compose file:

[Screenshot: the docker-compose.yml file]
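Again, the original file is only visible as an image; a minimal docker-compose.yml matching the build output and run command later in the article would look something like this (the service name is my assumption):

version: '3'
services:
  web_app:                        # hypothetical service name
    image: flask_docker_web_app   # matches the tag shown after the build
    build: ./webapp               # build from the Dockerfile in webapp/
    ports:
      - "80:80"                   # map host port 80 to container port 80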

Docker Compose is what we finally use to build the image from the Dockerfile. It also provides efficient and easy linking and networking between the multiple services a containerized application may need for its proper functioning; in short, it is a tool for defining and running multi-container Docker applications.

6- Building RESTful API (app.py file):

from flask import Flask, request, render_template
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.corpus import stopwords

set(stopwords.words('english'))  # make sure the stopwords corpus is available

app = Flask(__name__)

@app.route('/')
def my_template():
    return render_template('form.html')

@app.route('/', methods=['POST'])
def my_template_post():
    stop_words = stopwords.words('english')
    text1 = request.form['text1'].lower()
    text2 = request.form['text2'].lower()
    # remove stopwords from both documents
    document_1 = ' '.join([word for word in text1.split() if word not in stop_words])
    document_2 = ' '.join([word for word in text2.split() if word not in stop_words])
    corpus = [document_1, document_2]
    # vectorize the two documents and compute their cosine similarity
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(corpus)
    similarity_matrix = cosine_similarity(tfidf)[0, 1]
    return render_template('form.html', final=similarity_matrix, text1=text1, text2=text2)

if __name__ == '__main__':
    # bind to 0.0.0.0 so the server is reachable from outside the container
    app.run(host="0.0.0.0", port=80, debug=True, threaded=True)
####################################################################
form.html file
####################################################################
<!DOCTYPE html>
<html>
<head>
  <title> DockerWeb </title>
  <style type="text/css">
    #main-header{
      text-align: center;
      background-color: black;
      color: orange;
      padding: 10px;
    }
    #body{
      background-color: gray;
      color: black;
    }
    #input-button{
      background-color: black;
      padding: 10px;
      color: orange;
    }
    #Doc-header{
      color: black;
      padding: 10px;
    }
  </style>
</head>
<body id="body">
  <header id="main-header">
    <h1>Welcome To This Containerized NLP Application.</h1>
  </header>
  <br>
  <h1>Enter Two Sentences.</h1>
  <form method="POST">
    <h2 id='Doc-header'>
      Document 1: <textarea name="text1" rows="5" cols="80"></textarea><br><br>
      Document 2: <textarea name="text2" rows="5" cols="80"></textarea><br><br>
    </h2>
    <input id="input-button" type="submit">
  </form>
  {% if final %}
    <h2>The Cosine Similarity Between two texts is {{final}}!</h2>
  {% else %}
    <h2></h2>
  {% endif %}
</body>
</html>

7- Building Docker Image Using Docker-Compose:

Command: docker-compose build
[Screenshot: docker-compose build executing the Dockerfile steps]

You will see a similar screen after running the build command. You can see that it builds the image step by step, by executing the instructions in the associated Dockerfile.

After the build is complete, you will see a message like this one:

[Screenshot: successful build message]
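The build output ends with a tag confirmation, roughly (the image ID is elided here):

Successfully built <image id>
Successfully tagged flask_docker_web_app:latest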

The image is tagged as flask_docker_web_app:latest.

8- Running our container from the image we created:

Command: docker container run -d -p 80:80 --name lang flask_docker_web_app
[Screenshot: running the container from the newly built image]

We can see our container, named lang, created from the image we built. Our API is now running inside the container, with all the required dependencies installed in it.

9- Listing all the locally cached images in our machine:

docker image ls
[Screenshot: docker image ls showing flask_docker_web_app]

We can see the Docker image we created, named flask_docker_web_app, stored on our machine. We can now distribute this image through Docker Hub.

10- Running our container and testing our web application.

[Screenshot: the web application computing the similarity of two sentences]
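If you want to check the endpoint from the terminal instead of the browser, a quick smoke test looks like this (on Docker Toolbox, substitute the VM's IP from docker-machine ip for localhost):

curl http://localhost:80/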

Our application is working properly.

I hope you enjoyed reading this blog; thank you for investing your time in it.

Follow me on GitHub: github.com/kartik4020
