Understanding Docker - Containerizing ML Projects
My name is Kartik Chauhan and I am a member of the Artificial Intelligence and Machine Learning club of KIET Group of Institutions, Ghaziabad. I am pursuing a B.Tech in Computer Science and Engineering. This article explains the fundamental concepts of Docker and how we can build and run a containerized machine learning application from scratch.
What is Docker and why is it so important?
Let us understand what Docker really is without getting into the fancy definitions that are already available all over the internet.
Docker is an open-source project for building, shipping and running software programs. It is a tool that solves common software problems and simplifies the experience of installing, running, publishing and removing software. To do so, it uses an operating-system technology that is part of the Linux operating system: containers.

Understanding Containers:
Simply put, containers are isolated runtime environments that contain all the resources an application program needs to run properly, including libraries, databases and all the required dependencies.
Containers are used for process isolation and platform-independent computing.
Containers have been part of the Linux kernel for a long time, yet they were out of reach for most people because of the difficulty of building one manually. Building and running programs in a container is a complex task and easy to do incorrectly.
Docker changed the container development process: containers are now readily available to every user through a few simple Docker commands.
Docker Containers vs. Virtual Machines:
Virtual machines use hardware virtualization, i.e. hardware abstraction. We first have to install a complete guest operating system on top of our host OS to create a process-isolation environment.
Docker containers, on the other hand, use the container technology already built into the kernel of Linux-based operating systems. Containers are just an operating system feature, a powerful technology that avoids the need for hardware-based virtualization.
So now we know how Docker provides process isolation and platform-independent computing. But what about distribution of software?
Docker Image:
The easy shipping/distribution of Docker containers is achieved by using Docker images. A Docker image is a single file that contains all the binaries and files required for creating a container. We can create as many containers as we want from a single image; a container is a running instance of an image. Docker images are distributed using 'registries' and 'indexes'. All public images are available on 'DockerHub', the default public registry. You can also create your own images and upload them to DockerHub. A repository on DockerHub is similar to a repository on GitHub.

Importance of Docker Containers:
Working as a software developer is complex: you are responsible not only for building complex software but also for distributing it easily and in a platform-independent way. Often such software comes with a lot of system dependencies, such as the host OS, libraries and databases, and in order to install it on a server or a user's local computer, all the required dependencies need to be fulfilled.
So, instead of manually handling these dependencies every single time an installation is needed, we as software developers can use Docker containers to ship the software along with all its dependencies in the form of a containerized application, preventing buggy installations and server crashes.

Now that we understand the fundamentals of Docker, it's time to review some basic Docker commands, and then we will build a fully containerized machine learning application. I will assume that you already have Docker installed on your machine, as installing Docker is not the focus of this article.
On Linux-based machines you can use the terminal; however, if you are running Docker Toolbox on Windows (like I am), you will use the Docker Quickstart Terminal for all the commands.
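Before going further, you can verify your installation with two standard Docker CLI commands; hello-world is Docker's minimal test image:
docker version
docker run hello-world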
Docker - Hello World!!
In this example we will download the nginx server image from DockerHub and run a container from that image on our local machine, to get an understanding of basic Docker commands.
Launching Docker Quickstart Terminal:

1- Downloading an Image and Running a Container from DockerHub:

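The command used here is along these lines (a representative form; the exact invocation in your terminal may differ slightly):
docker container run -d -p 80:80 --name nginx nginx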
In the above command:
- We downloaded the nginx image from DockerHub.
- We started a container named nginx based on the nginx image.
- Our container is listening on port 80 (-p 80:80).
- We can go to http://localhost:80/ (or the Docker Machine IP if you are using Docker Toolbox) and see our server up in a Docker container.

2- Listing all the running containers:
command: docker container ls
We can easily run multiple containers from the same image, or from different images together, and can list them all with this command.

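The output looks roughly like this (illustrative values; the ID and timestamps will differ on your machine):
CONTAINER ID   IMAGE   COMMAND                  CREATED         STATUS         PORTS                NAMES
f2b0f9a1c3d4   nginx   "nginx -g 'daemon of…"   2 minutes ago   Up 2 minutes   0.0.0.0:80->80/tcp   nginx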
We can see that a single container is running from the nginx image and listening on port 80.
3- Listing all the locally available images:
command: docker image ls

The above command lists all the images that are already downloaded/cached on your local machine. It is worth noting that the three different nginx images have the same IMAGE ID. This means these images are only cached once: they are the same image with different tags. They occupy a total of only 127 MB of storage and are not stored on disk three times. This is one way Docker saves storage space on your system. Only the change you make to an existing image is stacked on top of it under the new tag name; Docker does not store the whole image again. So a Docker image is really a stack of multiple snapshots, called layers. Docker is awesome!!
Files in a Docker image are stored in a union file system, i.e. when you read a file, it is read from the topmost layer where it exists: the read falls through the layers until it reaches a layer where that file does exist.
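You can inspect this layer stack yourself with a standard CLI command; each row of the output is one layer of the image:
docker image history nginx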

4- Stopping A Running Container:
command: docker container stop <container name/ID>

We can see that now we don't have any running containers. If we go back to our localhost, we can see that our nginx server has stopped:

We can start this container again using the command:
docker container start nginx
5- Deleting a Container:
We can permanently delete a container using the command:
docker container rm <name/id>

Awesome! Now we know the difference between images and containers, and how to download images from DockerHub and run containers from images. Let's take a moment to go through this diagram, which illustrates the whole process very clearly, before we start building our own Docker images.

Building Our Own Images With Dockerfile:
In this last section of the article, we will see how we can create custom Docker images for our applications, and how we can use them to build containers, using a Dockerfile and Docker-Compose.
Dockerfile:
A Dockerfile is a text file that contains the instructions/recipe for building an image. Docker builds an image by executing the instructions in a Dockerfile one by one. Docker-Compose, which we will use shortly, is a tool for defining and running multi-container Docker applications.
Docker Image Building Pipeline:

Building a Similarity-Checking API in Flask:
We will use a similarity web API developed in Flask. With the help of that API we will demonstrate how to create a Docker image and how to run the API in a Docker container. We will be using the nltk library for our natural language processing work.
On your local machine, create a new project directory structured as follows:
flask_project
|__ webapp
|   |__ Dockerfile
|   |__ app.py
|   |__ requirements.txt
|   |__ templates
|       |__ form.html
|__ docker-compose.yml
1- Building the Project Structure Using the Docker Quickstart Terminal:
An example of how to create the project structure from the terminal:

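A sketch of the shell commands that produce the structure above, using standard mkdir/touch:
mkdir flask_project
cd flask_project
mkdir webapp webapp/templates
touch docker-compose.yml webapp/Dockerfile webapp/app.py webapp/requirements.txt webapp/templates/form.html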
2- Now you may use any text editor of your choice to build the API. In my case, I will be using Atom:

3- Writing Dockerfile (webapp directory):

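A sketch of what this Dockerfile contains, based on the steps described below; the Python base-image tag and the NLTK stopwords download are assumptions, so your file may differ in those details:
FROM python:3.7
WORKDIR /usr/src/app
# install the dependencies listed in requirements.txt first
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# download the NLTK stopwords corpus the app uses (assumed model step)
RUN python -m nltk.downloader stopwords
# copy the rest of the application code and run it
COPY . .
CMD ["python", "app.py"]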
The FROM command gives the address of the Python image on DockerHub; when Docker builds our image, it downloads this Python base image for us. The WORKDIR command changes the default working directory. Next we copy the requirements.txt file that lists all the required dependencies, and the pip install command installs all the dependencies/libraries mentioned in it. Then we install our NLP model in the container, and finally we run the app.py file in our container.
4- Writing requirements.txt file (webapp directory):

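Based on the imports in app.py, requirements.txt needs at least the following packages (version pins are up to you):
flask
nltk
scikit-learn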
5- Writing the Docker-Compose File:

Docker-Compose is what finally builds the image from the Dockerfile. It also provides efficient and easy linking and networking between the multiple services a containerized application may need for proper functioning.
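A minimal docker-compose.yml consistent with the project layout above; the service name and compose version are assumptions, and the resulting image name depends on your directory and service names:
version: '3'
services:
  web_app:
    build: ./webapp
    ports:
      - "80:80"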
6- Building RESTful API (app.py file):
from flask import Flask, request, render_template
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.corpus import stopwords

app = Flask(__name__)

@app.route('/')
def my_template():
    return render_template('form.html')

@app.route('/', methods=['POST'])
def my_template_post():
    # remove English stop words from both input texts
    stop_words = stopwords.words('english')
    text1 = request.form['text1'].lower()
    text2 = request.form['text2'].lower()
    document_1 = ' '.join([word for word in text1.split() if word not in stop_words])
    document_2 = ' '.join([word for word in text2.split() if word not in stop_words])
    corpus = [document_1, document_2]
    # vectorize the two documents with TF-IDF and take their cosine similarity
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(corpus)
    similarity_matrix = cosine_similarity(tfidf)[0, 1]
    return render_template('form.html', final=similarity_matrix, text1=text1, text2=text2)

if __name__ == '__main__':
    # bind to 0.0.0.0 so the app is reachable through the container's mapped port
    app.run(host="0.0.0.0", port=80, debug=True, threaded=True)
####################################################################
form.html file (templates directory)
####################################################################
<!DOCTYPE html>
<html>
<head>
<title> DockerWeb </title>
<style type="text/css">
#main-header{
text-align: center;
background-color: black;
color: orange;
padding: 10px;
}
#body{
background-color: gray;
color: black;
}
#input-button{
background-color: black;
padding: 10px;
color: orange;
}
#Doc-header{
color: black;
padding: 10px;
}
</style>
</head>
<body id="body">
<header id="main-header">
<h1>Welcome To This Containerized NLP Application.</h1>
</header>
<br>
<h1>Enter Two Sentences.</h1>
<form method="POST">
<h2 id='Doc-header'>
Document 1: <textarea name="text1" rows="5" cols="80"></textarea><br><br>
Document 2: <textarea name="text2" rows="5" cols="80"></textarea><br><br>
</h2>
<input id="input-button" type="submit">
</form>
{% if final %}
<h2>The Cosine Similarity Between two texts is {{final}}!</h2>
{% else %}
<h2></h2>
{% endif %}
</body>
</html>
7- Building the Docker Image Using Docker-Compose:
Command: docker-compose build

You will see a similar screen after running the build command. You can see that it builds the image step by step, executing the instructions in the associated Dockerfile.
After the build is complete, you will see a message like this one:

The image is tagged as flask_docker_web_app:latest.
8- Running our container from the image we created:
Command: docker container run -d -p 80:80 --name lang flask_docker_web_app

We can see our container, named lang, created from the image we built. Now our API is running inside the container with all the required dependencies installed.
9- Listing all the locally cached images on our machine:
docker image ls

We can see that the Docker image we created, flask_docker_web_app, is stored on our machine. We can now distribute this image through DockerHub.
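Pushing the image to DockerHub uses the standard login/tag/push sequence; <your-dockerhub-username> is a placeholder for your own account name:
docker login
docker tag flask_docker_web_app <your-dockerhub-username>/flask_docker_web_app
docker push <your-dockerhub-username>/flask_docker_web_app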
10- Running our container and testing our web application.

Our application is working properly.
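If you prefer the command line, you can also exercise the API with curl (use the Docker Machine IP instead of localhost on Docker Toolbox); the text1/text2 field names come from form.html:
curl -X POST -d "text1=docker is great" -d "text2=docker is awesome" http://localhost:80/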
I hope you enjoyed reading this blog. Thank you for investing your time in it.
Follow me on GitHub: github.com/kartik4020