Containers and Docker for Data Scientists

Tanveer Khan
Oct 13, 2020 · 7 min read
[Figure: Typical Situation]

Introduction

Containerization is a buzzword. Everyone talks about Docker and containers, and everyone wants their projects containerized because of the associated benefits. The challenge is that very few people actually understand what containers are and how they can be used in Artificial Intelligence based projects.

I have attended many sessions about Docker, but I couldn’t understand much and, more importantly, couldn’t see how it would help in my role as a data scientist.

The objective of this blog is to cover what you need to know about containers: what they are, why they are useful, and how to use them as part of AI based projects.

This post doesn’t cover DevOps, as I don’t know that area. It is about building concepts and understanding, and it is not a reference for Docker syntax and commands.

What Is Application Deployment?

Deployment involves all the activities required to make an application run and be usable in an environment (development, QA, production, etc.).

This is one of the fundamental steps of the software development life cycle (SDLC). There are various ways of deploying an application, and they have evolved over time.

To develop intuition about containers, we need to look at the various ways of deployment from a historical point of view.

I will try to explain each approach based on the level of abstraction it provides.

Legacy Way — Individual Server — NO ABSTRACTION

The legacy way of deploying an application is to request an individual server to host it. Dedicated hardware is used to host your application.

[Figure: Legacy Approach for Deployment]

In this approach you request a machine (hardware), set up an OS, install software and dependencies, and then put the application code on it. This is a tedious and time-consuming process. In the long run such servers become difficult to maintain, the hardware is not used optimally, and upgrades to the OS and dependencies can break the application code.

This was not a good way of deploying an application, and it suffers from the major drawbacks mentioned in the figure.

The industry needed a solution for this, which gave birth to a concept called “Virtualization”.

Virtualization: Hardware Abstraction

Virtualization, popularized on commodity servers by vendors such as VMware, was a fantastic way of addressing some of the challenges raised by individual server deployment.

In this approach the hardware (bare metal) is abstracted by the use of ‘hypervisors’.

Hypervisors let you create and run multiple virtual (guest) machines on a single piece of hardware, called the host machine.

The hardware is shared between multiple applications, so multiple applications can be deployed and executed on the same machine.

It looks like this:

[Figure: Virtualization]

This approach has many benefits: improved hardware utilization, as multiple applications share the hardware; deployment time reduced to a few days instead of months; comparatively easy maintenance; and lower cost. But it does have drawbacks. It is not a truly portable approach, as applications are still tied to the OS and the hypervisor, and different hypervisors exist for different platforms. Since each VM contains a full OS, VMs are heavy and slow to start, so they don’t scale quickly.

The figure below summarizes the pros and cons of the virtualization approach.

I think the main challenges are that VMs are not fully portable and that they take time to start because of OS startup, so they can’t scale instantaneously. In modern applications scalability is a huge concern, and the requirement is to scale in near real time.

Almost all of these challenges have been solved by a beautiful concept called “Containerization”.

CONTAINERS — OS ABSTRACTION — VIRTUALIZING OPERATING SYSTEM

Containers create a higher level of abstraction by virtualizing the operating system. They break the coupling between the application code and the host OS.

Because of this they are very light and contain exactly what is needed to run an application.

Containers contain the application code and its dependencies.

Containers don’t carry a full OS of their own: they share the host’s kernel and package only the user-space libraries they need, so things like host OS upgrades have much less impact on them. A container image is a snapshot of code and dependencies, so dependency management is easy to implement. Two different applications can use different versions of the same dependency without any conflict, and we can control dependency upgrades for each application.

Deployment time for a container is a few minutes. Another major advantage is that containers spin up very fast because they don’t boot an operating system, so they can be used for applications that need real-time scaling of resources.

They are also extremely portable: the same code can be moved between environments without any change, so a lot of rework and redundancy is avoided.

A few benefits of containers are listed below:

[Figure: Container Benefits]

So we can see a lot of benefits associated with containers, and I hope you now understand what a container is and how it helps. But how do we create one?

How to Create a Container?

To create a container we can use a popular open-source platform called “Docker”.

The overall flow for creating a container looks like this:

[Figure: Container Creation and Deployment Process]

Example of Containerization

Now let’s take an example of an AI application for text classification. This application has three parts:

a) Orchestration/middleware written in Java, which collects the data.

b) A data pre-processing module in Python, exposed as a REST API and built on libraries like NumPy, spaCy, etc.

c) A Python classification engine that runs a deep learning model, exposed as a REST API. It is built on PyTorch and uses word embeddings, etc.
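To make module (b) concrete, here is a minimal sketch of what a pre-processing service exposed as a REST API could look like. It is an illustration and not the actual module: it uses only the Python standard library instead of NumPy or spaCy, and the preprocess function, the /preprocess endpoint and the JSON field names text and tokens are all assumptions made for this example. Port 9097 is the one used in the sample script later in this post.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


def preprocess(text: str) -> List[str]:
    """Toy pre-processing step: lowercase the text and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PreprocessHandler(BaseHTTPRequestHandler):
    """Handles POST /preprocess (hypothetical endpoint) with a JSON body."""

    def do_POST(self):
        if self.path != "/preprocess":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            text = payload["text"]
            if not isinstance(text, str):
                raise TypeError("text must be a string")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "Expected a JSON body with a string 'text' field")
            return
        body = json.dumps({"tokens": preprocess(text)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9097) -> None:
    """Listen on all interfaces so other containers can reach the service."""
    HTTPServer(("0.0.0.0", port), PreprocessHandler).serve_forever()
```

Calling serve() would start the service. Inside a container, this is the kind of code that the container’s start command would run.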

To containerize this entire application, we need to ensure a few things.

  1. Decide on the number of containers. We can have one container per module to get a better handle on the application. In this case we have three modules, so we can create three containers.
  2. We need some way to store the data. Each container has its own storage, but we need a way for containers to share data. Things like our code, configuration files and data files should be stored in such a way that all three containers can access them.

     This can be done with a “VOLUME CONTAINER”. Volume containers are a special type of container that can persist data. Application containers (code containers) can mount a volume container, and all application containers mounting the same volume container can read from and write to it. That’s how data can be shared between containers.

  3. We need to bind each container to a port, so that the service inside it can be reached on that port.

  4. We need a way for the containers to communicate with each other. We can give each container a meaningful name when creating it, so that it can be reached by name. We also need to ensure that all the containers are visible to each other. For example, from the orchestration container we should be able to reach the pre-processing and classification containers.

     This can be done using a “NETWORK”. We need to create a Docker network; all containers on the same network can communicate with each other. In this case they make REST calls to each other.

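The URI for REST calls will look like the following, where bind_port is the port the service listens on inside the container (use https instead of http only if the service is configured for TLS):

http://container_name:bind_port/api_endpoint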
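As a rough sketch of how one container could call another by name, the snippet below builds such a URI and posts JSON to it. The orchestration module in this example is written in Java, so Python is used here purely for illustration. The helper names service_url and post_json, the default http scheme and the JSON payload shape are assumptions for this example.

```python
import json
from urllib.request import Request, urlopen


def service_url(container_name: str, port: int, endpoint: str, scheme: str = "http") -> str:
    """Build a URI of the form scheme://container_name:port/api_endpoint."""
    return f"{scheme}://{container_name}:{port}/{endpoint.lstrip('/')}"


def post_json(url: str, payload: dict, timeout: float = 5.0) -> dict:
    """POST a JSON payload and decode the JSON response."""
    request = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, service_url("preprocessor", 9097, "/preprocess") gives http://preprocessor:9097/preprocess. The name resolves only from containers attached to the same Docker network as the preprocessor container.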

[Figure: Containerized Architecture]

Overall there are two steps:

  1. Create a container image for each module by writing its Dockerfile. There is one Dockerfile for each container.
  2. Use plumbing code to build these containers and stitch them together into an application. In the end there will be one shell script for the entire application.

[Figure: Container Build Process]

A sample Dockerfile looks like:

# Select a base image
FROM ubuntu:18.04
# Install Python
RUN apt-get update -y && \
    apt-get install -y python3 python3-pip python3-dev
# Copy the Python dependency list (requirements.txt)
COPY ./requirements.txt /app/requirements.txt
# Change the working directory
WORKDIR /app
# Install the Python dependencies
RUN python3 -m pip install -r requirements.txt
# Copy the application code
COPY . /app
# Start the app
ENTRYPOINT [ "python3" ]
CMD [ "main.py" ]

Sample Application File:

# Create the network
docker network create my_network
# Create the volume container from the volume image
docker run -idt --name my_volume --net=my_network my_volume:latest
# Optional: attach to the running container (detach with Ctrl-p Ctrl-q)
docker attach my_volume
# Create the data pre-processing container from the pre-processor image
docker run -idt -p 9097:9097 --volumes-from my_volume --name preprocessor --net=my_network preprocessor:latest
# Optional: attach to the running container (detach with Ctrl-p Ctrl-q)
docker attach preprocessor
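The same plumbing can also be sketched in Python, which some data scientists may find easier to maintain than a shell script. The sketch below only assembles the docker commands for the network, the volume container and the pre-processing container, and prints them unless dry_run is turned off. The helper names (run_cmd, build_plan, execute) are hypothetical, and it uses the documented --network flag where the script above uses the shorter --net spelling. The image names and port are the ones from the script above; the other modules would be added in the same way.

```python
import shlex
import subprocess
from typing import List, Optional

NETWORK = "my_network"
VOLUME_CONTAINER = "my_volume"


def network_create_cmd(network: str) -> List[str]:
    """Command that creates a user-defined Docker network."""
    return ["docker", "network", "create", network]


def run_cmd(name: str, image: str, network: str,
            port: Optional[int] = None,
            volumes_from: Optional[str] = None) -> List[str]:
    """Command that creates and starts a named container on the given network."""
    cmd = ["docker", "run", "-idt", "--name", name, "--network", network]
    if port is not None:
        cmd += ["-p", f"{port}:{port}"]  # publish the same port on the host
    if volumes_from is not None:
        cmd += ["--volumes-from", volumes_from]  # mount the shared volumes
    cmd.append(image)
    return cmd


def build_plan() -> List[List[str]]:
    """Ordered list of commands: network first, then volume, then the module."""
    return [
        network_create_cmd(NETWORK),
        run_cmd(VOLUME_CONTAINER, "my_volume:latest", NETWORK),
        run_cmd("preprocessor", "preprocessor:latest", NETWORK,
                port=9097, volumes_from=VOLUME_CONTAINER),
    ]


def execute(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling execute(build_plan()) prints the commands without touching Docker, which is a safe way to review the plan before running it with dry_run=False.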

I have seen confusion around Docker “run” vs “start”. docker run is a combination of docker create (creating a new container from an image) and docker start (starting it). Use run when you want to create and start a container with one command.

If you just want to create a container, use docker create. It will not start the container, and you can start it later with docker start. Note that docker build is different: it builds an image from a Dockerfile, not a container.
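To keep the three commands apart, here is a small sketch that spells out the container lifecycle as command lists: docker build produces an image, docker create makes a stopped container from an image, docker start starts it, and docker run -d does the last two in one step. The function names are hypothetical and the commands are only assembled, not executed.

```python
from typing import List


def build_image(tag: str, context: str = ".") -> List[str]:
    """Build an image (not a container) from the Dockerfile in the context directory."""
    return ["docker", "build", "-t", tag, context]


def run_in_one_step(name: str, image: str) -> List[str]:
    """Create and start a container in the background with a single command."""
    return ["docker", "run", "-d", "--name", name, image]


def run_in_two_steps(name: str, image: str) -> List[List[str]]:
    """Roughly equivalent to run_in_one_step: create the container, then start it."""
    return [
        ["docker", "create", "--name", name, image],
        ["docker", "start", name],
    ]
```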

Now that the basic concepts are in place, I suggest you go and explore the command syntax and the options available with Docker.

I hope this blog clarifies your understanding of containers and is useful to you.

Happy Learning!!

AI For Real

This blog would be maintained by Tanveer Khan, Sujoy Roychowdhury, both senior data scientists at IBM. We would use our experience to talk about building real-world AI systems.

Written by Tanveer Khan

Sr. Data Scientist with strong hands-on experience in building Real World Artificial Intelligence Based Solutions using NLP, Computer Vision and Edge Devices.
