Containerization is a buzzword. Everyone talks about Docker and containerization, and everyone wants their projects to be containerized because of the associated benefits. But the challenge is that very few people actually understand what containers are and how they can be used in Artificial Intelligence based projects.
I have attended many sessions about Docker, but I couldn't understand much, and more importantly, I couldn't see how it was going to help me in my role as a Data Scientist.
The objective of this blog is to cover what containers are, why they are useful and how to use them as part of AI based projects.
This post doesn't cover DevOps details, as that is not my area. It is about building a concept and understanding, and doesn't deal with Docker syntax/commands.
What Is Application Deployment?
Deployment involves all the activities required to make an application run, or be usable, in an environment (Development, QA, Production, etc.).
This is one of the fundamental steps in the SDLC. There are various ways of deploying an application, and they have evolved over time.
To develop an intuition and understanding about containers, we need to look at the various ways of deployment from a historical point of view.
I will explain each approach based on the level of abstraction it provides.
Legacy Way — Individual Server — NO ABSTRACTION
The legacy way of deploying an application is to request an individual server to host it. Dedicated hardware is used to host each application.
In this approach you first request a machine (hardware), then set up an OS, then install software and dependencies, and finally put the application code on it. This is a tedious and time-consuming process. In the long run such servers become difficult to maintain, the hardware is not used optimally, and upgrades to the OS or dependencies cause issues in the application code.
This was not a good way of deploying an application, and it suffers from the major drawbacks mentioned in the figure.
The industry needed a solution for this, and that gave birth to a concept called "Virtualization".
Virtualization — Hardware Abstraction
Virtualization, popularized by VMware, was a fantastic way of addressing some of the challenges raised by individual-server deployment.
In this approach the hardware (bare metal) is abstracted through the use of "hypervisors".
Hypervisors let you create and run multiple virtual (guest) machines on a single piece of hardware, called the host machine.
The hardware is shared between multiple applications, so multiple applications can be deployed and executed on the same machine.
This looks like this.
This approach has many benefits: improved hardware utilization, since multiple applications share the hardware; deployment time reduced to a few days instead of months; comparatively easy maintenance; and cost effectiveness. But it does suffer from drawbacks: it is not a truly portable approach, as applications are still tied to the OS and the hypervisor; different hypervisors exist for different platforms; and since each VM contains a full OS, VMs are heavy and their start time is high, so they don't scale fast.
We can look at the figure below to understand the pros and cons of the virtualization approach.
I think the main challenges are that VMs are not fully portable, and that they take time to start (the OS has to boot), so they can't scale instantaneously. In modern applications scalability is a huge concern, and the requirement is to scale in near real time.
Almost all of these challenges have been solved by a beautiful concept called "Containerization".
CONTAINERS — OS ABSTRACTION — VIRTUALIZING OPERATING SYSTEM
Containers create a higher level of abstraction by virtualizing the operating system. They break the coupling between the application code and the OS.
Because of this they are very light and contain exactly what is needed to run an application.
Containers Contains Application Code and Dependencies.
Since containers don't contain an OS, things like OS upgrades don't impact them. Containers are a snapshot of code and dependencies, so dependency management can be implemented easily: two different applications can use different versions of the same dependency without creating any conflict, and we can control dependency upgrades for each application separately.
Deployment time is a few minutes for a container. Another major advantage is that containers spin up very fast, as they don't contain an operating system, so they can be used for applications that need real-time scaling of resources.
And they are extremely portable, so the same code can be moved to another environment without any change, and a lot of rework and redundancy is avoided.
A few benefits of containers are listed below:
So we can see a lot of benefits associated with containers, and I hope it is now clear what a container is and how containers help. But how do we create one?
How To Create a Container?
To create a container we can use a popular open-source framework called "Docker".
The overall flow for creating a container looks like this:
Example of Containerization
Now let's take an example of an AI application for text classification. This application has three parts:
a) An orchestration/middleware layer, written in Java, which collects the data.
b) A data pre-processing module in Python, exposed as a REST API and built on libraries like numpy, spaCy, etc.
c) A Python classification engine which runs a deep learning model, exposed as a REST API and built on PyTorch, using word embeddings, etc.
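To see how the three modules fit together, here is a minimal sketch of the pipeline in plain Python. The function names and logic are hypothetical stand-ins: in the real application, preprocess() and classify() would be REST calls to the two Python containers, and orchestrate() would live in the Java middleware.

```python
def preprocess(text):
    # stand-in for the numpy/spaCy pre-processing service
    return text.lower().split()

def classify(tokens):
    # stand-in for the PyTorch deep-learning classifier
    return "sports" if "football" in tokens else "other"

def orchestrate(raw_text):
    # the middleware collects the data and chains the two services
    return classify(preprocess(raw_text))

print(orchestrate("Football season starts today"))  # -> sports
```

Each of these three pieces becomes its own container below.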
To containerize this entire application, we need to take care of a few things:
1. Decide on the number of containers. We can have one container per module to get a better handle on the application. In this case we have three modules, so we create 3 containers.
2. We need some way to store the data. Each container has its own storage, but we need a way for containers to share data. Things like our code, configuration files and data files should be stored in such a way that all three containers can access them.
This can be done with a "VOLUME CONTAINER". Volume containers are a special type of container that persists data. Application containers (code containers) can mount a volume container, and all application containers mounting the same volume container can read from and write to it. That's how data is shared between containers.
3. We need to bind each container to a port, so that the service inside it can be reached on that port.
4. We need some way for the containers to communicate with each other without any problem. We can specify a meaningful name while creating a container so that it can be accessed by that name. We also need to ensure all the containers are visible to each other: for example, from the orchestration container we should be able to access the pre-processing and classification containers.
This can be done using a "NETWORK". We need to create a Docker network; all the containers on the same network can communicate with each other. In this case they make REST calls to each other.
The URI for the REST calls will follow the pattern http://<container_name>:<port>/<endpoint>, since on a user-defined Docker network the container name resolves as the hostname.
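To make this concrete, here is a tiny, hypothetical helper the orchestration code could use to build such URIs (the container name, port and endpoint below are illustrative; the port 9097 matches the pre-processor binding used later):

```python
def service_uri(container_name: str, port: int, endpoint: str) -> str:
    """Build the URI for a REST call to another container on the same
    Docker network; the container name doubles as the hostname."""
    return f"http://{container_name}:{port}/{endpoint}"

# e.g. the orchestrator calling the pre-processing container
print(service_uri("preprocessor", 9097, "preprocess"))
# -> http://preprocessor:9097/preprocess
```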
Overall there are two steps:
- Create a container image for each module by writing a Dockerfile for it — one Dockerfile per container.
- Use plumbing code to build these containers and stitch them together as an application. In the end this is one shell script for the entire application.
A sample Dockerfile looks like:
## select a base image
FROM ubuntu:18.04

## install python
RUN apt-get update -y && \
    apt-get install -y python3 python3-pip python3-dev

## copy the python dependency list; dependencies are specified in requirements.txt
COPY ./requirements.txt /app/requirements.txt

## change the working directory
WORKDIR /app

## install python dependencies
RUN python3 -m pip install -r requirements.txt

## copy the application code
COPY . /app

## start the app
ENTRYPOINT [ "python3" ]
CMD [ "main.py" ]
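Assuming this Dockerfile sits next to the module's code, the image can be built and tagged with something like the following (the tag name is illustrative, chosen to match the image used in the application script below):

```shell
# build the image from the Dockerfile in the current directory
docker build -t preprocessor:latest .
```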
Sample Application File:
# create network
docker network create my_network

# create volume container using the volume image
docker run -idt --name my_volume --net=my_network my_volume:latest
docker attach CONTAINER_ID

## create data pre-processing container using the pre-processor image
docker run -idt -p 9097:9097 --volumes-from my_volume --name preprocessor --net=my_network preprocessor:latest
docker attach CONTAINER_ID
I have seen confusion around Docker "run" vs "start". Docker "run" is a combination of "create" (creating a new container from an image) and then "start" (starting it). Use "run" when you want to create and start a container with one command.
If you just want to create a container without starting it, use docker "create"; you can then start it later with docker "start". (Note that docker "build" is different: it builds an image from a Dockerfile, not a container.)
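As a sketch, the two paths look like this (container and image names are illustrative):

```shell
# two-step path: create the container, then start it
docker create --name preprocessor preprocessor:latest
docker start preprocessor

# one-step path: create and start in a single command
docker run -d --name preprocessor preprocessor:latest
```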
After building these basic concepts about containers, I suggest you go and explore the API and command syntax for the options available with Docker.
I hope this blog clears up your understanding of containers and is useful to you.
Happy Learning !!