Docker in Machine Learning
Today we are going to look at a very interesting topic. In the modern world, AI plays a vital role in every domain. Take retail, where ML plays a huge role: giants like Walmart, Amazon, and Flipkart rely on machine learning to make their products easily discoverable by customers.
How do these recommendations reach the customers? Here is an example: say person 'X' buys a shirt from brand 'XYZ'. A month later, new shirts with fresh designs are launched, and the portal recommends them to 'X' based on their purchase history. That recommendation engine is built with machine learning. To make it accessible to everyone, the model needs to be deployed in a centralized place.
Is Cloud Deployment Alone Enough?
From a deployment perspective, we need a cloud service such as AWS, Azure, or GCP. But what does the deployed application look like? Do we deploy straight from a Jupyter notebook? These questions all come up at deployment time. We need to pipeline the product: every task in the ML project, from ingesting the dataset to saving the prediction results, has to be automated. Only then do we have a product model that can be delivered to many customers through any cloud service.
Consider a family moving house to another location. Their first thought will be packing the household items. The move can be done in two ways:
Packing individually: Here each item is packed separately, and we need to make sure every package reaches the right movers. Once the family arrives at the new location, things like this can happen: 'Hey! I forgot where I put my personal things.', 'I left my work stuff in the old house.', 'I mixed up the kitchen items with something else!'. The second scenario shows how to avoid these situations.
Container: Here everything is packed into a single container. Before it leaves, we know exactly which items are missing. At the end of the day, the family moves happily and can easily place each item where it belongs.
The same applies to machine learning deployment: the product needs to be deployed at another customer's location. If we move files to the new cloud location one by one, some files might be missed. Then the customer reports, 'Why does the application work in one place but not in the new one?', and a lot of support tickets get raised. To fix this, we need to containerize the application. This is where 'Docker' comes into the picture.
What is Docker?
Docker is a tool designed to make it easier to create, deploy, and run applications using containers. Containers package an application together with all of its dependencies in one place.
Architecture
- The infrastructure is the physical server that is used to host multiple virtual machines.
- The Host OS is the base operating system, such as Linux or Windows. This layer remains the same.
- Next comes the new generation: the Docker engine. Workloads that earlier needed separate virtual machines, each with its own guest operating system, now run on the Docker engine as Docker containers.
- All of the apps now run as Docker containers.
Docker Installation
The installation steps depend on the operating system.
If the OS is Microsoft Windows 10 Professional or Enterprise 64-bit, or Windows 10 Home 64-bit, download Docker Desktop from the official Docker website.
Once the installation is done, the Docker Desktop dashboard shows sections for Containers, Images, and Dev Environments.
Significance of the Dockerfile
To dockerize an application, we need to create a Dockerfile for the project. Let's take a deeper look at what a Dockerfile is composed of. The most common instructions are:
- FROM
- ENV
- WORKDIR
- ENTRYPOINT
- CMD
- COPY
- ADD
- RUN
- EXPOSE
FROM
FROM <image> [AS <name>]
FROM is used to define the base image to start the build process. Every Dockerfile must start with the FROM instruction. The idea behind this is that you need a starting point to build your image.
FROM ubuntu
This means our project uses ubuntu as its parent image.
ENV
ENV <key> <value>
This instruction sets environment variables that the project needs at run time.
ENV sets environment variables that can be used in the Dockerfile and in any scripts it calls. They also persist into the running container and can be referenced at any time.
ENV HTTP_PORT="9000"
We provided HTTP_PORT as an environment variable.
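Later instructions in the same Dockerfile can then reference the variable. A small sketch reusing the HTTP_PORT variable from above:
EXPOSE $HTTP_PORT
Here Docker substitutes the value 9000 at build time.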
WORKDIR
WORKDIR /path/to/workdir
WORKDIR tells Docker that the commands that follow should run in the context of the given directory inside the image.
WORKDIR /app
This creates the /app directory in the container (if it does not already exist) and makes it the working directory.
RUN
RUN has two forms:
RUN <command> (shell form; the command runs in a shell, which by default is /bin/sh -c on Linux or cmd /S /C on Windows)
RUN ["executable", "param1", "param2"] (exec form)
The RUN instruction executes any commands in a new layer on top of the current image and commits the results. The resulting committed image is used for the next step in the Dockerfile.
The RUN command runs within the container at build time.
RUN /bin/bash -c 'source $HOME/.bashrc; echo $HOME'
ENTRYPOINT
ENTRYPOINT has two forms:
ENTRYPOINT ["executable", "param1", "param2"] (exec form, preferred)
ENTRYPOINT command param1 param2 (shell form)
An ENTRYPOINT allows you to configure a container that will run as an executable.
ENTRYPOINT sets the command and parameters that are executed first when the container runs. Any command-line arguments passed to docker run <image> are appended to the ENTRYPOINT command and override all elements specified using CMD. For example, docker run <image> bash appends the argument bash to the end of the ENTRYPOINT command.
You can override the ENTRYPOINT instruction with the --entrypoint flag of docker run.
ENTRYPOINT [ "sh", "-c", "echo $HOME" ]
Note that the exec form does not invoke a shell, which is why the example above runs sh -c explicitly so that $HOME is expanded.
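As a quick sketch of overriding the entry point at run time (my-image is a hypothetical image name):
docker run --entrypoint /bin/ls my-image -l /tmp
This ignores the image's ENTRYPOINT and runs /bin/ls -l /tmp instead.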
CMD
The CMD instruction has three forms:
CMD ["executable","param1","param2"] (exec form; this is the preferred form)
CMD ["param1","param2"] (as default parameters to ENTRYPOINT)
CMD command param1 param2 (shell form)
The main purpose of CMD is to provide defaults for an executing container. When an ENTRYPOINT is present, the CMD values are passed to it as default arguments.
In a Dockerfile, the CMD defaults can also include an executable, as shown in the sketch below.
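To illustrate how ENTRYPOINT and CMD work together, here is a minimal sketch (the file names are hypothetical):
ENTRYPOINT ["python"]
CMD ["app.py"]
With this Dockerfile, docker run <image> executes python app.py, while docker run <image> other.py overrides the CMD and executes python other.py instead.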
COPY
COPY has two forms:
COPY <src>... <dest>
COPY ["<src>",... "<dest>"] (this form is required for paths containing whitespace)
The COPY instruction copies one or more local files or folders from the source and adds them to the filesystem of the container at the destination path.
Docker builds the image up in layers, starting with the parent image defined using FROM. The WORKDIR instruction defines the working directory for the COPY instructions that follow it.
The <dest> is an absolute path, or a path relative to WORKDIR, into which the source is copied inside the container.
COPY test relativeDir/ # adds "test" to `WORKDIR`/relativeDir/
COPY test /absoluteDir/ # adds "test" to /absoluteDir/
ADD
ADD has two forms:
ADD <src>... <dest>
ADD ["<src>",... "<dest>"]
The ADD instruction adds one or more local files or folders from the source to the filesystem of the container at the destination path.
It is similar to COPY, but it has some additional features:
- If the source is a local tar archive in a recognized compression format, then it is automatically unpacked as a directory into the Docker image.
- If the source is a URL, then it will download and copy the file into the destination within the Docker image. However, Docker discourages using ADD for this purpose.
ADD rootfs.tar.xz /
ADD http://example.com/big.tar.xz /usr/src/things/
EXPOSE
EXPOSE <port> [<port>/<protocol>...]
The EXPOSE instruction informs Docker that the container listens on the specified network ports at runtime. You can specify whether a port listens on TCP or UDP; the default is TCP if the protocol is not specified.
EXPOSE by itself does not allow communication on the defined ports from containers outside the same network or from the host machine. To allow that, you need to publish the ports.
In other words, EXPOSE does not actually publish the port. To publish a port when running the container, use the -p flag on docker run to publish and map one or more ports, or the -P flag to publish all exposed ports and map them to high-order ports.
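A small sketch of the two sides of this (my-image is a hypothetical image name):
EXPOSE 8080/tcp
docker run -p 8080:8080 my-image
The EXPOSE line documents the port inside the Dockerfile; the -p flag on docker run actually publishes it to the host.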
Zomato price prediction
Let's walk through an example: predicting restaurant prices on Zomato, packaged with Docker.
The Dockerfile below was created for this implementation.
# specify the parent base image, which is Python version 3.8
FROM python:3.8
# This prevents Python from writing out pyc files
ENV PYTHONDONTWRITEBYTECODE 1
# This keeps Python from buffering stdin/stdout
ENV PYTHONUNBUFFERED 1
# install system dependencies
RUN apt-get update \
&& apt-get -y install gcc make \
&& rm -rf /var/lib/apt/lists/*
# install dependencies
RUN pip install --no-cache-dir --upgrade pip
# set work directory
WORKDIR /src/app
# copy requirements.txt
COPY ./requirements.txt /src/app/requirements.txt
# install project requirements
RUN pip install --no-cache-dir -r requirements.txt
# copy project
COPY . .
# Generate pickle file
#WORKDIR /src/app/ML_Model
#RUN python model.py
# set work directory
WORKDIR /src/app
# set app port
EXPOSE 8080
ENTRYPOINT [ "python" ]
# Run app.py when the container launches
CMD [ "app.py","run","--host","0.0.0.0"]
Docker Confirmation
To verify the Docker installation, run docker --version in a command prompt.
The next step is to build the image from the Dockerfile, as shown below.
Build using the Dockerfile
Open the command prompt and change into the project workspace.
Run docker build -t <name> . and Docker will build the image as defined in the Dockerfile.
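For this project, matching the image name used in the next step, that is:
docker build -t zomato-price-predict .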
Open Docker Desktop and navigate to Images to verify that the 'zomato-price-predict' image has been created.
Run the Docker Container
Run the command below. The name passed with --name is a reference to the application and will show up in the Docker Desktop Containers section.
docker run --name deployML -p 8080:8080 zomato-price-predict
After executing the command, the app is up and running.
Verify it in the Containers section of Docker Desktop.
Finally, we can see that the application is running.
We can see the application logs too.
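The same checks can also be done from the terminal; a quick sketch using the container name from the run command above:
docker ps
docker logs deployML
curl http://localhost:8080
docker ps should list the deployML container, docker logs prints the application logs, and the curl call hits the app on the published port.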
Hope this article gives you a better idea of how to implement Docker in machine learning.
Hope you enjoyed my article!!