From Code to Containers: Revolutionizing Data Science with Docker
Imagine if every time you wanted to share your amazing data science project, you had to deal with software compatibility issues, missing dependencies, and the dread of “it works on my machine.” Enter Docker, your project’s new best friend. Docker simplifies data science by packaging your code, libraries, and environment into a neat, portable container.
It’s like a magic box that holds everything your project needs, making it work seamlessly on any computer. No more setup headaches, no more version clashes — just smooth sailing as you explore the exciting seas of data science.
What is Docker?
Docker is a platform that packages an application and its dependencies together in the form of containers.
What problems does Docker solve?
Imagine this: you’re a student with your machine learning project ready to shine. You’ve invested hours crafting your masterpiece, and today, it’s finally working flawlessly on your personal computer. But here’s the twist — the big day arrives, and you try running your project on the college computer. Bam! It crashes due to mismatched dependencies.
But wait, enter Docker! With Docker, you could have bundled your entire project, including all the libraries and dependencies, into a self-contained container. It’s like packaging your play’s set, costumes, and actors into a magic box that you can open anywhere — whether it’s your computer, a college lab machine, or even a friend’s laptop. No more unexpected crashes due to differences in environments. Docker casts a spell of consistency, making sure your project runs the same, no matter where it’s performed. So, next time, when the curtain rises on your machine learning project, Docker ensures that the show goes on without a hitch.
Docker vs Virtual Machine
1. Architecture:
- Docker Containers: Think of containers as individual apartments in a shared building. Each apartment (container) has its own space (like a separate bedroom), but they all share common areas like the hallway and the kitchen.
- Virtual Machines: Imagine having multiple full-fledged houses, each with its own kitchen, bedroom, and living room. These houses (virtual machines) are completely separate from each other.
2. Efficiency:
- Docker Containers: Containers are like small apartments, so they’re efficient in using space. They use fewer resources because they share the same common areas.
- Virtual Machines: Houses are larger and need more resources since they have complete rooms, including kitchens and bathrooms.
3. Isolation:
- Docker Containers: While apartments share common areas, each apartment still has its own private space. Similarly, containers share the host system’s resources but maintain their own isolated environments.
- Virtual Machines: Houses are completely separate, just like virtual machines that run their own operating systems independently.
4. Startup Time:
- Docker Containers: Apartments are quicker to set up because they’re smaller and need fewer things. Similarly, containers start quickly because they only include the application and its dependencies.
- Virtual Machines: It takes more time to set up a whole house, just like virtual machines take longer to start because they need to boot an entire operating system.
5. Portability:
- Docker Containers: Imagine if you could pack up your apartment and move it anywhere. Containers are like that — they package everything you need, making them easy to move around.
- Virtual Machines: Houses are less portable since they’re bigger and need more specific arrangements. Similarly, virtual machines are less portable due to their larger size.
6. Consistency:
- Docker Containers: If you shared the same design for all apartments in a building, they would be consistent. Containers provide consistency by sharing the same basic structure but allowing customization inside.
- Virtual Machines: Houses can have completely different layouts, just like virtual machines can have different operating systems and configurations.
7. Use Cases:
- Docker Containers: Imagine an apartment complex tailored for people who need shared spaces and quick access. Containers are great for microservices, where different parts of an application need to work together efficiently.
- Virtual Machines: Houses are like individual residences for people who need complete privacy and separate living spaces. Virtual machines are good for scenarios where strong isolation or compatibility with different operating systems is needed.
Concepts of Dockerfile, Image, and Container:
Dockerfile :
It is a text document that contains all the commands a user could call on the command line to assemble an image.
Docker Image :
A template used to create Docker containers.
Docker Container :
A running instance of a Docker image. A container holds the entire package needed to run the application.
To install Docker on your system, follow the official installation guide in the Docker documentation.
Let's start with some commonly used Docker commands:
1. docker images
When you run this command in your terminal or command prompt, Docker will display a list of all the images currently stored on your system. The list will include information such as the image name, tag (version), image ID, creation date, and size.
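For illustration, the output looks something like this (the image names, IDs, dates, and sizes below are made-up examples):
REPOSITORY   TAG      IMAGE ID       CREATED       SIZE
python       3.9      a1b2c3d4e5f6   2 weeks ago   915MB
ubuntu       20.04    f6e5d4c3b2a1   3 weeks ago   72MB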
2. docker -v
It is used to check the version of Docker installed on your system.
3. docker pull <image name>
It is used to download a Docker image from a container registry, such as Docker Hub or another registry where the image is hosted.
For example, to download an Ubuntu image from Docker Hub:
docker pull ubuntu
If you want to download a specific version, then use:
docker pull ubuntu:20.04
4. docker ps -a
The command docker ps -a is used to list all containers on your system, including both running and stopped containers.
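A sample of what this might print (the IDs and names here are invented for illustration; Docker auto-generates names like the last one when you don't pass --name):
CONTAINER ID   IMAGE    COMMAND       CREATED       STATUS                   NAMES
9f8e7d6c5b4a   python   "python3"     2 hours ago   Up 2 hours               pythonContainer
1a2b3c4d5e6f   ubuntu   "/bin/bash"   3 days ago    Exited (0) 2 days ago    quirky_bell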
5. docker run --name <name that you want to give the container> -it -d <image name or ID that you want to run>
Let’s take an example to understand this command :
docker run --name pythonContainer -it -d python
This command is used to create and run a Docker container based on the Python image. Here's the breakdown of the command:
- docker run: This is the command to create and run a Docker container.
- --name pythonContainer: This flag sets a name for the container. In this case, the name is set to "pythonContainer."
- -it: These flags are used together to allocate a pseudo-TTY (terminal) and make the container interactive.
- -d: This flag runs the container in detached mode, meaning it runs in the background.
- python: This specifies the name of the Docker image to use for creating the container. In this case, it's the Python image.
But here a question arises: what is interactive mode?
Making a Docker container interactive means enabling a direct interaction between the user and the container’s command-line interface. This is often done by allocating a pseudo-TTY (terminal) and connecting it to the container’s standard input and output. An interactive container behaves similarly to a command prompt or terminal window, allowing you to input commands and receive immediate feedback from the container’s processes.
When you run a Docker container interactively using the -it flags, you're essentially connecting your terminal's stdin, stdout, and stderr to the container's corresponding streams. This allows you to input commands via your keyboard (stdin), receive real-time output in your terminal (stdout), and see error messages (stderr) directly from the container's processes. This interactive communication makes it easier to work with and monitor the container's behavior.
Remember that not all containers need to be run in interactive mode, especially if they're meant to run as background services or as part of an automated process. However, for tasks that require direct input or real-time interaction, using the -it flags when running a container can greatly enhance your ability to work with it.
6. docker exec -it <container ID or container name> <command>
It is used to execute a command inside a running Docker container.
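For example, to open an interactive Python shell inside the pythonContainer created earlier (assuming it is still running), you could use:
docker exec -it pythonContainer python
Typing exit() at the Python prompt leaves that shell but keeps the container running.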
7. docker inspect <container ID or container name>
It is used to retrieve detailed information about a Docker container, image, volume, or network.
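For instance, docker inspect pythonContainer dumps a large JSON document. If you only want one field, the --format flag accepts a Go template, for example:
docker inspect --format '{{.State.Status}}' pythonContainer
This prints just the container's status, such as running or exited.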
8. docker stop <container name or container ID>
This command is used to stop a running Docker container.
9. docker rm <container id or container name>
It is used to remove an existing container. The container must be stopped first, or you can force removal with the -f flag.
10. docker rmi <image name>
It is used to remove a Docker image from your system.
11. docker restart <container ID>
It is used to restart a Docker container. You don't need to stop the container manually first: docker restart <container name or container ID> stops the container if it is running, releasing the resources it was using, and then starts it again.
If you want to download a specific image and are not sure how to pull it, search for it on Docker Hub; each image's page shows the pull command and the available tags.
These are the basic commands you need to know to perform data science tasks with Docker. Next, I will give you a brief idea of how to create your own Docker image and run it as a container using a Dockerfile.
A detailed explanation of how to create a Dockerfile, with an example.
Basic Dockerfile:
# Use an official Python runtime as the base image
FROM python:3.9
# Set the working directory inside the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt
# Make port 80 available to the world outside this container
EXPOSE 80
# Define environment variable
ENV NAME World
# Run app.py when the container launches
CMD ["python", "app.py"]
Let's understand each line of the above Dockerfile in detail:
Base Image: FROM python:3.9
A base image in Docker is the starting point for creating a new Docker image. It serves as the foundation or template upon which you build your customized image. The base image contains an operating system with certain libraries, tools, and configurations, which you can then extend by adding your application code, dependencies, and configurations.
When you create a Dockerfile, you usually start by specifying a base image using the FROM instruction. This tells Docker which existing image you want to use as the foundation for your new image.
When you specify FROM python:3.9 in a Dockerfile, you are indeed using a base image that includes not only the Python interpreter but also a minimal Linux-based operating system.
The key thing to understand is that Docker images are layered. The base image you choose already includes a lightweight Linux distribution as its base operating system layer. On top of that, it adds the specific tools, libraries, and configurations needed for the software you want to run, such as Python in this case.
So, when you use FROM python:3.9, you're not just getting Python installed; you're getting a Python environment built upon a Linux distribution.
WORKDIR /app:
The line WORKDIR /app in a Dockerfile is used to set the working directory for subsequent commands that are executed during the image build process and when the container runs.
Here’s what it does:
- WORKDIR: This is a Dockerfile instruction that sets the working directory for any following commands in the Dockerfile.
- /app: This is the absolute path inside the container where the working directory will be set. In this case, it is the app directory directly under the filesystem root /; if the directory doesn't exist, Docker creates it.
COPY . /app:
The line COPY . /app in a Dockerfile is used to copy files and directories from your local machine (the host) into the Docker image being built.
Here’s what it does:
- COPY: This is a Dockerfile instruction used to copy files and directories from the host machine to the image being built.
- . : The dot represents the current directory on your local machine. It specifies that you want to copy all files and directories from your current working directory into the image.
- /app: This is the destination path inside the image where the files will be copied. In this case, it's the /app directory within the image.
RUN pip install -r requirements.txt:
The line RUN pip install -r requirements.txt in a Dockerfile is used to install the Python dependencies listed in a requirements.txt file into the Docker image being built.
Here’s what it does:
- RUN: This is a Dockerfile instruction that is used to execute commands during the image build process.
- pip install -r requirements.txt: This command runs within the image and uses the Python package manager pip to install the Python packages specified in the requirements.txt file.
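For reference, requirements.txt is just a plain text file listing one package per line, optionally with pinned versions. A minimal hypothetical example for a small Flask-based ML app might look like this (the exact packages and versions depend entirely on your project):
flask==2.3.2
pandas==2.0.1
scikit-learn==1.3.0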
EXPOSE 80:
The line EXPOSE 80 in a Dockerfile is used to indicate that a Docker container created from the image should listen on a specific network port, in this case, port 80.
Here’s what it does:
- EXPOSE: This is a Dockerfile instruction that documents the ports a container will listen on. It doesn't actually publish the port or make it accessible from the host system; it's more of a metadata declaration.
- 80: This is the port number you are declaring for exposure.
For instance, if you have a web server running inside the container and it's set up to listen on port 80, adding EXPOSE 80 in the Dockerfile indicates that this container is expected to handle incoming network traffic on port 80. It helps anyone reading the Dockerfile understand which ports should be accessed when running the container.
To actually make the port accessible from the host system or other containers, you need to use the -p flag with the docker run command. For example:
docker run -p 8080:80 image-name
- In this command, you’re mapping port 8080 on your local machine (the host) to port 80 inside the container.
- Think of port mapping as a tunnel between your host and the container.
- When you type http://localhost:8080 into your web browser, your browser sends an HTTP request to port 8080 on your local machine (the host).
- Because you mapped port 8080 on your host to port 80 inside the container, this request gets forwarded through the tunnel to the container’s port 80.
- Inside the container, the web server receives the incoming request on port 80, just as if the request had been sent directly to it.
- The web server processes the request, retrieves the requested web page, and sends an HTTP response back.
- The response from the web server inside the container travels back through the same tunnel to port 8080 on your host.
- Your web browser receives the response and displays the web page.
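You can also test the mapping from a terminal on the host. Assuming a web server really is listening on port 80 inside the container, something like this should return the page's HTML:
curl http://localhost:8080/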
ENV NAME World:
The line ENV NAME World in a Dockerfile sets an environment variable within the Docker image you are building.
Here’s what it does:
- ENV: This is a Dockerfile instruction used to set environment variables inside the image.
- NAME: This is the name of the environment variable you're defining. It's the variable's identifier.
- World: This is the value you're assigning to the environment variable. In this case, the value assigned is "World".
In this example, when you run a container from the image, the environment variable NAME will be available to any processes within the container. The value World is what will be assigned to the NAME environment variable.
You can access this environment variable within the container using commands, scripts, or your application code, depending on your use case.
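For example, from the host you could read the variable in a running container like this (the container name is a placeholder, and this assumes the printenv utility is available in the image, as it is in Debian-based Python images):
docker exec <container ID or container name> printenv NAME
Inside the container, a shell would see it as $NAME, and Python code could read it with os.environ["NAME"].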
CMD ["python", "app.py"]:
The line CMD ["python", "app.py"] in a Dockerfile specifies the default command to run when a container is started from the Docker image.
Here’s what it does:
- CMD: This is a Dockerfile instruction used to specify the default command and/or arguments to be executed when the container starts.
- ["python", "app.py"]: This is an array that represents the command and its arguments to be executed.
In this case, the command python app.py is specified. This means that when you run a container from the image, the container will start by running the app.py Python script using the python interpreter.
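It's worth knowing that CMD only sets a default: whatever you pass after the image name at run time replaces it. For example, to start an interactive Python shell instead of app.py (image-name is a placeholder):
docker run -it image-name python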
Once you have created a Dockerfile, you can build a Docker image using the docker build command:
docker build -t image-name .
Replace image-name with the desired name for your Docker image. The -t flag is used to tag the image with a name, and the trailing . tells Docker to use the current directory as the build context (where it looks for the Dockerfile and the files to copy).
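You can also append a version tag after a colon; the name and version here are arbitrary examples:
docker build -t image-name:1.0 .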
Practical Implementation of Docker:
I gathered data by web scraping a mobile selling website, conducted thorough preprocessing and cleaning, and applied diverse machine learning algorithms. After identifying the optimal model, I built a Flask-based website that allows users to input their preferences for mobile devices, enabling the model to predict mobile prices accurately. To ensure seamless deployment on various cloud servers such as AWS, Azure, or Google Cloud, I containerized the machine learning model using Docker, eliminating any potential issues during the deployment process. The Dockerfile I created is:
FROM python:3.8.17-slim-buster
WORKDIR /app
COPY . /app
RUN apt update -y && apt install awscli -y
RUN pip install -r requirements.txt
CMD["python", "app.py"]
I’ve integrated both the website code and the machine learning model into the app.py file. This allows the website to seamlessly interact with the ML model and provide predictions based on user input.
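As a rough sketch only (not the actual project code), an app.py that loads a trained model and serves predictions through Flask might look like the following; the model file name, feature format, and route are all hypothetical:
# app.py — minimal hypothetical sketch of a Flask prediction service
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)

# Load the trained model once at startup ("model.pkl" is a hypothetical file name)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [4, 64, 5000]}
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    return jsonify({"predicted_price": float(prediction)})

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the server is reachable from outside the container
    app.run(host="0.0.0.0", port=5000)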
Then, I built the Docker image of my website using:
docker build -t mobile_price_predictor .
I named my Docker image mobile_price_predictor. The build takes some time, and once it finishes, I run docker images to verify that the image was created.
Now, to run this Docker image:
docker run -p 5000:5000 mobile_price_predictor
And finally, my website is up and running.
Now I can share my Docker image with anyone without worrying about whether my code will run on their system due to library mismatches. Say goodbye to the stress of wondering if your code will behave on different computers. Thanks to Docker!
Multi-stage Docker builds: the efficient way to generate Docker images
A Docker multi-stage build is a feature that allows you to create more efficient and smaller Docker images by using multiple “stages” in your Dockerfile. Each stage represents a separate phase of the image building process, and you can copy specific files or artifacts from one stage to another. This approach is especially useful when you need to build software applications or services within a Docker container.
In a multi-stage Docker build, the builder stage is the initial part of the build process where you have more flexibility to include additional tools, dependencies, and build artifacts. This stage is where you can perform actions like compiling code, installing build-specific libraries, and generating intermediary files.
The final stage is the latter part of the build process, and it’s intended to produce a lightweight and efficient runtime environment for your application. This stage includes only the components that are necessary for your application to run properly, without any extraneous tools or build artifacts that are needed during the development and compilation phases.
The key benefits of using Docker multi-stage builds are:
- Reduced Image Size: Multi-stage builds help reduce the size of the final Docker image by allowing you to include only the necessary runtime artifacts, excluding unnecessary build tools and intermediate files.
- Optimized Dockerfile: You can keep your Dockerfile clean and focused, with separate stages for building and running the application. This makes the Dockerfile easier to understand and maintain.
- Faster Builds: Intermediate images are cached, so if the source files for a stage haven’t changed, Docker can use the cached image, speeding up the build process.
Let’s take a look at what a multi-stage Docker build looks like.
# Stage 1: Build the Application
FROM python:3.9 AS builder
# Set the working directory inside the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install any needed packages specified in requirements.txt
RUN pip install --user -r requirements.txt
# Stage 2: Create the Final Image
FROM python:3.9
# Set the working directory inside the container
WORKDIR /app
# Copy the application and installed packages from the builder stage
COPY --from=builder /root/.local /root/.local
COPY --from=builder /app /app
# Make port 80 available to the world outside this container
EXPOSE 80
# Define environment variable
ENV NAME World
# Add the local bin directory to the PATH
ENV PATH=/root/.local/bin:$PATH
# Run app.py when the container launches
CMD ["python", "app.py"]
In a multi-stage Dockerfile, each “stage” is a separate phase of the image-building process. The result of one stage can be used as a base for the next stage, allowing you to create a final image that includes only the necessary components and files.
Stage 1 (Builder Stage):
- This stage builds the application and installs its dependencies.
- It uses the python:3.9 base image.
- It sets the working directory to /app.
- It copies your application code and requirements.txt file into the image.
- It installs the required packages using pip install --user -r requirements.txt.
Stage 2 (Final Image):
- This stage creates the final image that will be used to run the application.
- It also uses the python:3.9 base image.
- It sets the working directory to /app, just like in Stage 1.
- Now, instead of installing packages using pip install, this stage copies the installed packages from the builder stage to the current image using the COPY --from=builder instruction. This is important because the builder stage is a separate image, and its contents need to be transferred to the final image.
- The COPY --from=builder /root/.local /root/.local line copies the user's .local directory from the builder image to the final image. This directory contains the Python packages installed using --user in the builder stage.
- The COPY --from=builder /app /app line copies the application code from the builder image to the final image.
- The EXPOSE 80 instruction exposes port 80 for potential communication.
- The ENV NAME World instruction sets an environment variable named NAME with the value World.
- The ENV PATH=/root/.local/bin:$PATH instruction adds the .local/bin directory to the PATH environment variable, ensuring that installed packages are executable.
- Finally, the CMD ["python", "app.py"] command specifies that the app.py script should be run when the container starts.
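A multi-stage image is built with the same docker build command as before; Docker runs every stage but tags only the final one. If you ever want to build and inspect just the builder stage, the --target flag lets you stop there (the image name below is an arbitrary example):
docker build --target builder -t myapp-builder .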
I hope you learned something new about Docker for data science. Thank you for reading this article!