Docker and Its Internals

Shivanshu Goyal
Published in The Startup
8 min read · Jul 19, 2020

Before we start talking about Docker, we need to understand the problem that Docker solves efficiently and economically. Before Docker gained popularity, companies used virtualization to run multiple applications, since different applications might need different sets of libraries and even different operating systems.

Why do organizations prefer virtual machines instead of provisioning actual servers to host their applications?

  1. Better response time. Virtualization can improve application performance, and provisioning a virtual machine takes only minutes rather than weeks. At advanced levels of implementation, self-provisioning by users is even possible.
  2. Improved application availability. When physical servers have problems, need routine maintenance, or require upgrades, the result is costly downtime. With virtual servers, applications can be readily moved between hosts to keep everyone up and running.
  3. Virtualization allows for more efficient use of IT equipment and labor resources.
Virtual machines run on a host machine using a hypervisor. A hypervisor (also called a virtual machine monitor, VMM, or virtualizer) is computer software, firmware, or hardware that creates and runs virtual machines. A computer on which a hypervisor runs one or more virtual machines is called the host machine, and each virtual machine is called a guest machine.

Virtualization also has some shortcomings. Suppose we need to deploy an application that requires a different OS with its own set of configurations: we must provision a new guest OS just to run that application. Also, running multiple virtual machines on the same host operating system degrades performance, because each guest OS runs on top of the host OS with its own kernel and its own set of libraries and dependencies. This takes up a large chunk of system resources: disk, processor, and especially RAM. Boot-up is also slow, taking on the order of a minute, which becomes critical for real-time applications. There is one more big issue with virtualization: it requires a fixed amount of RAM to be allocated to each VM, which leaves blocks of host memory reserved but unused. This blocked memory cannot be allocated to a new VM. Here, containerization comes to rescue us from all these shortcomings.

What is containerization?

Containerization is a technique that brings virtualization to the OS level. It is much more efficient because it does not require a guest OS; it uses the host kernel to run the application. Docker is a containerization platform that packages your application with all its required dependencies in a container so that it runs seamlessly in any environment.

Containers run on a host machine using a container engine.

The best part of containerization is that it is very lightweight compared to heavyweight virtualization. It takes a fraction of a second to start a new container on a machine. Resources are shared among all the containers to allow maximum utilization.

Docker terminologies

  1. Docker image: A docker image is a blueprint or read-only template with instructions to create a docker container. We can create our own docker image (on top of other existing images). These images can be saved in a repository (local or global). This repository is known as the docker registry.
  2. Docker container: A docker container is a running instance of a docker image. It has a writable layer on top of one or more read-only layers. We make changes or execute commands in this writable layer of the container.
  3. Docker registry: It is a repository (like maven repository) that stores docker images uploaded by developers to be leveraged by other developers. Container repositories are the specific physical locations where your Docker images are actually stored, whereby each repository comprises a collection of related images with the same name. Each image inside a repository is uniquely identified by a tag. nodejscn/node is one of the repositories on the docker hub with multiple tags.
  4. Docker networking: Docker takes care of the networking aspects so that the containers can communicate with other containers and also with the Docker Host. We can see the docker ethernet adaptor using the command ifconfig on the docker host. We can also access applications running in a docker container on the port exposed to the external world.
  5. Docker storage: Docker has multiple storage drivers (AUFS, DeviceMapper, Overlay, ZFS, etc.) that allow it to work with the underlying storage devices. Docker also has data volumes that can be shared across containers. Some of the features of data volumes:
  • They are initialized when the container is created.
  • They can be shared and also reused amongst many containers.
  • Any changes to the volume itself can be made directly.
  • They exist even after the container is deleted.
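The volume properties above can be demonstrated with a short command sequence (a sketch; it assumes Docker is installed and the daemon is running, and the volume name `mydata` is arbitrary):

```shell
# Create a named volume
docker volume create mydata

# Write a file into the volume from one container...
docker run --rm -v mydata:/data alpine sh -c 'echo hello > /data/greeting.txt'

# ...and read it back from a different container:
# the data is shared and outlives both containers
docker run --rm -v mydata:/data alpine cat /data/greeting.txt
```

Because both containers were started with `--rm`, they are gone by the end, yet `docker volume ls` would still show `mydata` with the file intact.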

6. Dockerfile: A Dockerfile is a simple text file that contains a set of instructions to build our own Docker image.

# Each step creates a read-only layer of the image; there will be 6 layers in the image.
# For Java 8
FROM openjdk:8-jdk-alpine

# Refer to Maven build -> finalName
ARG JAR_FILE=target/spring-boot-web.jar

# cd /opt/app
WORKDIR /opt/app

# cp target/spring-boot-web.jar /opt/app/app.jar
COPY ${JAR_FILE} app.jar
# exposing the port on which application runs
EXPOSE 8080
# java -jar /opt/app/app.jar
ENTRYPOINT ["java","-jar","app.jar"]

This Dockerfile is used to create a Docker image for our Spring Boot application. Here, OpenJDK is the base image, on top of which our image will be created. Commands to build the Docker image:

# Command to build a docker image
docker build -t {A_TAG_TO_YOUR_IMAGE} .
# Try this if the Dockerfile is not in the root folder of your spring-boot project
docker build -t {TAG} -f {PATH_TO_DOCKER_FILE} .
# Command to run your application container
docker run -d -p 8080:8080 {SAME_TAG_USED_ABOVE}

You might be wondering: how does Docker help us here? To run a Spring Boot application, we would normally need Java installed on the machine. But with the Spring Boot Docker image, none of the dependencies need to be installed on the host machine. Every single dependency required to run the application is embedded in the container, and the container can run on any machine where the Docker engine is installed.
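You can see this for yourself with a quick check (a sketch; the tag `spring-boot-web` and container name `web` are hypothetical, and a running Docker daemon is assumed):

```shell
# Start the container built from the Dockerfile above
docker run -d -p 8080:8080 --name web spring-boot-web

# Java lives inside the container, not on the host:
# this works even if the host has no JDK installed
docker exec web java -version

# The application answers on the published port
curl http://localhost:8080/
```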

7. .dockerignore: this file is used to exclude unnecessary files from the build context. It plays an important role in creating more compact images and faster builds.
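For the Spring Boot project above, a minimal .dockerignore might look like this (the entries are illustrative; note the `!` rule re-includes the one jar that the Dockerfile actually copies):

```
# Keep build metadata and local clutter out of the build context
.git
.idea/
*.md
target/*
!target/spring-boot-web.jar
```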

Docker Architecture

Docker architecture is a client-server architecture in which the Docker client talks to the Docker server (host) using REST APIs. It has 2 main components:

  1. The client, a CLI (command-line interface) where we run commands (like build, run, pull, ps, images, container, etc.) to interact with the Docker host.
  2. The Docker host, which runs the Docker daemon: it listens for API requests and manages Docker objects such as images, containers, networks, and volumes.

A Docker container is a running instance of the spring-boot image. Application logs and other runtime data get added to the writable layer of the container.

The major difference between a container and an image is the top writable layer. All writes to the container that add new data or modify existing data are stored in this writable layer. When the container is deleted, the writable layer is deleted with it; the underlying image remains unchanged. Because each container has its own writable layer, and all changes are stored in that layer, multiple containers can share access to the same underlying image and yet have their own data state.

Docker commands

Docker commands can be listed using the docker --help command. To get more information about a specific command, use commands like:

docker container --help

docker image --help

docker build -t {A_TAG_TO_YOUR_IMAGE} . to build an image using a Dockerfile.

docker login to login to hub.docker.com

docker container prune to remove all the stopped containers (docker system prune also removes unused images, networks, and build cache).

docker run -d -p 8080:8080 {TAG} to run a container of an image.
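A typical inspect-and-clean-up sequence with these commands might look like this (a sketch; the container name `my-app` and image tag `my-image:1.0` are hypothetical):

```shell
docker ps                     # list running containers
docker logs -f my-app         # follow a container's logs
docker stop my-app            # stop it gracefully (SIGTERM, then SIGKILL after a timeout)
docker rm my-app              # remove the stopped container (its writable layer is deleted)
docker image rm my-image:1.0  # remove the image once no container uses it
```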


There is a common misconception about the base OS image in a container: that the container launches a full-fledged guest OS on top of the host OS for the given OS base image.

Most Dockerfiles use an OS base image, e.g. FROM centos:latest.

Do containers also start a guest OS on top of the host OS, as we do in the case of virtualization?
It becomes extremely important to understand base images when we start learning Docker from scratch. This base image of an OS is not a full-fledged operating system. It is just the OS user space minus the kernel; it uses the kernel of the host machine. A base image is much lighter than a full base OS, and that's why Docker containers can be really fast. It only installs distro-specific (or "userland") software.

Why do we need a base OS image in our image?
A Docker container's filesystem is isolated from the host OS, so an application packaged inside a Docker image won't be able to see the host filesystem (unless it is attached as a volume) when it runs as a container. Now imagine the application inside the container depends on various OS libraries: to maintain the isolation and still run the application, we have to package those dependencies inside the Docker image too. It is also possible to build an image with no base OS at all, using FROM scratch.

Can we use any base image in a container to be run on a host OS?
To know the answer, we first need to know about Linux distros. All Linux distros use the same Linux kernel; however, each distro makes slight changes to it in order to make the kernel work best for that distro.

Suppose the host OS is Ubuntu (one of the Linux distros); then your container can only use a base image of an OS that is Linux-based (that's why Docker cannot run FreeBSD or Windows inside Linux). Running a CentOS container on an Ubuntu host means you get the userland from CentOS while still running the host's kernel (from Ubuntu).
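You can verify the shared kernel yourself (a sketch, assuming Docker on a Linux host with the daemon running): the container reports the host's kernel version even though its userland files say CentOS.

```shell
# Kernel version on the host
uname -r

# The same kernel version reported from inside a CentOS container...
docker run --rm centos:7 uname -r

# ...while the userland identifies itself as CentOS
docker run --rm centos:7 cat /etc/os-release
```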

Important points to remember while building an image using a Dockerfile

  1. The most frequently modified layers should be placed towards the end of the Dockerfile. Suppose there are 4 layers (instructions) in a Dockerfile: a modification to the 2nd instruction rebuilds all layers from the 2nd onwards from scratch.
  2. Layering is a critical thing to keep in mind, as it enables caching of unmodified layers. Suppose 2 images are built using the same base image: the 2nd image will reuse the cached base image from the 1st, so the base image does not take double the disk space. If Image1 is 250 MB, Image2 is 320 MB, and the shared base image is 150 MB, the total disk space used for these 2 images is 250 + 320 − 150 = 420 MB (not 570 MB).
  3. Use a multi-stage Dockerfile to build an image; it helps keep the image size down.
  4. Build images using BuildKit to see improvements in performance, storage management, feature functionality, and security.
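A multi-stage version of the earlier Dockerfile might look like this (a sketch; the Maven image tag and paths are assumptions about the project layout). The JDK, Maven, and source code stay in the build stage, which is discarded, so the final image ships only a JRE and the jar:

```dockerfile
# Stage 1: build the jar (this stage is not part of the final image)
FROM maven:3-jdk-8 AS build
WORKDIR /build
COPY pom.xml .
COPY src ./src
RUN mvn -q package -DskipTests

# Stage 2: run it on a slim, JRE-only base image
FROM openjdk:8-jre-alpine
WORKDIR /opt/app
COPY --from=build /build/target/spring-boot-web.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java","-jar","app.jar"]
```

On older Docker releases, BuildKit can be enabled per build with DOCKER_BUILDKIT=1 docker build -t {TAG} . ; recent releases use it by default.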

Thanks for reading!


Software Engineer @Salesforce, USA | Ex-Walmart | Ex-Motorola | Ex-Comviva | Ex-Samsung | IIT Dhanbad