A High Level Overview of Docker

What is Docker?

ThadT
9 min read · Feb 28, 2020

Docker is an open platform for developing, shipping, and running applications. It separates your applications from your infrastructure by packaging and running (potentially multiple) applications in loosely isolated environments called containers, so you can deliver software quickly.

Containers are lightweight because they don’t need the extra load of a hypervisor, but run directly within the host machine’s kernel. You can even run containers within VMs. Containers become the unit for distributing and testing your application. A container runs natively on Linux and shares the kernel of the host machine with other containers. It runs as a discrete process, taking no more memory than any other executable, making it lightweight. By contrast, a VM runs a full-blown “guest” operating system with virtual access to host resources through a hypervisor.

Docker Engine

This is a client-server application with these major components:

  • A server which is a type of long-running program called a daemon process (the dockerd command). The daemon creates and manages Docker objects, such as images, containers, networks, and volumes.
  • A REST API which specifies interfaces that programs can use to talk to the daemon and instruct it what to do.
  • A command line interface (CLI) client (the docker command). The CLI uses the Docker REST API to control or interact with the Docker daemon through scripting or direct CLI commands.
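
For example, you can query the daemon either through the CLI client or by talking to the REST API directly (this sketch assumes the daemon is listening on the default UNIX socket at /var/run/docker.sock):

    # Ask the daemon for client and server version info via the CLI
    docker version

    # Talk to the same daemon over its REST API on the default UNIX socket
    curl --unix-socket /var/run/docker.sock http://localhost/version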

Benefits of Docker

  • Fast and consistent — Standardized environments for CI/CD workflows
  • Responsive deployment and scaling — Portability and lightweight nature also make it easy to dynamically manage workloads, and can enable running of applications anywhere
  • Running more workloads on the same hardware — Lightweight and fast

Architecture

Docker uses a client-server architecture. The Docker client talks to a Docker daemon (or multiple daemons) over a REST API, using UNIX sockets or a network interface; the daemon manages Docker objects such as images, containers, networks, and volumes. The daemon can be local or remote, and can also communicate with other daemons to manage Docker services.

A Docker registry stores Docker images. Docker Hub is a public registry that anyone can use, and Docker is configured to look for images on Docker Hub by default. You can also run your own private registry. When you use the docker pull, docker run, or docker push commands, the required images are pulled from or pushed to your configured registry.
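
For example (the private registry hostname below is purely illustrative):

    # Pull an image from Docker Hub, the default registry
    docker pull ubuntu:18.04

    # Pull from, or push to, a private registry by prefixing the image name with its host
    docker pull myregistry.example.com:5000/team/app:1.0
    docker push myregistry.example.com:5000/team/app:1.0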

A Docker Image is a read-only template with instructions for creating a Docker container. It includes everything needed to run an application — the code or binary, runtimes, dependencies, and any other filesystem objects required. Often, an image is based on another image, with some additional customization. To build your own image, you create a Dockerfile with a simple syntax for defining the steps needed to create the image and run it. Each instruction in a Dockerfile creates a layer in the image. When you change the Dockerfile and rebuild the image, only those layers which have changed are rebuilt.

Example of a Dockerfile
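
As a sketch, a minimal Dockerfile for a small Node.js app might look like this (the base image, port, and file names are illustrative assumptions):

    FROM node:12-alpine          # start from an existing base image
    WORKDIR /app                 # set the working directory inside the image
    COPY package*.json ./        # copy dependency manifests first (better layer caching)
    RUN npm install              # install dependencies; this becomes its own layer
    COPY . .                     # copy the application source code
    EXPOSE 8080                  # document the port the app listens on
    CMD ["node", "server.js"]    # default command when a container starts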

A Docker Container is a runnable instance of an image. It is nothing but a running process, with some added encapsulation features applied to it in order to keep it isolated from the host and from other containers. You can create, start, stop, move, or delete a container using the Docker API or CLI. You can connect a container to one or more networks, attach storage to it, or even create a new image based on its current state. A container is defined by its image as well as any configuration options you provide to it when you create or start it. One of the most important aspects of container isolation is that each container interacts with its own private filesystem; this filesystem is provided by a Docker image.
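For example, the basic container lifecycle can be driven entirely from the CLI (the container and image names here are illustrative):

    docker create --name web -p 8000:8080 bulletinboard:1.0   # create a container from an image
    docker start web                                           # start the container's process
    docker stop web                                            # stop it
    docker commit web bulletinboard:1.1                        # snapshot its current state as a new image
    docker rm web                                              # delete the container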

Services allow you to scale containers across multiple Docker daemons, which all work together as a swarm with multiple managers and workers. Each member of a swarm is a Docker daemon, and the daemons all communicate using the Docker API. A service allows you to define the desired state, such as the number of replicas of the service that must be available at any given time. By default, the service is load-balanced across all worker nodes.
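As a sketch (the service name and image are illustrative), a replicated service could be created like this:

    docker swarm init                          # turn this daemon into a swarm manager
    docker service create --name web --replicas 3 -p 8000:8080 bulletinboard:1.0
    docker service scale web=5                 # change the desired state to 5 replicas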

Development Workflow

In general, the development workflow looks like this:

  1. Create and test individual containers for each component of your application by first creating Docker images.
  2. Assemble your containers and supporting infrastructure into a complete application.
  3. Test, share (for example sharing images on Docker Hub), and deploy your complete containerized application. Remember to keep your Dockerfile and source code together in version control.

With your image available on Docker Hub, you’ll be able to run it anywhere.
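
A typical sequence of commands for steps 1 and 3 might look like this (the Docker Hub username and image name are placeholders):

    docker build -t yourhubusername/bulletinboard:1.0 .            # build an image from the Dockerfile in the current directory
    docker push yourhubusername/bulletinboard:1.0                  # share the image on Docker Hub
    docker run -d -p 8000:8080 yourhubusername/bulletinboard:1.0   # run it on any machine with Docker installed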

Networking

You can connect Docker containers and services together, or connect them to non-Docker workloads in a platform-agnostic way. Docker’s networking subsystem is pluggable, using drivers.

  • User-defined bridge networks are best when you need multiple containers to communicate on the same Docker host.
  • Host networks are best when the network stack should not be isolated from the Docker host, but you want other aspects of the container to be isolated.
  • Overlay networks are best when you need containers running on different Docker hosts to communicate, or when multiple applications work together using swarm services.
  • Macvlan networks are best when you are migrating from a VM setup or need your containers to look like physical hosts on your network, each with its own unique MAC address.
  • Third-party network plugins allow you to integrate Docker with specialized network stacks.
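
For example, a user-defined bridge network that lets two containers on the same host reach each other by name might be set up like this (all names and images are illustrative):

    docker network create --driver bridge app-net
    docker run -d --name db  --network app-net postgres:12
    docker run -d --name web --network app-net -p 8000:8080 bulletinboard:1.0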

Manage Data in Docker

All files created inside a container are stored on a writable container layer. This means the data doesn’t persist when the container no longer exists, and it can be difficult to get the data out if another process needs it. Moreover, the writable layer is tightly coupled to the host machine and cannot be easily moved. This storage arrangement also requires a storage driver to manage the filesystem. This storage driver provides a union filesystem using the Linux kernel, which reduces performance (vs. using data volumes which write directly to the host filesystem).

Docker provides two options for containers to store files on the host machine so that the files persist: volumes and bind mounts (plus tmpfs mounts on Linux and named pipes on Windows). Data is exposed as either a directory or an individual file in the container’s filesystem.

Volumes (managed by Docker) are stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem (it is isolated from the core functionality of the host machine). Volumes are the best way to persist data in Docker. You can create a volume explicitly using the docker volume create command, or Docker can create a volume during container or service creation. You have to explicitly remove unused volumes using docker volume prune. A given volume can be mounted into multiple containers simultaneously, and a volume may be named or anonymous. Volumes also support the use of volume drivers, which allow you to store your data on remote hosts or cloud providers, among other possibilities. Volumes are used to share data among multiple running containers and are ideal if you store your container’s data on a remote host or in the cloud. They also help you decouple the configuration of the Docker host from the container runtime, and if you need to back up, restore, or migrate data from one Docker host to another, volumes are a better choice.
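
A minimal volume workflow might look like this (the volume name and mount target are illustrative):

    docker volume create app-data
    docker run -d --name web --mount source=app-data,target=/var/lib/app bulletinboard:1.0
    docker volume ls
    docker volume prune        # remove volumes not used by any container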

Bind mounts may be stored anywhere on the host system. They may even be important system files or directories. Non-Docker processes on the Docker host, or a Docker container, can modify them at any time. Using a bind mount mounts a file or directory on the host machine into a container; it is referenced by its full path on the host machine and does not need to exist on the Docker host already. You can’t use Docker CLI commands to directly manage bind mounts, and processes running in a container can change the host filesystem, including creating, modifying, or deleting important system files or directories (a security concern). Bind mounts can be useful for sharing configuration files from the host machine to containers, sharing source code or build artifacts between a development environment on the Docker host and a container, or when the file or directory structure of the Docker host is guaranteed to be consistent with the bind mounts the containers require.
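
For instance, mounting a host directory of configuration files into a container might look like this (paths and names are illustrative):

    docker run -d --name web \
      --mount type=bind,source="$(pwd)"/config,target=/etc/app/config \
      bulletinboard:1.0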

tmpfs mounts are stored in the host system’s memory only (not persisted on disk), and are never written to the host system’s filesystem. Generally used to store non-persistent state or sensitive information.
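
For example (the mount point is illustrative):

    docker run -d --name web --mount type=tmpfs,destination=/app/cache bulletinboard:1.0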

Named pipes can be used for communication between the Docker host and a container. A common use case is to run a third-party tool inside a container and connect to the Docker Engine API using a named pipe.

Orchestration

Containers effectively guarantee that applications run the same way anywhere, regardless of environment. They also enable easy scalability using tools that automate the maintenance of your applications. Tools to manage, scale, and maintain containerized applications are called orchestrators, and the most common examples are Kubernetes and Docker Swarm. Development environment deployments of both of these orchestrators are provided by Docker Desktop.

Kubernetes provides many tools for scaling, networking, securing and maintaining your containerized applications, above and beyond the abilities of containers themselves. All containers in Kubernetes are scheduled as pods, which are groups of co-located containers that share some resources. Most of the workloads are scheduled as deployments, which are scalable groups of pods maintained automatically by Kubernetes. All Kubernetes objects can and should be described in manifests called Kubernetes YAML files. These YAML files describe all the components and configurations of your Kubernetes app, and can be used to easily create and destroy your app in any Kubernetes environment.

A Kubernetes YAML file
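
As a sketch, a minimal Deployment manifest might look like this (the object name, labels, and image are illustrative assumptions):

    apiVersion: apps/v1          # which Kubernetes API parses this object
    kind: Deployment             # what sort of object this is
    metadata:
      name: bb-demo              # a name applied to the object
    spec:                        # parameters and configuration of the object
      replicas: 1
      selector:
        matchLabels:
          app: bb
      template:
        metadata:
          labels:
            app: bb
        spec:
          containers:
          - name: bb-site
            image: bulletinboard:1.0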

A Kubernetes YAML file almost always follows the same pattern:

  • The apiVersion, which indicates the Kubernetes API that parses this object
  • The kind indicating what sort of object this is
  • Some metadata applying things like names to your objects
  • The spec specifying all the parameters and configurations of your object.

Swarm provides many tools for scaling, networking, securing and maintaining your containerized applications, above and beyond the abilities of containers themselves. All Swarm workloads are scheduled as services, which are scalable groups of containers with added networking features maintained automatically by Swarm. All Swarm objects can and should be described in manifests called stack files. These YAML files describe all the components and configurations of your Swarm app, and can be used to easily create and destroy your app in any Swarm environment.

A Swarm YAML file
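
Consistent with the description below, a minimal stack file might look like this (the service name is an assumption):

    version: "3.7"
    services:
      bb-app:                      # a scalable group of identical containers
        image: bulletinboard:1.0
        ports:
          - "8000:8080"            # forward host port 8000 to container port 8080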

In this Swarm YAML file, we have just one object: a service, describing a scalable group of identical containers. In this case, you’ll get just one container (the default), and that container will be based off a bulletinboard:1.0 image. In addition, we’ve asked Swarm to forward all traffic arriving at port 8000 on our development machine to port 8080 inside the bulletin board container.

Note: In Swarm, a service provides both scheduling and networking facilities, creating containers and providing tools for routing traffic to them. In Kubernetes, scheduling and networking are handled separately: deployments (or other controllers) handle the scheduling of containers as pods, while services are responsible only for adding networking features to those pods.

Conclusion

So, this was a high-level overview of some basic Docker concepts, drawn from Docker’s very comprehensive documentation. Hope this can be a useful introduction for your future exploration of the wonderful world of containerization!
