Dockerfile and Best practices for writing Dockerfile: Diving into Docker — Part 5

8 min read · May 5, 2020

Docker can build images automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image. Using `docker build`, users can create an automated build that executes several command-line instructions in succession.

A Docker image consists of read-only layers each of which represents a Dockerfile instruction. The layers are stacked and each one is a delta of the changes from the previous layer.

This is the key point: when a Docker container starts up, it has to be told what to do. It has nothing installed and it knows how to do nothing. Truly nothing.

The first thing a Dockerfile needs is a base image. A base image gives the container its starting filesystem: either an OS distribution such as Ubuntu, RHEL, or SUSE, or a runtime image such as Node or Java that is itself built on top of one.

Next, you’ll provide setup instructions. These are all the things the Docker container needs to know about: environment variables, dependencies to install, where files live, etc.

And finally, you have to tell the container what to do. Typically this means running the application that the setup instructions installed and configured. I’ll give a quick overview of the most common Dockerfile commands next and then show some examples to help it make sense.

Create a Dockerfile

Creating a Dockerfile is as easy as creating a new file named “Dockerfile” with your text editor of choice and defining some instructions. The name of the file does not really matter: Dockerfile is the default name, but you can use any filename you want (and even have multiple Dockerfiles in the same folder) by passing the file to `docker build` with the `-f` flag.

Simple Dockerfile for NGINX

#
# Each instruction in this file generates a new layer that gets pushed to your local image cache
#
# Lines preceded by # are regarded as comments and ignored
#
# The line below states we will base our new image on the Latest Official Ubuntu
FROM ubuntu:latest

# Identify the maintainer of an image
LABEL maintainer="adari.sgirishkumar@gmail.com"

# Update the image to the latest packages
RUN apt-get update && apt-get upgrade -y

# Install NGINX to test.
RUN apt-get install nginx -y

# Expose port 80
EXPOSE 80

# Last is the actual command to start up NGINX within our Container
CMD ["nginx", "-g", "daemon off;"]

Dockerfile Commands

ADD — Defines files to copy from the Host file system onto the Container

ADD ./local/config.file /etc/service/config.file

CMD — This is the command that will run when the Container starts

CMD ["nginx", "-g", "daemon off;"]

ENTRYPOINT — Sets the default application used every time a Container is created from the Image. If used in conjunction with CMD, you can remove the application and just define the arguments there

ENTRYPOINT ["echo"]
CMD ["Hello World!"]
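As a further sketch of how the two instructions combine, the NGINX start command from earlier can be split so that ENTRYPOINT names the application and CMD carries only its default arguments. Both are written in the exec (JSON array) form, which is the form in which CMD is appended to ENTRYPOINT. The `nginx:1.17` base tag is an assumption chosen for illustration; any image with nginx on its PATH should behave the same way.

```dockerfile
# Assumed base image for illustration; it already ships the nginx binary
FROM nginx:1.17

# The application that runs every time a container starts
ENTRYPOINT ["nginx"]

# Default arguments appended to the ENTRYPOINT
CMD ["-g", "daemon off;"]
```

With an image built from this sketch, anything given after the image name on `docker run` replaces only the CMD part, so `docker run <image> -v` would run `nginx -v` instead of starting the server (`<image>` is a placeholder for whatever you tag the build as).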

ENV — Set/modify the environment variables within Containers created from the Image.

ENV VERSION=1.0

EXPOSE — Define which Container ports to expose

EXPOSE 80

FROM — Select the base image to build the new image on top of

FROM ubuntu:latest

LABEL maintainer — Optional field to let you identify yourself as the maintainer of this image. This is just a label (it used to be a dedicated Docker directive).

LABEL maintainer="someone@xyz.xyz"

RUN — Specify commands to make changes to your Image and subsequently the Containers started from this Image. This includes updating packages, installing software, adding users, creating an initial database, setting up certificates, etc. These are the commands you would run at the command line to install and configure your application. This is one of the most important dockerfile directives.

RUN apt-get update && apt-get upgrade -y && apt-get install -y nginx && rm -rf /var/lib/apt/lists/*

USER — Define the default User all commands will be run as within any Container created from your Image. It can be either a UID or username

USER docker

VOLUME — Creates a mount point within the Container linking it back to file systems accessible by the Docker Host. New Volumes get populated with the pre-existing contents of the specified location in the image. Be aware that defining Volumes in a Dockerfile can lead to issues, so Volumes are usually better managed with docker-compose or “docker run” commands. Volumes are optional: if your application does not have any state (and most web applications work like this), you don’t need them.

VOLUME /var/log

WORKDIR — Define the default working directory for the command defined in the “ENTRYPOINT” or “CMD” instructions

WORKDIR /home

Best practices for writing Dockerfile

1) FROM: should have a tag and it shouldn’t be latest

# Avoid: "latest" can point to a different image tomorrow
FROM ubuntu:latest
# Better practice: pin an exact version
FROM ubuntu:18.04

The FROM command in Docker enables you to set the base image for a build stage. Although it’s possible to specify an image without a tag, we suggest not doing this: Docker then falls back to the “latest” tag, which moves over time, so a rebuild can silently pick up breaking changes and produce other unexpected results. Specifying “latest” explicitly has exactly the same problem. Instead, find the version of the image you want to use and specify that exact version in your Dockerfile.
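One way to keep the pinned version visible and easy to bump is to declare it once as a build argument ahead of FROM. This is only a minimal sketch; `UBUNTU_VERSION` is a hypothetical argument name chosen here, not something Docker defines.

```dockerfile
# Hypothetical build argument holding the pinned base image tag
ARG UBUNTU_VERSION=18.04

# The tag is resolved from the argument, so the pin lives in one place
FROM ubuntu:${UBUNTU_VERSION}
```

An ARG declared before FROM is only visible to FROM lines unless it is declared again inside the build stage. The default can be overridden at build time with `docker build --build-arg UBUNTU_VERSION=<tag>`.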

2) RUN: apt / yum: installed packages should have a version

`RUN apt-get` is the primary method of package installation in most Debian- and Ubuntu-based Dockerfiles (`yum` plays the same role on RHEL and CentOS). As with any other package management system, it’s critical to specify the version of each package you install to promote stability. And no, using “latest” doesn’t count: it’s subject to the same issues as failing to specify a version at all.

You should also combine all of your `apt-get` statements into a single `RUN` instruction, joining the commands with `&&` and splitting long package lists across lines with a trailing backslash.

Instead of this:

# Debian/Ubuntu
RUN apt-get install -y dirmngr wget gnupg

# RHEL/CentOS
RUN yum install -y dirmngr wget gnupg

Do this (check your distribution for the exact version strings):

# Debian/Ubuntu
RUN apt-get install -y dirmngr=19.04 gnupg=2.2.17 wget=1.19

# RHEL/CentOS
RUN yum install -y dirmngr-19.04 gnupg-2.2.17 wget-1.19

Combining the commands into a single RUN instruction prevents you from creating unnecessary additional layers and speeds up the build process. The `apt` command also takes time to initialize, so the fewer times you call it, the better.
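A combined install step might look like the sketch below. The base image tag is an assumption, and the packages are left unpinned here because the correct version strings depend on the distribution release; append `=<version>` to each package name to pin it.

```dockerfile
# Assumed base image for illustration
FROM ubuntu:18.04

# One RUN instruction produces one layer: refresh the package index,
# install everything in a single apt-get call, then remove the index
# files so they do not end up stored in the image
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      dirmngr \
      gnupg \
      wget \
 && rm -rf /var/lib/apt/lists/*
```

Listing one package per line in alphabetical order is a readability choice: it keeps diffs small when a package is added or removed.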

3) FROM: Every image should be pulled from the organization’s private registry

Another thing you can specify in the FROM command is the registry you want the image to be pulled from, written as a hostname prefix on the image name. By default, this is a public registry, usually Docker Hub. For security purposes, however, you may wish to enforce that every image is pulled from your own registry. We highly recommend that large organizations make use of a private registry.
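As a sketch of what this looks like in practice, the registry hostname simply goes in front of the image name. Both `registry.example.com` and the `base/ubuntu` repository path are placeholders; substitute your organization’s registry and naming scheme.

```dockerfile
# Placeholder registry host and repository path; the tag is still pinned
FROM registry.example.com/base/ubuntu:18.04
```

A convention like this is also easy to check automatically, since every FROM line should start with the same hostname.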

4) USER property should be specified, and it should not be root

Any DevOps engineer worth their salt knows the security vulnerabilities created by running services as the root user. Compartmentalized containers mitigate these risks somewhat, but they are still severe enough that running services as the root user is unsafe.

The default user in most base images (the image that you build the container on, specified by FROM) is root, which means that the user will remain root in that container until otherwise specified. The USER command allows us to manually set the user (by name or ID) within the Dockerfile at any point. We strongly suggest you enforce the use of this command in every Dockerfile, as early as possible after the steps that genuinely need root, such as installing packages.
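The following is a minimal sketch of switching away from root. The user and group name `app` is a hypothetical choice, and the base image tag is an assumption.

```dockerfile
# Assumed base image for illustration
FROM ubuntu:18.04

# Steps that need root privileges come first: create an unprivileged
# system group and user ("app" is a hypothetical name)
RUN groupadd --system app \
 && useradd --system --gid app --create-home app

# Every later instruction, and the container's main process, runs as "app"
USER app
WORKDIR /home/app

# Print the effective user so the switch can be confirmed
CMD ["id"]
```

Running a container from this image should report the `app` user rather than root.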

5) HEALTHCHECK property should exist

With the rise of containers and independent self-deployed and maintained microservices, it’s more important than ever to have visibility into the state of every part of your infrastructure.

Fortunately, Docker provides a small but fully functional API for exposing the status of the container, so you can inspect whether or not it’s ready to do work. This is more useful than simply monitoring whether the process is running, since “running” covers a range of states from “it’s working”, to “still launching”, to even “stuck in a broken state”.

You can access this API via the HEALTHCHECK instruction. A Dockerfile can only contain one HEALTHCHECK instruction; if more than one is specified, only the last one is used.
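Here is a hedged sketch of a health check for the NGINX image used earlier in this article. It assumes the server answers on port 80 inside the container and installs `curl` to do the probing; the interval and timeout values are illustrative, not recommendations.

```dockerfile
# Assumed base image for illustration
FROM ubuntu:18.04

# curl is installed only so the health check has something to probe with
RUN apt-get update \
 && apt-get install -y --no-install-recommends nginx curl \
 && rm -rf /var/lib/apt/lists/*

EXPOSE 80

# Probe the server every 30s; a non-zero exit code counts as a failure,
# and three consecutive failures mark the container as unhealthy
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost/ || exit 1

CMD ["nginx", "-g", "daemon off;"]
```

The resulting status (starting, healthy, or unhealthy) shows up in the output of `docker ps` and `docker inspect`.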

6) LABEL property should exist

The LABEL instruction is a feature of Docker that allows you to attach custom metadata to an image in the form of key/value pairs. In brief, after creating a label, people and tools can look up that label’s value on the built image using its key.

Labels make images self-describing: they record things like the version, the source repository, or the owning team alongside the image itself. Note that labels are metadata only and cannot be referenced as variables elsewhere in the Dockerfile. If you want to declare a value once and reuse it across instructions, use ARG or ENV instead.

Another useful property of labels is that `docker inspect` can read them back from a built image or a running container. Images become much simpler to understand and organize when you put their organizational metadata into labels.
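As a sketch, several labels can be set in one instruction. The `org.opencontainers.image.*` keys come from the OCI image annotation conventions; every value below is a made-up placeholder.

```dockerfile
# Assumed base image for illustration
FROM ubuntu:18.04

# One LABEL instruction can carry several key/value pairs;
# all values here are placeholders
LABEL maintainer="someone@xyz.xyz" \
      org.opencontainers.image.title="example-service" \
      org.opencontainers.image.version="1.0" \
      org.opencontainers.image.source="https://example.com/your-org/example-service"
```

After building, `docker inspect --format '{{json .Config.Labels}}' <image>` prints the labels as JSON, where `<image>` is whatever name you tagged the build with.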

7) Specify the container’s maintainer in a LABEL

One piece of metadata is important enough to warrant its own discussion: the “maintainer” property.

In the early days of Docker, this property used to be an actual instruction called `MAINTAINER`. After labels were introduced back in 2015, MAINTAINER was deprecated in favor of just creating a label with a key of maintainer, like this:

LABEL maintainer="adari.girishkumar@gmail.com"

You should include maintainer metadata in every Dockerfile to indicate who is responsible for the container. Code ownership is critical, especially as organizations scale in size. As more people contribute to an un-owned piece of infrastructure, critical knowledge about how that code works (such as pitfalls, bugs, and other context) is further distributed among them until it becomes so diluted that it can be lost entirely. To avoid situations like this, it’s best to assign an owner to each piece of software within your organization. Usually, this will be the person who originally developed it, or somebody to whom they pass down the relevant context and knowledge.

8) Keep your image lightweight with a .dockerignore file

If your Dockerfile contains ADD or COPY instructions, make sure you keep the size of your image to a minimum by including a .dockerignore file. This will speed up your build and prevent you from doing unnecessary work. This file will give you fine-grained control over what is copied or added to the container by allowing you to specify glob patterns matching files and directories you want to ignore.

For example, let’s say we usually COPY the entire contents of the root directory into the container. There are a number of directories which we don’t need at all that currently we’d just be copying and storing for no reason — for example, we don’t need the `.git/` directory in a production environment. Similarly, there’s usually nothing to be gained by including your `tests/` directory, so you don’t want to copy that over either.

In smaller projects, we aren’t saving that much work by excluding directories like these, but in larger ones, the differences can be in the hundreds of megabytes.
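To make the example concrete, the sketch below shows a Dockerfile that copies the whole build context, with the matching ignore patterns written out as comments. The `/app` destination is a hypothetical path, and the patterns are just the two directories mentioned above.

```dockerfile
# A .dockerignore file placed next to this Dockerfile would contain
# one pattern per line, for example:
#
#   .git
#   tests
#
# Assumed base image for illustration
FROM ubuntu:18.04

WORKDIR /app

# Copies the build context into the image; anything matched by
# .dockerignore is never sent to the Docker daemon, so it cannot be copied
COPY . /app
```

Because ignored paths are dropped before the build starts, the ignore file shrinks the context upload as well as the final image.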

9) FROM: Stop using Python 2.7

The Python team had been phasing out 2.7 for a while, and support officially ended on January 1, 2020. The occasional tool or process still lives in the dark ages, but the vast majority have been upgraded to 3 and would break if run under 2.7. Between the end of support and the fact that so many people are using Python 3, one of our recommended best practices is to enforce the use of 3 in your Dockerfiles.
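In Dockerfile terms, this comes down to the tag on the base image. The sketch below assumes the official `python` image and picks the 3.8 tag purely as an example of a pinned Python 3 release.

```dockerfile
# Pinned Python 3 base image (the exact minor version is an example)
FROM python:3.8

# Print the interpreter version to confirm which Python the image ships
CMD ["python", "-c", "import sys; print(sys.version)"]
```

This also satisfies the first best practice above, since the tag is explicit rather than “latest”.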
