The Whale in the Grocery Store — An Introduction to Dockerfiles and Docker Images

Michael Mucciarone
Published in Capital One Tech
13 min read · May 20, 2019
A shopper pushing a cart and holding a bar-code scanner with the docker logo on it.

In my previous article, The Whale in the Refrigerator, I introduced Docker and walked through the basics of pulling an image from a Docker Registry and running a basic web server. Hopefully that tutorial led you to ask, “Where did that Docker image actually come from?” That’s what this article is about: building and working with Docker images.

More Whale-Speak

In the previous article, I introduced two pieces of the Docker lexicon, Image (or Container Image) and Container. In order to build our own Docker images we must learn a few more terms.

  • Dockerfile: A text file that contains a set of instructions required to build a Docker image.
  • Docker Registry: A repository for storing and distributing Docker images.
  • Docker Hub: Docker Inc.’s official Docker image registry.

Those are fairly simple definitions, but their simplicity belies their power.

Image is Everything

The Docker Container Image is the basis of everything you can do with Docker, but what, really, is a Docker image? Well, a Docker image is whatever you want it to be… and that still isn’t a good answer.

In the previous article, I likened a computer running Docker to a refrigerator, noting the parallels between putting a food container in a refrigerator and running a container with Docker. Sticking with the food metaphors, let’s move from the kitchen to the grocery store.

An empty shopping cart at the junction of 2 aisles in a grocery store

At the grocery store you usually buy food off a shelf. All of the food and/or ingredients of similar types are grouped together (milk with milk, eggs with eggs, beer with beer, etc.) and you choose the option you want. While standing in front of a given shelf you might pull down a food container to see if it’s something you want, inspect it for damage, check the expiration date, check for spoilage, check the ingredients (food allergies anybody?), look at the nutritional info or even check for missing items inside the container (who wants to go home with 11 eggs instead of 12?). If the item meets your criteria you place it in your cart. When you find all the items you want, you purchase them, bring them home, and eventually assemble them into a meal.

All of these same ideas apply to Docker images. You can shop for the base ingredients of a Docker image using Docker Hub. Once you find some appropriate base images you pull them and inspect them. It is always wise to inspect a Docker image to see if it fits your use-case, check for potential security vulnerabilities, see what you will need to add to make your use-case work, and make sure the software contained in the image is up to date. Once you decide on a base image, you then build your Docker image based on these ingredients.

Build the Whale… Be the Whale

To build a new Docker image you need a few things. The first thing you need is Docker (duh!). The command docker image build… is used to build the actual image, but first you need a Dockerfile.

Dockerfile in Depth

A Dockerfile is quite simple. It uses a Domain Specific Language (DSL) to instruct Docker to do things inside a nascent Docker image such as copy files from the host to a location inside the image or run shell commands. All that said, it’s probably easier to show a Dockerfile than to describe one.

Basic Dockerfile

FROM nginx:latest
ENV build_date='2019-03-29'
RUN mkdir -p /usr/share/nginx/html/
COPY index.html /usr/share/nginx/html/
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

That’s a lot of code to process, so let’s walk through it line by line.

FROM nginx:latest

The FROM command is required for every Dockerfile and tells Docker what the base image of the new Docker image will be; it literally tells Docker where to start from. This can be any previously existing image that you as a user have access to, including images you have previously built. There is also a reserved virtual base image called scratch that is completely empty and is used for building base or minimal images.

If the image specified in the FROM … line is not present on the system that is building the Docker image, Docker will attempt to pull the image from Docker Hub or the indicated Docker Registry (more on registries later). If Docker fails to pull the indicated image, the build will fail. For this reason, I have found it handy to pre-pull my FROM … images when scripting Docker image builds.
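The pre-pull step can be scripted by reading the FROM lines out of the Dockerfile and pulling each image before the build starts. Here is a minimal sketch of that idea. The helper names from_images and prepull are made up for this example, and the script assumes simple FROM lines: no ARG substitution in the image name and no multi-stage references to earlier stage names.

```shell
#!/bin/sh
# List the base images named on the FROM lines of a Dockerfile.
# from_images is a hypothetical helper, not part of Docker.
from_images() {
    # $1: path to a Dockerfile
    awk 'toupper($1) == "FROM" {
        i = 2
        # Skip flags such as --platform=... that may precede the image name.
        while (i <= NF && substr($i, 1, 2) == "--") i++
        # scratch is a reserved name and cannot be pulled.
        if (i <= NF && $i != "scratch") print $i
    }' "$1" | sort -u
}

# Pull every base image up front so a registry problem shows up before the build.
# prepull is also a hypothetical helper; it is defined here but not called.
prepull() {
    for image in $(from_images "$1"); do
        docker image pull "$image" || return 1
    done
}
```

In a build script you would call prepull Dockerfile first and only run docker image build… if it succeeds.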

ENV build_date='2019-03-29'

The ENV command sets an Environment Variable within the context of the Dockerfile AND in the container when it is run. In this case, if I were to run a container based on this image and execute echo $build_date it would return 2019-03-29.

Security Note: The ENV command should NEVER be used to insert a password or other secret into a Docker image at build time because it is easily retrieved from the image. There are patterns and tools for retrieving secrets for Docker Containers at runtime, but those are beyond the scope of this article. If you would like to learn more about container security, here is an excellent article.
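One cheap safeguard is to scan a Dockerfile for ENV lines that look like they carry a secret before you build. The sketch below is only a rough heuristic, not a security tool: the helper name suspicious_env and the keyword list are assumptions I made up for this example, and a clean result proves nothing.

```shell
#!/bin/sh
# Print ENV lines whose variable name looks like it holds a secret.
# suspicious_env and its keyword list are assumptions made for this sketch.
suspicious_env() {
    # $1: path to a Dockerfile
    # Match "ENV <name>" where <name> contains one of the keywords (case-insensitive).
    grep -iE '^[[:space:]]*ENV[[:space:]]+[^=[:space:]]*(PASSWORD|PASSWD|SECRET|TOKEN|API_?KEY)' "$1"
}

# Example use in a build script (commented out so this file has no side effects):
# if suspicious_env Dockerfile; then
#     echo "possible secret in an ENV line, refusing to build" >&2
#     exit 1
# fi
```

To see why this matters, run docker image inspect --format '{{.Config.Env}}' against any image: every ENV value baked into it is printed in plain text.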

RUN mkdir -p /usr/share/nginx/html/

The RUN command may be one of the most useful commands available in a Dockerfile. This command executes shell commands inside the nascent image. This particular command creates a directory /usr/share/nginx/html/ but it can be used for much more than that. Any command that is available inside the Docker image can be executed here.

In order to keep the size of your final Docker image as small as possible, it is recommended that you chain shell commands together using && or ; and that you minimize the number of RUN lines in a Dockerfile. For more detail on why you want to do this, the official Docker documentation has an article about Dockerfile best practices. Here is an example run line showing the chained commands:

RUN mkdir -p /usr/share/nginx/html/ && \
    useradd user && \
    chown -R user /usr/share/nginx/html

The RUN command is extremely powerful and I have barely scratched the surface of what you can do with it. You can learn more about the RUN command from the official Dockerfile documentation.
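The choice between && and ; in a chained RUN line is not just cosmetic. You can see the difference in any shell, no Docker required. In this small sketch, chain_status is a made-up helper and false stands in for a build step that fails.

```shell
#!/bin/sh
# Show how && and ; differ when a step in the chain fails.
# chain_status is a made-up helper; "false" stands in for a failing build step.
chain_status() {
    # Run the command string in a child shell and print its exit status.
    sh -c "$1" >/dev/null 2>&1 && echo 0 || echo $?
}

echo "with &&: exit status $(chain_status 'false && echo cleanup')"
echo "with ; : exit status $(chain_status 'false ; echo cleanup')"
```

With && the chain stops at the failure and reports a non-zero status, which is what makes a Docker build stop. With ; the later commands still run and the last one decides the status, so a broken step can slip through unnoticed. For that reason I would reach for && unless you really do want to ignore a failure.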

COPY index.html /usr/share/nginx/html/

This line tells Docker to copy the file index.html from the build context on the host (more on the build context later) to /usr/share/nginx/html/ inside the image we are building. In this instance, we’re copying an HTML file (index.html) to the default website content location inside a default Nginx image. The COPY directive also supports wildcards and recursive copying of directories. To copy using a wildcard pattern use *.pattern notation like this: COPY *.html /usr/share/nginx/html/. To copy a directory, name it as the source like this: COPY html/ /usr/share/nginx/html/. Note that COPY copies the contents of the directory, not the directory itself, and that the trailing slash (/) on the destination tells Docker to treat it as a directory.

EXPOSE 80

The EXPOSE instruction is an interesting case. According to the official Dockerfile reference, EXPOSE informs Docker that this image, when instantiated as a container, will listen on the port referenced, in this case 80. However, when the container is run, that port is not automatically published (i.e. opened to the outside world). You may recall from the first article that the actual publishing of ports is accomplished with the -p flag when executing docker run.

Instead, the EXPOSE directive serves as a kind of documentation between the builder of an image and the person who runs a container from that image. In other words, if I see EXPOSE 80 in the Dockerfile for an image, I know to include a -p 80:80 on the command line when I run that container.

EXPOSE defaults to TCP but can also specify UDP, like this: EXPOSE 10000/udp.
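Since EXPOSE works as documentation, you can read it mechanically. The sketch below turns the EXPOSE lines of a Dockerfile into matching -p flags. The helper name publish_flags is hypothetical, and mapping each container port to the same port number on the host is just one possible choice.

```shell
#!/bin/sh
# Turn the EXPOSE lines of a Dockerfile into matching "docker run" -p flags.
# publish_flags is a hypothetical helper; it maps each container port to the
# same port number on the host, which is only one possible choice.
publish_flags() {
    # $1: path to a Dockerfile
    awk 'toupper($1) == "EXPOSE" {
        for (i = 2; i <= NF; i++) {
            split($i, part, "/")          # part[1] = port, part[2] = protocol (if any)
            flag = "-p " part[1] ":" $i   # 10000/udp becomes -p 10000:10000/udp
            out = (out == "" ? flag : out " " flag)
        }
    }
    END { print out }' "$1"
}
```

For the Dockerfile in this article, publish_flags Dockerfile prints -p 80:80, which is exactly the flag I would add by hand when running the container.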

CMD ["nginx", "-g", "daemon off;"]

This final line of the Dockerfile is the command that should be run by default inside the container when the container is instantiated. In this example, we are using what is called exec form for the CMD command. In exec form we provide an array (["nginx", "-g", "daemon off;"]) containing the command, its command-line switches, and its arguments. The advantage of this form is that it does not wrap the command in a shell; it executes the command directly. In other words, it runs nginx with exactly two arguments, -g and daemon off; (the semicolon is part of the second argument).

There is another form of this command known as shell form. The shell form equivalent of the command above looks like this: CMD nginx -g 'daemon off;'. The quotes are needed because the command is now parsed by a shell, which would otherwise split daemon off; into separate words and treat the semicolon as a command separator. The difference here is that Docker actually executes /bin/sh -c "nginx -g 'daemon off;'" when you run the container, hence the name shell form. The main reason to use shell form is if you need to evaluate environment variables as part of the CMD command.
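The environment variable difference between the two forms is easy to see without Docker. The sketch below emulates both forms with plain shell functions. The names exec_form and shell_form and the GREETING variable are invented for this illustration; they are not Docker commands.

```shell
#!/bin/sh
# Emulate the two CMD forms with plain shell functions.
# exec_form and shell_form are illustrative names, not Docker commands.

# Exec form: the argument list is run as-is, with no extra parsing.
exec_form() {
    "$@"
}

# Shell form: the whole command string is handed to /bin/sh -c.
shell_form() {
    /bin/sh -c "$1"
}

# GREETING is a made-up variable standing in for something set with ENV.
export GREETING="hello from the environment"

exec_form  printf '%s\n' '$GREETING'     # prints the literal text $GREETING
shell_form 'printf "%s\n" "$GREETING"'   # prints the value of the variable
```

In exec form nothing ever expands $GREETING, so the program receives the literal text. In shell form the intermediate /bin/sh expands it before the program runs.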

But Wait, There’s More!

This is just a taste of what is possible with Dockerfiles and I encourage you to learn more. You can always consult the official Dockerfile documentation, but another great resource for learning more about Dockerfiles is looking at what other people have done with them. If you find an interesting Docker image on Docker Hub there is usually a link to the Dockerfile that was used to build that image, as well as the GitHub repository that it is part of. Also, in this instance, Google is indeed your friend.

Building an Image

So now we have a Dockerfile, but what do we do with it? To actually build a Docker image, we use the docker image build… command. In this case, the command would look something like this:

docker image build -t some.registry.com/repository/image-name:tag .

In this case, we are instructing Docker to build an image called image-name, tagged tag, in the repository repository, so that it can be pushed to the registry some.registry.com. The -t flag sets this full name, and the example command above demonstrates the proper naming scheme.

There are a few details here that we need to highlight. First off is some.registry.com. This part of the image name is optional. If you leave it off and you attempt to push the image, Docker will try to push the image to Docker Hub, which, as mentioned earlier, is Docker Inc.’s official image registry.

Next is the /repository/ portion of the name. If you are pushing to Docker Hub, this should be your Docker Hub username (also known as your Docker ID), unless you are purposely pushing the image to a shared repository, in which case this should be the name of the shared repository. If you are pushing to another registry, such as an internally hosted registry, you should follow that registry’s guidelines for Docker image organization.

The image-name: section is where you actually “name” your image. It is important to note that every Docker image has a unique Image ID that is generated when the image is built. You can, however, give a Docker image whatever name you want. I suggest that the name should be descriptive of what the Docker image does, but if you want to name your image Bob, go right ahead.

The :tag section is one of the more powerful features of Docker nomenclature. This part of the name is free-form text just like the image-name: section, but is usually used to keep track of some kind of version. A given Docker image can have numerous tags. For example, I could have an image titled my-repository/web-app:2019-02-06_0915, another version of that image titled my-repository/web-app:2019-02-15_1350, and a third version titled my-repository/web-app:2019-02-16_1500. Each of these Docker images would have different Image IDs because, as the tag implies, they are different builds.
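To make the naming scheme concrete, here is a small sketch that splits an image name into the parts described above. The helper parse_image is hypothetical and deliberately simple: it ignores digest references (name@sha256:...). It treats the first component as a registry only if it contains a dot or a colon or is localhost, and it fills in docker.io (Docker Hub) and latest when the registry or tag is left off.

```shell
#!/bin/sh
# Split an image reference of the form [registry/]repository/image-name[:tag]
# into its parts. parse_image is a hypothetical helper for illustration; it
# ignores digest references (name@sha256:...).
parse_image() {
    ref=$1
    registry=""
    tag="latest"                       # used when no tag is given

    # The tag is whatever follows a ":" in the last path component.
    last=${ref##*/}
    case $last in
        *:*) tag=${last##*:}; ref=${ref%:*} ;;
    esac

    # The first component is a registry host only if it looks like one.
    first=${ref%%/*}
    case $ref in
        */*)
            case $first in
                *.*|*:*|localhost) registry=$first; ref=${ref#*/} ;;
            esac
            ;;
    esac

    # docker.io is the registry name that stands for Docker Hub.
    printf 'registry=%s path=%s tag=%s\n' "${registry:-docker.io}" "$ref" "$tag"
}

parse_image some.registry.com/repository/image-name:tag
parse_image my-repository/web-app
```

The first call reports the registry, path, and tag exactly as written. The second shows the defaults: no registry means Docker Hub, and no tag means latest.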

Docker allows for a given Image ID to have multiple names (think of them as aliases). I have my official build name my-repository/web-app:2019-02-16_1500 but I could give that image the name my-repository/web-app:latest. These two names would have the exact same Image ID, and by definition would also contain the exact same code (in fact, the Image data is not duplicated on disk, both of those Image names point to the same bits on disk). The neat thing here is that Docker treats the latest tag as the default, so if I docker pull my-repository/web-app, it will pull my-repository/web-app:latest which is currently identical to my-repository/web-app:2019-02-16_1500.

A skyscraper that is under construction.

Now let’s say I rebuild my web-app again. The build system creates the image my-repository/web-app:2019-02-27_0950 and at the same time re-tags (aliases) that image to my-repository/web-app:latest and pushes both Image names to the registry. This would overwrite the Image my-repository/web-app:latest to point to the Image ID for my-repository/web-app:2019-02-27_0950 and consequently if I docker pull my-repository/web-app:latest I’m getting my-repository/web-app:2019-02-27_0950. You could also use other tags like :release or :qa that are also aliased to the most recent versioned build. This allows for very powerful release automation, including extremely easy release rollbacks as well as things like canary testing.
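A build system could implement that release step along the following lines. This is a sketch under assumptions, not a finished pipeline: build_tag and release are names I made up, the timestamp format simply mirrors the tags used in this article, and the build context is assumed to be the current directory.

```shell
#!/bin/sh
# Sketch of the "versioned tag plus moving alias" release step.
# build_tag and release are hypothetical helpers for this example.

# Produce a timestamp tag in the same style as 2019-02-27_0950.
build_tag() {
    date '+%Y-%m-%d_%H%M'
}

# Build once, alias the result to :latest, then push both names.
# Defined here but not called, so sourcing this file has no side effects.
release() {
    image=$1                      # for example my-repository/web-app
    tag=$(build_tag)
    docker image build -t "$image:$tag" . &&
    docker image tag "$image:$tag" "$image:latest" &&
    docker image push "$image:$tag" &&
    docker image push "$image:latest"
}
```

You would call it as release my-repository/web-app. A rollback is the same aliasing trick pointed at an older build: run docker image tag with the older versioned name as the source and :latest as the target, then push :latest again.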

Another detail to notice in that command is the final . at the end. That final piece is the “build context” (the “from,” if you like). This will be the root directory of the Docker image build procedure. It is very important to note that any file or directory under the build context could potentially end up inside your Docker image. For this reason, you should never use / as the build context: you could literally end up copying your entire hard drive into the Docker image! You may even wish to keep your Dockerfile in a directory above your source directory. For example, if you keep your Dockerfile in the root of your repository and your actual source code in ./src, you would pass ./src as the context of the build along with the command line switch -f ./Dockerfile (to explicitly tell Docker where to find your Dockerfile). If nothing else, this avoids accidentally copying your Dockerfile into the Docker image.
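If you script your builds, it is cheap to guard against a dangerous build context before Docker ever sees it. The sketch below is one way to do that. The helper check_context is hypothetical and only rejects the root directory; you may well want to reject other locations too.

```shell
#!/bin/sh
# Refuse an obviously dangerous build context before calling docker.
# check_context is a hypothetical helper, not a Docker feature.
check_context() {
    # $1: directory to use as the build context
    [ -d "$1" ] || { echo "not a directory: $1" >&2; return 1; }
    # Resolve symlinks and relative paths so "/" cannot sneak in as "../..".
    resolved=$(cd "$1" && pwd -P) || return 1
    if [ "$resolved" = "/" ]; then
        echo "refusing to use / as a build context" >&2
        return 1
    fi
    printf '%s\n' "$resolved"
}

# Example use with the layout described above (commented out on purpose):
# context=$(check_context ./src) &&
#     docker image build -f ./Dockerfile -t some.registry.com/repository/image-name:tag "$context"
```

The commented example passes ./src as the context and points -f at the Dockerfile in the repository root, matching the layout described above.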

Register to Push

The final link in this chain is the Docker Registry. I touched on registries earlier, but it is important to note that Docker Registries are how you distribute Docker images.

Docker Hub

A tugboat pulling a large container ship that is encumbered with shipping containers.

Docker Inc. runs a registry called Docker Hub. Docker Hub is where you can find “official” Docker images that have been published by software developers. These official images fall into several categories including Base OS (Ubuntu, Debian, Alpine, CentOS, Fedora, etc.), Infrastructure (Apache, Nginx, Registry, Kafka, RabbitMQ, etc.), Language Centric (Python, Go, Node.js, Java, etc.), and Databases (MongoDB, Couchbase, Redis, PostgreSQL, etc.). There are also numerous unofficial Docker images available. For security reasons I personally do not recommend using unofficial images. Instead, if you find an unofficial image you like, I suggest you find its GitHub repo and look at the Dockerfile. Then adapt that Dockerfile to your needs and run docker image build… yourself.

If you want to push your own images to Docker Hub, you must sign up for an account, similar to how you would sign up for an account on GitHub or other such services. Once you have signed up, you should issue the command docker login which will prompt you for your Docker Hub username and password.

Once you are logged into Docker Hub you can push images to your own repository. Do you remember the part where I said that your Docker Hub username should be the /repository/ part of any Docker images you create if you’re pushing to Docker Hub? This is why.

If you have put something other than your Docker Hub username in that part of your image name, you may get “access denied” errors when you try to push your images because you are literally trying to push the Docker image to somebody else’s repository.

The docker login command has another use. If you or your workplace hosts an internal Docker Registry, you can log in to that registry by using the command docker login my.dockerregistry.com. It will then go through the same username/password prompts and get you logged in to your internal Docker registry.

Push it Real Good

Now that we’ve gone through all of this, we can finally push our shiny new Docker image to the Docker Registry. Execute docker image push some.registry.com/repository/image-name:tag. Assuming you’re logged in and have permission to push, you will see Docker connect to the registry and push the image. There really isn’t much more to docker push.

Once the push is complete you can pull that same image to another computer by running docker image pull some.registry.com/repository/image-name:tag.

We Built This Image

Between this and my previous article I’ve walked you through all of the basic Docker functions. This includes image pull, image push, image build, image rm, container run, container stop, and container rm. While there is much more to discover about Docker, you should now have the skills and enough knowledge of the Docker platform to run Docker containers and to build your own custom container images that meet your needs. If you want to continue learning more about Docker I would suggest you read articles about optimizing Docker image sizes and begin to look into multi-container applications and Docker Compose. Also, please keep an eye out for my next article!

Dedication — To my dad, Eugene Mucciarone, who we lost while I was writing this article. You always pushed me to go beyond, and here are the results.

DISCLOSURE STATEMENT: © 2019 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.

Michael Mucciarone
Capital One Tech

Devops Engineer; Artist at heart; Technologist by day. Container and DevOps Enthusiast with a side of UI/UX designer and a degree in Music.