A Step-by-Step Guide to Creating a Docker Image

The Anatomy of the Dockerfile and How to Build, Tag, and Push Your Docker Image to Docker Hub

Harpreet Sahota
Pachyderm Community Blog


Photo by Rubaitul Azad on Unsplash

In this blog post, you’ll learn how to build and tag a Docker image and push it to Docker Hub.

You’ll learn:

  • What a Docker image and a Docker container are
  • The anatomy of a Dockerfile
  • What it means to build a Docker image
  • What it means to tag a Docker image
  • How to push your Docker image to Docker Hub

All of this may seem like a foreign language if you’re brand new to Docker. Fear not! The author of this article is unashamed to say he had zero clue how to do any of this before he started his job as a Developer Advocate at Pachyderm. It’s a simple process and you’ll be a pro at it in no time.

We’ll walk through the process, side-by-side and step-by-step…but first some prerequisites.

Prerequisites

I highly encourage you to follow along with me!

Here’s what you’ll need to do so:

If you haven’t already, go through the documentation to set up Docker Desktop and register for Docker Hub. Once you’re registered, be sure to log in to Docker Hub by running docker login.

Let’s get into it!

What is a Docker Image and a Docker Container?

Photo by Guillaume Bolduc on Unsplash

When the author first started out with Docker, he would use the terms image and container interchangeably.

That’s actually not the right way to think about them.

You should think of a Docker image as a blueprint — or source code — for building a container. This blueprint has the instructions needed to build a lean operating system. It also contains all the required libraries, code, and dependencies needed for an application to successfully execute as it was designed to.

If you have an application’s Docker image, the only other thing you need to run that application is a computer running Docker.

Nigel Poulton

How is a Docker image different from a Docker container?

It comes down to one thing: an image is a container waiting to be jump-started. The Docker image is the packed-up, immutable blueprint that has everything needed to successfully run your application. The container is that blueprint turned into a house.

It’s the living, breathing, running (metaphorically, of course) instantiation of your image.

You can take this image and run as many containers from it as you like. You can run multiple instances of that container on your local machine. You can run the image and have the same container on a friend’s laptop. Or you can run it in the cloud.

Wherever you run the container it will execute the code inside exactly how you intended it to.

In the Docker world, an image is effectively a stopped container. If you’re a developer, you can think of an image as a class.

Nigel Poulton

But to get the running container and the image that created it, you have to first create a Dockerfile.

Going from a Dockerfile, to a Docker image, to a container and back. Source

The Anatomy of a Dockerfile

An important thing to note about Docker images is that they are made up of layers.

Each of these layers is read-only, and the entire image is the combination of these layers stacked in the order in which they were added. Each additional layer records a change to the image’s metadata or filesystem relative to the layers beneath it.

How these layers are assembled into an image is defined by the Dockerfile.

Nigel Poulton’s course “The Beginner’s Guide to Docker”

The Dockerfile is a set of instructions that tells Docker how to put your image together.

This easy-to-read file describes the application, its code, and its dependencies. It’s a text file like any other piece of code and should be version controlled as such. Because this file is so instructive and easy to read, it can also serve as documentation.

Docker builds images automatically by reading the instructions from a Dockerfile -- a text file that contains all commands, in order, needed to build a given image.

Docker Documentation

Below is the anatomy of a Dockerfile. Note that there are many more commands than what’s listed below. These are just some of the more common ones you’ll likely need to know when creating your first Dockerfile.

Thom Ives series on Docker

Let’s go through some of the fundamental ones — FROM, WORKDIR, COPY, ADD, RUN, and CMD.

FROM

Every Dockerfile starts with this command; only comments, parser directives, and ARG instructions may come before it.

This command builds an initial layer from an existing image. Almost every image is based on another image, called the base image. Specifying a base image allows you to extend that image and make it more complex.

A base image that you may find helpful for data science work is the Civis Analytics Data Science Docker Image.

WORKDIR

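This command sets the working directory inside the image’s filesystem for the instructions that follow it, such as RUN, COPY, ADD, and CMD. If the directory doesn’t exist yet, Docker creates it. It doesn’t copy any of your files into the image.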

COPY

This command creates a new layer by copying files and directories from the build context (the directory you hand to the Docker client, which is the primary way that Docker users interact with Docker) into the Docker image.

ADD

This command is also used to copy files and directories into a Docker image and it does so in three ways:

  1. Add files from local storage to some destination in the image’s filesystem.
  2. Add a tarball from your local storage and extract it to someplace in the image’s filesystem.
  3. Add files from a URL to someplace in the image’s filesystem.
Source

Confused about whether you should use ADD or COPY in your Dockerfile?

Here’s what the Dockerfile best practices recommend:

Although ADD and COPY are functionally similar, generally speaking, COPY is preferred. That’s because it’s more transparent than ADD. COPY only supports the basic copying of local files into the container, while ADD has some features (like local-only tar extraction and remote URL support) that are not immediately obvious. Consequently, the best use for ADD is local tar file auto-extraction into the image, as in ADD rootfs.tar.xz /.

RUN

This runs a command inside the image at build time (for Linux images, in a shell by default) and creates a new layer from the result.

RUN executes when your image is built and is used to install dependencies and packages you need to make your code work. A key assumption you make here is that your base image will have the basics you need so these dependencies can be installed and run on top of it.

For example, if your base image is Python 3.8, you can use RUN to install numpy, pandas, etc.

CMD

This command sets the default command, with its arguments, that runs the code in your image.

It runs when a container is started from the image, not when the image is built. Keep in mind that only one CMD takes effect per Dockerfile. If you put in more than one, only the last one in the Dockerfile is used.

According to the Docker documentation, the CMD instruction has three forms:

  • CMD ["executable","param1","param2"] (exec form, this is the preferred form)
  • CMD ["param1","param2"] (as default parameters to ENTRYPOINT)
  • CMD command param1 param2 (shell form)

By the way, you must spell it like this: Dockerfile. Not Docker File, dockerfile, docker file, DockerPhile, DockerFyle…it’s Dockerfile. Technically you could use any file name you want, but people might look at you weird. Using Dockerfile makes it easier for other people to identify it and understand its purpose. Plus you don’t have to specify the filename when building the image.
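
Putting the instructions above together, here is a minimal sketch of what a first Dockerfile might look like for a small Python script. The file names (app.py, requirements.txt), the python:3.8-slim base image, and the /app directory are assumptions chosen for illustration; nothing in this post requires them. The shell snippet only writes the files into a temporary directory so you can read them over before building anything.

```shell
#!/bin/sh
# Create a throwaway project directory (hypothetical layout).
project_dir="$(mktemp -d)"

# A tiny script for the container to run (illustrative only).
cat > "$project_dir/app.py" <<'EOF'
import numpy as np
print("mean:", np.mean([1, 2, 3]))
EOF

# Dependencies that the RUN instruction installs at build time.
printf 'numpy\n' > "$project_dir/requirements.txt"

# The Dockerfile itself. Lines starting with # are Dockerfile comments.
cat > "$project_dir/Dockerfile" <<'EOF'
# FROM: start from an existing base image (assumed here: Python 3.8, slim variant).
FROM python:3.8-slim

# WORKDIR: the instructions below run relative to /app inside the image.
WORKDIR /app

# COPY: bring a file from the build context into the image.
COPY requirements.txt .

# RUN: executes while the image is being built and creates a new layer.
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

# CMD (exec form): the default command when a container starts.
CMD ["python", "app.py"]
EOF

echo "Wrote Dockerfile to $project_dir"
```

Copying requirements.txt and installing dependencies before copying app.py is a common ordering choice: editing the script then leaves the dependency layer untouched.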

Speaking of building an image, let’s talk about what that means.

Step 1: Build

Once you’ve finished the Dockerfile, you can move on to building a Docker image.

The build command takes your Dockerfile and builds a Docker image. Assuming you’re currently in the same directory as your Dockerfile, you can build an image by running the following command:

docker image build -t <image-name>:<image-version> .

Running this command builds the image with the specified operating system and packages installed. The -t flag gives the image its name and version, and the path at the end of the command (here, the current directory) becomes the build context: the set of files that COPY and ADD are allowed to read. It is not the root directory of the container. You can do a sanity check to ensure your image was built by executing:

docker image ls

Now that your image has been built, it’s ready to be tagged.

Step 2: Tag

You might end up creating a number of images, or even versions of the same image.

If your memory is anything like mine, trying to remember which image is which is painful. Docker helps you remember by allowing you to tag your images with whatever name you find most memorable. The tag for an image is a mutable named reference and acts like branch refs in Git.

When building images, always tag them with useful tags which codify version information, intended destination (prod or test, for instance), stability, or other information that is useful when deploying the application in different environments. Do not rely on the automatically-created latest tag.

Docker documentation

Below is the syntax for tagging your Docker image.

docker image tag <image-name>:<image-version> <your-docker-hub-username>/<image-name>:<image-version>

A few things to keep in mind when coming up with the tag for your image. The tag name:

  • Must be valid ASCII
  • Can only contain lowercase and uppercase letters, digits, underscores, periods, and dashes
  • Can’t start with a period or a dash
  • Can contain a maximum of 128 characters
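
The naming rules above are easy to check before you tag anything. The sketch below is a small POSIX shell helper; is_valid_tag is a hypothetical function written for this post, not a Docker command, and it only checks the tag portion of a reference (the part after the colon).

```shell
#!/bin/sh
# is_valid_tag: hypothetical helper, not part of the Docker CLI.
# Returns 0 when the tag follows the rules listed above, 1 otherwise.
is_valid_tag() {
  tag="$1"

  # Length: at least 1 and at most 128 characters.
  [ "${#tag}" -ge 1 ] && [ "${#tag}" -le 128 ] || return 1

  case "$tag" in
    # Must not start with a period or a dash.
    [.-]*) return 1 ;;
    # Must not contain anything besides letters, digits, _ . and -
    *[!A-Za-z0-9_.-]*) return 1 ;;
  esac

  return 0
}

# Try a few candidate tags (made-up examples).
for candidate in "1.0.0" "prod-2024_01" ".hidden" "-rc1" "v1 beta"; do
  if is_valid_tag "$candidate"; then
    echo "ok:      $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

Docker rejects a bad tag on its own, so a check like this is mostly useful in scripts that generate tags automatically.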

With your image tagged, it’s time to push it to a Docker registry.

Step 3: Push

Right now the image you created is only on your local machine.

In order for you to make this image available to be pulled down and run on any machine, anywhere, you need to push it to a Docker registry. Docker Hub is the most widely used Docker registry and contains a number of useful official images that you can leverage as base images. Now that you’ve built and tagged your image all you have to do is execute the following command to push it:

docker push <your-docker-hub-username>/<image-name>:<image-version>

There you have it.

You now know what to do in order to build, tag, and push a Docker image. Now if you need to pull that image on another machine or for another application, all you have to do is execute:

docker pull <your-docker-hub-username>/<image-name>:<image-version>
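
To see the three steps in one place, here is a rough sketch of a script that strings them together. The user name, image name, and version are placeholders, and the run helper is a hypothetical convenience written for this post: by default it only prints each docker command, and it executes them only when you set EXECUTE=1. It assumes you are in the directory that holds your Dockerfile and have already run docker login.

```shell
#!/bin/sh
# Placeholders: substitute your own values.
DOCKER_USER="your-docker-hub-username"
IMAGE_NAME="my-app"
IMAGE_VERSION="1.0.0"

# Local name and the name the image gets on Docker Hub.
LOCAL_REF="${IMAGE_NAME}:${IMAGE_VERSION}"
REMOTE_REF="${DOCKER_USER}/${LOCAL_REF}"

# run: hypothetical helper. Prints the command, and executes it
# only when the EXECUTE environment variable is set to 1.
run() {
  echo "+ $*"
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  fi
}

# Step 1: build the image from the Dockerfile in the current directory.
run docker image build -t "$LOCAL_REF" .
run docker image ls "$IMAGE_NAME"

# Step 2: tag the image with your Docker Hub user name.
run docker image tag "$LOCAL_REF" "$REMOTE_REF"

# Step 3: push it to Docker Hub.
run docker push "$REMOTE_REF"

# Later, on any other machine with Docker installed:
run docker pull "$REMOTE_REF"
```

Run as written, the script only prints the commands so you can check the names first. Setting EXECUTE=1 in the environment when you run it makes it execute them.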

Editor’s Note: The Pachyderm Community blog is a contributor-driven online publication and community dedicated to providing educational resources for data engineering and MLOps practitioners.

This publication is sponsored and published by Pachyderm, the leader in data versioning and pipelines for MLOps. We pay our contributors, and we don’t sell ads.

If you’d like to contribute, head on over to our call for contributors. You can also join us on Slack, and follow Pachyderm on Twitter and LinkedIn for resources, events, and much more that will help you build, automate, and scale the machine learning lifecycle while guaranteeing reproducibility.
