4 Dockerfile Best Practices You Must Get Acquainted with in 2023

Oren Spiegel
Published in Analytics Vidhya
5 min read · Jun 28, 2023

Most Dockerfile writing guidelines are becoming archaic. They are overly technical and don’t focus on the elements of Dockerfile writing that will actually result in a better-performing, faster-scaling application.

This is why I decided to take on the challenge and share with you 4 general guidelines you should acquaint yourself with, especially now that scalability is becoming more of a focal point: companies look to process data in near real time, with a renewed emphasis on smarter spending.

Without further ado, here are my 4 best practice guidelines, in no particular order.

1. Take Advantage of, and Beware of, the Dockerfile Caching Mechanism

My biggest suggestion is that you keep the lines that copy the source code at the very bottom of the Dockerfile. This is because it is the Dockerfile layer that will change most frequently between builds. The Dockerfile caching mechanism works sequentially, moving line by line downwards. When it reaches a line that requires re-running, it is forced to re-run every single line that appears after it. With this in mind, try to keep your source code COPY line at the very bottom, like in the following example:

FROM python:3.10-slim-buster

WORKDIR /app

COPY requirements.txt /app
RUN pip3 install -r requirements.txt

COPY src /app/src

ENTRYPOINT ["python3"]
CMD ["src/app.py"]

Beware of how the Dockerfile caching mechanism works. For instance, below is a bad practice I see developers follow when writing Dockerfiles:

FROM python:3.10-slim-buster

WORKDIR /app

COPY requirements.txt /app
RUN pip3 install -r requirements.txt

RUN apt-get update && apt-get install -y git
RUN git clone https://github.com/examplegituser/example-public-git-repo.git

ENTRYPOINT ["python3"]
CMD ["src/app.py"]

Notice the use of “git clone” to obtain the source code of the app. There are several reasons why this is a very bad practice, but the one I want to focus on is related to caching. Docker will not know when a change was pushed to the remote git repo: the text of the RUN line has not changed, so Docker will reuse its local cache even if new code was pushed to master since the previous build. A COPY line, by contrast, is invalidated whenever the copied files change. I’ve seen developers manually “bust” the cache to force Docker to rerun such a line during the build by inserting a mock Dockerfile command right above it (a command like RUN echo "hello"). This is an ugly hack. Your source code should reside right next to your Dockerfile inside the project root directory.

2. Install the Bare Minimum

This recommendation is rather well covered online. A lean Docker app will scale much faster due to shorter pull times. Opt for the leanest base images, and add only what your app uses on top. A big mistake junior developers make is settling for an image like python:3.10 because it works. In the vast majority of cases python:3.10-slim, or even python:3.10-slim-buster, has everything a Python app might need. Big base images come with a lot of stuff you may not need, so make sure to go the extra mile and shed those megabytes! Using a common example, if the Python modules require a C compiler, install gcc separately (here via the build-essential package), like so:

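FROM python:3.10-slim

# Update and install in a single layer so a stale, cached package index is never reused
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*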
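If the compiler is only needed while installing the Python modules, one possible way to shed even more weight is a multi-stage build. The sketch below is an assumption layered on top of the earlier example, not a required setup: it reuses the requirements.txt and src/app.py layout from section 1, and /opt/venv is an arbitrary path chosen for illustration.

```dockerfile
# Stage 1: has the compiler, installs the Python dependencies
FROM python:3.10-slim AS builder

RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install into a virtual environment so it can be copied as a single directory
# (/opt/venv is an arbitrary path chosen for this sketch)
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Stage 2: the image that actually ships, without build-essential
FROM python:3.10-slim

COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app
COPY src /app/src

ENTRYPOINT ["python3"]
CMD ["src/app.py"]
```

Both stages use the same base image, so the copied virtual environment finds the same Python interpreter. If a module links against a system library at runtime, that library would still need to be installed in the second stage.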

3. Make Your Dockerfile ARM compatible! Build for Multiplatform Support

As a first step you will need a Docker version that includes docker buildx, so make sure to upgrade the Docker version installed on your local machine.

As a second step you will have to install the platform emulators you would like to support the build for. Most likely your computer runs on an amd64 processor so you will need to install an arm64 processor emulator.
You can install both architectures with the below command.

docker run --rm --privileged tonistiigi/binfmt --install "linux/amd64,linux/arm64"

Then you can build and push your Docker image using the docker buildx command below.

 
docker buildx build --network host \
--tag demo-remote-repo-address.com/sampletag:1.0.1 \
--file /absolute/path/to/your/Dockerfile \
--provenance="true" \
. \
--platform "linux/amd64,linux/arm64" \
--output=type=image,push=true --push

Just note that each of your Dockerfile layers will have to be individually compatible with linux/arm64. This part is difficult, as you may need to spin up an ARM64 Linux virtual machine just to confirm that all layers are indeed compatible. I found this nice guide that covers how to do this on Mac.
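A layer typically stops being ARM-compatible when it downloads a binary built for one architecture only. As a rough sketch of one way around that, BuildKit (which buildx uses) fills in a TARGETARCH build argument for each platform it builds. The download URL and the tool name below are hypothetical placeholders, not a real package.

```dockerfile
FROM python:3.10-slim

# TARGETARCH is set automatically by BuildKit/buildx for each platform being
# built (amd64 or arm64 for the platforms used above)
ARG TARGETARCH

# Hypothetical download: example.com and "example-tool" stand in for any
# dependency that publishes one binary per architecture
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/* \
    && curl -fsSL -o /usr/local/bin/example-tool \
        "https://example.com/example-tool-linux-${TARGETARCH}" \
    && chmod +x /usr/local/bin/example-tool
```

With this shape, the same Dockerfile line fetches the amd64 build in one pass and the arm64 build in the other, instead of hard-coding a single architecture into the URL.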

For ECR (AWS) Users

To save AWS users time: when uploading to ECR, pass the ‘provenance’ flag the value “false”. Also, at the time of writing, an ECR build can only support one architecture per tag. So ECR users, please refer to the ARM build example below, which uses docker buildx.

docker buildx build --network host \
--tag 1812759205.dkr.ecr.us-east-1.amazonaws.com/sampletag:1.0.1-arm64 \
--file /absolute/path/to/your/Dockerfile \
--provenance="false" \
. \
--platform "linux/arm64" \
--output=type=image,push=true --push

I won’t cover the advantages of running your Docker apps on an ARM64 processor in depth. You can click here to go into the details, but in short it is roughly 15–20% more CPU efficient and hence cheaper (on AWS Graviton2).

4. Lock dependency versions to a fixed format (d.d.d). Avoid “stable” or “latest” pulls!

Make sure your requirements.txt file looks like this:

certifi==2022.12.7
pandas==1.4.2
boto3==1.24.12
pymongo==4.1.1

as opposed to this:

certifi
pandas
boto3
pymongo

It’s a very unpleasant surprise when your local Docker cache is wiped (for whatever reason) and your loose requirements.txt file results in the installation of newer module versions, which may or may not still contain the methods and classes your code depends on. Avoid this like the plague.
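The same reasoning can be applied to the base image. The fragment below is only a sketch: the exact tag is illustrative (pick one that exists for your Python version), and it assumes the pinned requirements.txt shown earlier sits next to the Dockerfile.

```dockerfile
# Exact patch release and distro, instead of python:3, python:latest or python:3.10
# (the specific tag is illustrative; choose one that exists for your Python version)
FROM python:3.10.12-slim-bullseye

WORKDIR /app

# Every entry in requirements.txt is pinned with ==, so a cold cache
# reinstalls the same versions as before
COPY requirements.txt /app/
RUN pip3 install --no-cache-dir -r requirements.txt
```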

Below is a bad example of a Dockerfile designed to run Python Selenium on a Chrome browser. It depends too heavily on ‘stable’, ‘current’ and ‘latest’ versions. Take a look:

FROM python:3.7

RUN apt-get update
RUN apt-get install -y gconf-service libasound2 libatk1.0-0 libcairo2 libcups2 libfontconfig1 libgdk-pixbuf2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libxss1 fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils

#download and install chrome
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install

#install python dependencies
COPY requirements.txt requirements.txt
RUN pip install chromedriver-binary==79.0.3945.36

#set workspace
WORKDIR /app

#copy local files
COPY . .

CMD exec gunicorn --bind :5000 --workers 1 --threads 8 main:app

The above will result in a mismatch between Google Chrome and the chromedriver binary. Notice that the example installs the stable (current) version of Google Chrome, yet is locked to a fixed chromedriver-binary version (79.0.3945.36). If you try to build this image, a much newer version of Chrome will be installed (113 at the time of writing), which is incompatible with the fixed chromedriver-binary version. It will result in:

Message: session not created: This version of ChromeDriver only supports Chrome version 79

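The solution here is to download a fixed version of the Google Chrome browser that matches the pinned driver, like so (this particular file is an RPM, which dpkg on a Debian-based image such as python:3.7 cannot install directly, so there you would need the matching .deb build):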

RUN wget http://orion.lcg.ufrj.br/RPMS/myrpms/google/google-chrome-stable-79.0.3945.36.x86_64.rpm
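One possible way to keep the browser and the driver from drifting apart is to drive both from build arguments. This is a sketch under assumptions, not a tested recipe: CHROME_VERSION and CHROME_DEB_URL are names invented here, and CHROME_DEB_URL must point at a .deb of that exact Chrome version that you host or trust.

```dockerfile
FROM python:3.7

# One version number for both the browser and the driver
ARG CHROME_VERSION=79.0.3945.36
# Hypothetical build argument: location of a .deb for that exact Chrome version
ARG CHROME_DEB_URL

# Installing the local .deb through apt-get also pulls in its dependencies
RUN apt-get update \
    && wget -O /tmp/google-chrome.deb "${CHROME_DEB_URL}" \
    && apt-get install -y /tmp/google-chrome.deb \
    && rm -rf /tmp/google-chrome.deb /var/lib/apt/lists/*

# The driver is pinned to the same version as the browser
RUN pip install chromedriver-binary==${CHROME_VERSION}
```

The URL would be passed at build time with --build-arg CHROME_DEB_URL=..., and the build fails early if it is missing instead of silently falling back to a "stable" download.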

Takeaways

  • Keep frequently changing Dockerfile lines at the bottom of the definition file.
  • Start with the slimmest base image and add missing packages on top.
  • Build your Dockerfile to support ARM64 processors. This is an industry standard now, don’t be left behind.
  • Pin versions to actual digits (d.d.d). Avoid ‘Latest’, ‘Stable’, or ‘Current’ versions!
