Docker Tips: Clean Up Your Local Machine

Understand disk space usage and reclaim the unused part

Luc Juggery
Dec 16, 2019 · 9 min read
Source: Xmodulo on Flickr

In this piece, we’ll go back to basics. We will look at how Docker uses the disk space of the host machine and how to reclaim it when it is not being used anymore.

Overall Consumption

Docker is great; there’s no doubt about that. A couple of years ago, it introduced a new way to build, ship, and run any workload by democratizing the use of containers and greatly simplifying the management of their lifecycle.

It also gave developers the ability to run any application without polluting the local machine. But when we run containers, pull images, deploy complex application stacks, and build our own images, the footprint on our host filesystem can grow significantly.

If we have not cleaned up our local machine for a while, we might be surprised by the result of this command:

$ docker system df
Example of Docker’s footprint on the host filesystem

This command shows Docker’s disk usage in several categories:

  • Images: The size of the images that have been pulled from a registry and the ones built locally.
  • Containers: The disk space used by the containers on the system (running or stopped), meaning the space taken by each container’s read-write layer.
  • Local Volumes: Storage persisted on the host but outside of a container’s filesystem.
  • Build Cache: The cache generated by the image build process (only when using BuildKit, available from Docker 18.09).

From the output above, we can see that quite a lot of disk space is reclaimable. In other words, because Docker is no longer using it, it can be given back to the host machine.
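To keep an eye on only the categories that have something to give back, the table can be filtered. The sketch below assumes the --format option of docker system df with the Type and Reclaimable template fields; the reclaimable_types helper name is ours, not part of Docker.

```shell
# Reads "TYPE<TAB>RECLAIMABLE" lines on stdin and prints only the
# categories whose reclaimable size is not 0B.
reclaimable_types() {
  awk -F '\t' '$2 !~ /^0B/ { print $1 ": " $2 }'
}

# Intended usage (needs a running Docker daemon):
#   docker system df --format '{{.Type}}\t{{.Reclaimable}}' | reclaimable_types
```

On a freshly installed system, where every category reports 0B, the helper prints nothing.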

Containers Disk Usage

Each time a container is created, several folders and files are created under /var/lib/docker on the host machine. Among them:

  • the /var/lib/docker/containers/ID folder (ID being the container’s unique identifier). If the container uses the default logging driver, all its logs are persisted in a JSON file within this folder. A container that logs heavily can therefore fill up the filesystem of the host machine.
  • a folder within /var/lib/docker/overlay2 which contains the container’s read-write layer (overlay2 being the preferred storage driver on most Linux distributions). If the container persists data in its own filesystem, that data is stored under /var/lib/docker/overlay2 on the host machine.
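To see which containers are responsible for large log files, we can list the JSON logs by size. This is a minimal sketch, assuming the default json-file logging driver (which names its files ID-json.log) and the default /var/lib/docker location; container_log_sizes is a hypothetical helper name, and reading that directory usually requires root.

```shell
# Lists json-file logs under a Docker containers directory, largest first,
# one "<bytes> <path>" entry per line.
container_log_sizes() {
  root="${1:-/var/lib/docker/containers}"
  find "$root" -type f -name '*-json.log' | while IFS= read -r f; do
    printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done | sort -rn
}

# Intended usage (as root):
#   container_log_sizes | head -n 5
```

The json-file driver also accepts max-size and max-file options (for example --log-opt max-size=10m on docker container run) to cap log growth in the first place.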

Let’s imagine we have a brand new system where Docker has just been installed.

$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 0 0 0B 0B
Containers 0 0 0B 0B
Local Volumes 0 0 0B 0B
Build Cache 0 0 0B 0B

First, we start an NGINX container:

$ docker container run --name www -d -p 8000:80 nginx:1.16

Running the df command again, we can now see:

  • one image with a size of 126MB. This is the nginx:1.16 image pulled when we launched the container.
  • one container: the www container run from the nginx image.

$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 1 1 126M 0B (0%)
Containers 1 1 2B 0B (0%)
Local Volumes 0 0 0B 0B
Build Cache 0 0 0B 0B

There is no reclaimable space yet as the container is running and the image is currently in use. As the size of the container (2B) is negligible and thus not easy to track on the filesystem, let’s create an empty 100MB file in the container’s filesystem. For this purpose, we use the handy dd command from within the www container.

$ docker exec -ti www \
    dd if=/dev/zero of=test.img bs=1024 count=0 seek=$((1024*100))

This file is created in the read-write layer associated with this container. If we check the output of the df command again, we can see the container now takes up some additional disk space.

$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 1 1 126M 0B (0%)
Containers 1 1 104.9MB 0B (0%)
Local Volumes 0 0 0B 0B
Build Cache 0 0 0B 0B
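The 104.9MB figure follows directly from the dd arguments: seeking 1024*100 blocks of 1024 bytes gives 104,857,600 bytes, which Docker reports in decimal megabytes. The sketch below repeats the same dd invocation in a temporary directory on the host, assuming GNU dd; sparse_file_size is a hypothetical helper name.

```shell
# Creates the same sparse file as in the container, prints its apparent
# size in bytes, then cleans up.
sparse_file_size() {
  dir=$(mktemp -d)
  dd if=/dev/zero of="$dir/test.img" bs=1024 count=0 seek=$((1024 * 100)) 2>/dev/null
  wc -c < "$dir/test.img" | tr -d ' '
  rm -rf "$dir"
}

sparse_file_size   # prints 104857600, i.e. about 104.9 decimal MB
```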

Where is this file located on the host? Let’s take a look:

$ find /var/lib/docker -type f -name test.img
/var/lib/docker/overlay2/83f177...630078/merged/test.img
/var/lib/docker/overlay2/83f177...630078/diff/test.img
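Rather than searching the whole tree, the location of the read-write layer can also be read from the container’s metadata. This is a sketch under the assumption that the overlay2 storage driver is in use, in which case docker container inspect exposes the directory as GraphDriver.Data.UpperDir; both helper names are ours.

```shell
# Prints the host directory backing a container's read-write layer.
# Assumes the overlay2 storage driver, which reports it as "UpperDir".
container_layer_dir() {
  docker container inspect --format '{{.GraphDriver.Data.UpperDir}}' "$1"
}

# Prints the disk usage of that directory (usually requires root).
container_layer_usage() {
  du -sh "$(container_layer_dir "$1")"
}

# Intended usage:
#   container_layer_dir www
#   sudo du -sh "$(container_layer_dir www)"
```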

Without going too deep into the details, this file was created in the container’s read-write layer which is managed by the overlay2 driver. If we stop the container, the disk space used by the container becomes reclaimable. Let’s take a look:

# Stopping the www container
$ docker stop www
# Visualizing the impact on the disk usage
$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 1 1 126M 0B (0%)
Containers 1 0 104.9MB 104.9MB (100%)
Local Volumes 0 0 0B 0B
Build Cache 0 0 0B 0B

How can this space be reclaimed? By deleting the container, which also deletes the container’s associated read-write layer.

The following command allows us to delete all stopped containers at once and to reclaim the disk space they’re using:

$ docker container prune
WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N] y
Deleted Containers:
5e7f8e5097ace9ef5518ebf0c6fc2062ff024efb495f11ccc89df21ec9b4dcc2
Total reclaimed space: 104.9MB

As the output below shows, containers no longer use any space and, as the image is no longer used by any container, the space it occupies on the host filesystem can be reclaimed:

$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 1 0 126M 126M (100%)
Containers 0 0 0B 0B
Local Volumes 0 0 0B 0B
Build Cache 0 0 0B 0B

Note: As long as an image is used by at least one container, the disk space it uses cannot be reclaimed.

The prune subcommand we used above removes only the stopped containers. If we need to remove all containers, both running and stopped, we can use one of the following commands (they are equivalent):

# Historical command
$ docker rm -f $(docker ps -aq)
# More recent command
$ docker container rm -f $(docker container ls -aq)
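Both forms fail with an error when there are no containers at all, because the rm subcommand then receives no argument. A small guard avoids this; the following is a minimal sketch, with remove_all_containers being a hypothetical helper name.

```shell
# Force-removes every container (running or stopped), and does nothing
# when there are none.
remove_all_containers() {
  ids=$(docker container ls -aq)
  if [ -z "$ids" ]; then
    echo "No containers to remove."
    return 0
  fi
  # $ids is intentionally unquoted so that each ID becomes its own argument.
  docker container rm -f $ids
}
```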

Note: It’s often useful to use the --rm flag when running a container so that it is automatically removed when its PID 1 process stops, thus releasing the disk space immediately.

Images Disk Usage

A couple of years ago, it was common to have several hundred MB per image. Ubuntu was around 600MB, and Microsoft .NET images weighed several GB (true story). At that time, pulling only a couple of images could quickly impact the disk space of the host machine, even though layers were shared between images. This is less true today, as base images are much lighter, but after a certain amount of time, piling up images will definitely have an impact if we’re not careful.

There are several kinds of images that are not directly visible to the end-user:

  • Intermediate images are referenced by other images (their child images) and cannot be removed as long as those children exist.
  • Dangling images are images that are no longer tagged or referenced by any other image. They still take up disk space and can safely be deleted.

The following command lists the dangling images on the system:

$ docker image ls -f dangling=true
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> 21e658fe5351 12 minutes ago 71.3MB

To remove the dangling images, we can go the long way:

$ docker image rm $(docker image ls -f dangling=true -q)

Or we can use the prune subcommand:

$ docker image prune
WARNING! This will remove all dangling images.
Are you sure you want to continue? [y/N] y
Deleted Images:
deleted: sha256:143407a3cb7efa6e95761b8cd6cea25e3f41455be6d5e7cda
deleted: sha256:738010bda9dd34896bac9bbc77b2d60addd7738ad1a95e5cc
deleted: sha256:fa4f0194a1eb829523ecf3bad04b4a7bdce089c8361e2c347
deleted: sha256:c5041938bcb46f78bf2f2a7f0a0df0eea74c4555097cc9197
deleted: sha256:5945bb6e12888cf320828e0fd00728947104da82e3eb4452f
Total reclaimed space: 12.9kB

If we need to remove all images at once (not only the dangling ones), we can run the following command. It will fail for the images currently used by a container, though:

$ docker image rm $(docker image ls -q)
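An alternative that skips in-use images instead of failing on them is docker image prune with the --all flag, optionally combined with an until filter so that recently created images are kept. The sketch below assumes those two options; the prune_unused_images name and the one-week default (168h) are our own choices.

```shell
# Removes every image not used by any container and created more than
# the given duration ago (default: 168h, i.e. one week).
prune_unused_images() {
  docker image prune --all --force --filter "until=${1:-168h}"
}

# Intended usage:
#   prune_unused_images        # keep images created during the last week
#   prune_unused_images 24h    # keep only images created during the last day
```

Because --force skips the confirmation prompt, it is worth running docker image ls first to check what is about to go.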

Volumes Disk Usage

Volumes are used to store data outside of a container’s filesystem. For instance, when a container runs a stateful application, we want the data to be persisted outside of the container so that it is decoupled from the container’s lifecycle. Volumes are also used because heavy filesystem operations inside a container’s read-write layer are bad for performance.

Say we run a container based on MongoDB and then use it to test a backup we previously did (available locally in the bck.json file):

# Running a mongo container
$ docker run --name db -v "$PWD":/tmp -p 27017:27017 -d mongo:4.0
# Importing an existing backup (from a huge bck.json file)
$ docker exec -ti db mongoimport \
--db 'test' \
--collection 'demo' \
--file /tmp/bck.json \
--jsonArray

The data within the backup file will be stored on the host in the /var/lib/docker/volumes folder. Why is this data not saved within the container’s layer? Because in the mongo image’s Dockerfile the location /data/db (where mongo stores its data by default) is defined as a volume.

Extract of the Dockerfile used to build the MongoDB container image

Note: Many images, often related to stateful applications, define volumes to manage data outside of the container’s layer.
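We do not need the Dockerfile at hand to find out which volumes an image declares or which ones a container actually uses: both can be read with the inspect subcommands. This is a minimal sketch, assuming the Config.Volumes field of docker image inspect and the Mounts list of docker container inspect; the two helper names are hypothetical.

```shell
# Prints, as JSON, the volume locations declared by an image.
image_volumes() {
  docker image inspect --format '{{json .Config.Volumes}}' "$1"
}

# Prints one "<type> <volume name> -> <path in container>" line per mount.
container_mounts() {
  docker container inspect \
    --format '{{range .Mounts}}{{.Type}} {{.Name}} -> {{.Destination}}{{println}}{{end}}' "$1"
}

# Intended usage:
#   image_volumes mongo:4.0
#   container_mounts db
```

For the db container above, the second command should list the bind mount on /tmp alongside the volume mounted on /data/db.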

When we are done testing the backup, we stop or remove the container. The volume is not removed, however: it stays there, consuming disk space, until we explicitly remove it. To remove the volumes that are no longer used by any container, we can go the long way:

$ docker volume rm $(docker volume ls -qf dangling=true)

Or we can use the prune subcommand (note that since Docker 23.0 it only removes anonymous volumes unless the --all flag is passed):

$ docker volume prune
WARNING! This will remove all local volumes not used by at least one container.
Are you sure you want to continue? [y/N] y
Deleted Volumes:
d50b6402eb75d09ec17a5f57df4ed7b520c448429f70725fc5707334e5ded4d5
8f7a16e1cf117cdfddb6a38d1f4f02b18d21a485b49037e2670753fa34d115fc
599c3dd48d529b2e105eec38537cd16dac1ae6f899a123e2a62ffac6168b2f5f
...
732e610e435c24f6acae827cd340a60ce4132387cfc512452994bc0728dd66df
9a3f39cc8bd0f9ce54dea3421193f752bda4b8846841b6d36f8ee24358a85bae
045a9b534259ec6c0318cb162b7b4fca75b553d4e86fc93faafd0e7c77c79799
c6283fe9f8d2ca105d30ecaad31868410e809aba0909b3e60d68a26e92a094da
Total reclaimed space: 25.82GB

Build Cache Disk Usage

The Docker 18.09 release introduced enhancements to the build process through BuildKit. Using this tool can improve performance, storage management, feature functionality, and security. We won’t detail BuildKit in this piece; we will just look at how to enable it and how it affects disk usage.

Let’s consider the following dummy Node.js application and its associated Dockerfile:

index.js file defines a simple HTTP server which exposes the ‘/’ endpoint and replies with a string for each request received:

var express = require('express');
var util = require('util');
var app = express();
app.get('/', function(req, res) {
  res.setHeader('Content-Type', 'text/plain');
  res.end(util.format("%s - %s", new Date(), 'Got Request'));
});
app.listen(process.env.PORT || 80);

package.json defines the dependencies: only expressjs here, to set up the HTTP server:

{
  "name": "testnode",
  "version": "0.0.1",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "express": "^4.14.0"
  }
}

Dockerfile defines how to build an image from the code above:

FROM node:13-alpine
COPY package.json /app/package.json
RUN cd /app && npm install
COPY . /app/
WORKDIR /app
EXPOSE 80
CMD ["npm", "start"]

Let’s build an image as we usually do, without BuildKit enabled:

$ docker build -t app:1.0 .

If we check the disk usage, we only see the base image (node:13-alpine pulled at the beginning of the build) and the final image of the build (app:1.0):

$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 2 0 109.3MB 109.3MB (100%)
Containers 0 0 0B 0B
Local Volumes 0 0 0B 0B
Build Cache 0 0 0B 0B

Let’s now build version 2.0 of the image using BuildKit. We just need to set the DOCKER_BUILDKIT environment variable to 1:

$ DOCKER_BUILDKIT=1 docker build -t app:2.0 .

If we check the disk usage once more, we can see that a build cache was created:

$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 2 0 109.3MB 109.3MB (100%)
Containers 0 0 0B 0B
Local Volumes 0 0 0B 0B
Build Cache 11 0 8.949kB 8.949kB

To remove the build cache, we can use the following command:

$ docker builder prune
WARNING! This will remove all dangling build cache.
Are you sure you want to continue? [y/N] y
Deleted build cache objects:
rffq7b06h9t09xe584rn4f91e
ztexgsz949ci8mx8p5tzgdzhe
3z9jeoqbbmj3eftltawvkiayi
Total reclaimed space: 8.949kB

Cleaning Everything at Once

As we saw in the examples above, the container, image, volume, and builder commands each provide a prune subcommand to reclaim disk space. The prune subcommand is also available at the system level, where it cleans up several kinds of unused objects at once (adding the --volumes and --all flags extends it to unused volumes and to all unused images):

$ docker system prune
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all dangling images
- all dangling build cache
Are you sure you want to continue? [y/N]

Running this command once in a while to clean up the disk is a good habit to have.
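To turn this habit into something repeatable, the prune can be wrapped in a small function that reports disk usage before and after. This is only a sketch: docker_cleanup and its --with-volumes switch are hypothetical names, and the --force flag skips the confirmation prompt, so it should be used with care.

```shell
# Shows Docker's disk usage, prunes unused objects without prompting,
# then shows the disk usage again.
# Pass --with-volumes to also remove unused volumes (data loss risk).
docker_cleanup() {
  echo "== before =="
  docker system df
  if [ "$1" = "--with-volumes" ]; then
    docker system prune --force --volumes
  else
    docker system prune --force
  fi
  echo "== after =="
  docker system df
}

# Intended usage:
#   docker_cleanup
#   docker_cleanup --with-volumes
```

Keeping volume removal behind an explicit switch mirrors the default behavior of docker system prune, which leaves volumes alone.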

Better Programming
