Optimize Docker Private Registry storage by removing unused docker tags.

Co-Author: Jagdish Komakula

Cloud-native applications use Docker containers as building blocks. Each container is an image with a readable/writable layer on top of a stack of read-only layers. These layers are generated when the commands in the Dockerfile are executed during the Docker image build phase.

For example, here is a Dockerfile for creating a WebSphere Liberty image. It shows the commands that are executed to create the image.

FROM websphere-liberty
RUN installUtility install --acceptLicense clusterMember-1.0
ADD server.xml /opt/ibm/wlp/usr/servers/defaultServer/
ADD application_file /opt/ibm/wlp/usr/servers/defaultServer/dropins/

When Docker builds the image from the above Dockerfile, each step corresponds to a command in the Dockerfile, and each layer is made up of the files generated by running that command.

Each step in the Dockerfile creates a layer with its own ID, shown as a short hash in the build output.

$ docker build -t wasliberty .
Sending build context to Docker daemon  4.096kB
Step 1/4 : FROM websphere-liberty
---> 9528378369a2
Step 2/4 : RUN installUtility install --acceptLicense clusterMember-1.0
---> Running in 01007ad22baa
Establishing a connection to the configured repositories
.........
.........
Product validation completed successfully.
Removing intermediate container 01007ad22baa
---> 04bde725f004
Step 3/4 : ADD server.xml /opt/ibm/wlp/usr/servers/defaultServer/
---> 23e829681327
Step 4/4 : ADD application_file /opt/ibm/wlp/usr/servers/defaultServer/dropins/
---> db3418fde7ca
Successfully built db3418fde7ca
Successfully tagged wasliberty:latest

Once the image is built, you can view all the layers that make up the image with the docker history command.

$ docker history wasliberty
db3418fde7ca 2 minutes ago /bin/sh -c #(nop) ADD file:6833ebb8b15ac26e3… 0B
23e829681327 2 minutes ago /bin/sh -c #(nop) ADD file:99d22363395f43da2… 0B
04bde725f004 2 minutes ago /bin/sh -c installUtility install --acceptLi… 26.5MB
9528378369a2 9 months ago /bin/sh -c #(nop) COPY --chown=1001:0 file:f2… 1.66kB

Docker Registry Structure

The Registry is a stateless, highly scalable server-side application that stores Docker images and lets you distribute them, with multiple tags, over an HTTP API.

A Docker tag identifies a specific version of an image. The image that a tag points to is composed of several layers. The list of layers for that image is called a manifest, and the manifest is itself identified by a digest. There is a corresponding blob for each layer.

Layers are stored as blobs in the v2 registry, keyed by their content digest.

The Docker registry stores the Docker images in the file system using Blobs, Image Manifest, Manifest List and Tag objects under the following path in the file system:

${REGISTRY_DIR}/docker/registry/v2/repositories
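To get a feel for how many tags a repository has accumulated, you can look at this directory directly. The following is a minimal sketch, assuming the registry's default filesystem layout, in which every tag of a repository is a directory under _manifests/tags. The function names and the repository name in the comments are hypothetical.

```shell
# List the tags stored on disk for one repository.
# registry_dir: registry root directory (the ${REGISTRY_DIR} above)
# name:         repository name, e.g. "myteam/wasliberty" (hypothetical)
list_tags_on_disk() {
  local registry_dir="$1" name="$2"
  # Assumption: default layout, one directory per tag under _manifests/tags.
  ls -1 "${registry_dir}/docker/registry/v2/repositories/${name}/_manifests/tags"
}

# Count the tags of one repository.
count_tags_on_disk() {
  list_tags_on_disk "$1" "$2" | wc -l | tr -d ' '
}
```

For example, count_tags_on_disk /var/lib/registry myteam/wasliberty would print the number of tags currently stored for that repository.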

Why remove individual tags of an image?

When you push daily Docker builds with different tags to your private registry, you end up with many tags for a single image over time. Some blobs may no longer be used by any of the tags you still need. These layers are called abandoned or unused layers. The disk space taken by such unused layers can grow considerably and cause storage issues. Being able to remove individual tags lets you reclaim that storage.

Do not tag images as “latest” only:

When an image is always pushed as ‘latest’ only (a very bad approach; always number your tags), the repository accumulates a huge number of unused blobs over time, because the current ‘latest’ tag may not use the blobs that the ‘latest’ tag of a month ago used. Such blobs are impossible to remove unless the ‘latest’ tag is removed entirely before it is pushed to the repository again. Don’t be surprised to see hundreds or thousands of mostly unused blobs in such a “latest” image.

You can use the kubectl command to clean up the entire image with all of its tags.

kubectl get images -n <namespace>
kubectl delete image <image_name> -n <namespace>

Docker Garbage collection:

The Docker registry provides the “garbage-collect” command to clean up the registry, but this command only removes blobs that are not referenced by any manifest. When a single image name has many tags, the blobs referenced by the old tags are never marked for garbage collection and remain in the registry even though they are no longer of any use. In some cases, when an image has more than 100 tags, its disk usage can exceed 20 GB.

Optimize number of docker tags for an image

Standalone Vs Centralized repositories

With standalone repositories, the same image is stored in many clusters, which is a poor use of storage. The best practice is to keep images in a centralized repository such as quay.io or docker.io: build the image once, then promote or pull it to the different environments. You then only need to worry about the storage space of a single repository rather than of many standalone ones. However, this still leaves the same problem of unused blobs to clean up under a single image.

Cleanup of unused tags from storage

Clean up all tags of an image except the last five. The last five tags can be kept for rollback purposes. To remove a tag, you need to delete two directories from the repository location or use the Docker Registry v2 API.

rm -r <reg_mount_loc>/docker/registry/v2/repositories/${name}/_manifests/tags/${tag}/index/sha256/${hash}
rm -r <reg_mount_loc>/docker/registry/v2/repositories/${name}/_manifests/revisions/sha256/${hash}

Note: reg_mount_loc is the registry mount location, such as /var/lib/registry.
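The other option mentioned above, the Docker Registry v2 API, deletes a manifest by its digest rather than by its tag name. The following is a minimal sketch, assuming a registry that is reachable without authentication and that was started with deletion enabled (for example with REGISTRY_STORAGE_DELETE_ENABLED=true). The function names and the example URL are hypothetical.

```shell
# Extract the manifest digest from HTTP response headers read on stdin.
extract_digest() {
  tr -d '\r' | awk 'tolower($1) == "docker-content-digest:" { print $2 }'
}

# Delete the manifest that a tag points to, using the Registry v2 HTTP API.
# Example (hypothetical values): delete_tag http://localhost:5000 wasliberty 1.0.3
delete_tag() {
  local registry_url="$1" name="$2" tag="$3" digest
  # HEAD request; the Accept header asks for the schema 2 manifest,
  # so the returned digest is the one the DELETE call expects.
  digest=$(curl -sS -I \
    -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
    "${registry_url}/v2/${name}/manifests/${tag}" | extract_digest)
  if [ -z "$digest" ]; then
    echo "no digest found for ${name}:${tag}" >&2
    return 1
  fi
  # Manifests are deleted by digest, not by tag name.
  curl -sS -X DELETE "${registry_url}/v2/${name}/manifests/${digest}"
}
```

Keep in mind that deleting a manifest by digest affects every tag that points to the same digest, and that the blobs themselves are only freed by a later garbage collection run.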

But cleaning up a huge number of images with a huge number of tags through the APIs can be a tedious job. The script linked below can be used to clean up all but the last five tags, either for the entire repository or for a specific namespace.

Refer to https://github.com/deepjagdish/regcleanup for command usage.
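The selection step, deciding which tags fall outside the last five, can be sketched as follows. This is not the linked script's implementation, only an illustration under one assumption: tags are version numbers, so a version sort (sort -V) puts the oldest first. The function name tags_to_delete is hypothetical.

```shell
# Read tag names from stdin (one per line) and print every tag
# except the newest $1 (default 5), assuming version-numbered tags.
tags_to_delete() {
  local keep="${1:-5}"
  sort -V | awk -v keep="$keep" '
    { line[NR] = $0 }
    END { for (i = 1; i <= NR - keep; i++) print line[i] }
  '
}
```

The output could then be fed, one tag at a time, to whichever deletion method you use. If your tags are not version numbers (for example commit hashes), a different ordering such as image creation time would be needed.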

Trigger garbage collection now

Once the cleanup has been run, the blobs that are no longer referenced become eligible for garbage collection. Managed Kubernetes platforms such as OpenShift or IBM Cloud Private provide mechanisms to interact with the registry and clean up blobs through the image-registry pods. These utilities can then sweep the unused, unwanted blobs.

Garbage collection for Openshift
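On OpenShift, image pruning is exposed through the oc adm prune images command. The following is a minimal sketch, assuming you are logged in with a user that is allowed to prune images. The wrapper function name and its defaults are hypothetical; the defaults keep five tag revisions to match the rollback window discussed above.

```shell
# Prune old image revisions from the OpenShift internal registry.
# $1: number of tag revisions to keep (default 5)
# $2: do not prune anything younger than this duration (default 60m)
prune_openshift_images() {
  local keep_revisions="${1:-5}" keep_younger_than="${2:-60m}"
  # Without --confirm the command only reports what it would prune.
  oc adm prune images \
    --keep-tag-revisions="$keep_revisions" \
    --keep-younger-than="$keep_younger_than" \
    --confirm
}
```

It is usually worth running the same command without --confirm first and reviewing what it reports.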

Garbage collection for IBM Cloud Private

Garbage collection for Docker
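For a plain Docker registry, garbage collection is run with the registry binary itself. The following is a minimal sketch, assuming the official registry image running in a container, with its configuration at /etc/docker/registry/config.yml. The function name and the default container name "registry" are hypothetical.

```shell
# Run the registry garbage collector inside a running registry container.
# $1: container name (hypothetical default: "registry")
# $2: pass --dry-run to only report what would be deleted
registry_gc() {
  local container="${1:-registry}" mode="${2:-}"
  # Assumption: config path used by the official registry image.
  local config="/etc/docker/registry/config.yml"
  if [ "$mode" = "--dry-run" ]; then
    docker exec "$container" registry garbage-collect --dry-run "$config"
  else
    docker exec "$container" registry garbage-collect "$config"
  fi
}
```

A dry run first (registry_gc registry --dry-run) shows which blobs are eligible for deletion. Avoid pushing images while the real run is in progress, since a blob uploaded during garbage collection could be removed by mistake.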
