Harbor: Container Images Cleanup Automation

Mahammed Sahid Shaik
Apr 28, 2019
Container Images Cleanup automation — Remove old/unnecessary container images from your registry

Our team has started using Harbor, an open source cloud native registry that stores container images. It also signs images and scans them for vulnerabilities.

We work with many repositories and container images across different stacks such as development, staging, and production. Storing them in Harbor without a deletion strategy eventually leaves no space on the machine hosting Harbor.

Harbor's documentation describes how to delete repositories and container images. It is a two-step process.

First, delete the container images from Harbor's UI. This is a soft deletion. Then run the garbage collector, which actually deletes the underlying files.

The problem is that removing repositories or container images from Harbor's UI is cumbersome and requires manual intervention.

Let's see how to automate the deletion of container images using Harbor's REST API and shell scripting with awk, instead of deleting them manually from Harbor's UI.

Note: in this article I use the terms "build" and "tag" interchangeably.

Harbor provides a range of REST APIs to list all repositories, list all tags in a repository, and delete tags.

First, use curl to get all repositories in a project in JSON format:

curl -u <harbor_user>:<harbor_pswd> -X GET "http://<harbor_host>/api/repositories?project_id=<project_id>" > REPOS.json

Then run awk to extract the repository names from the JSON:

awk '/name/ {print $2}' REPOS.json | tr -d '",' | sort | sed 's:.*/::'
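To make the pipeline above concrete, here is a small sketch that runs it against a sample response. The JSON below is an assumed shape with hypothetical values (a pretty-printed array with one key per line); the awk approach relies on that layout, so a compact single-line response would need a real JSON parser instead.

```shell
# Hypothetical sample of a pretty-printed repositories response.
cat > REPOS.json <<'EOF'
[
  {
    "id": 1,
    "name": "myproject/web",
    "project_id": 2
  },
  {
    "id": 2,
    "name": "myproject/api",
    "project_id": 2
  }
]
EOF

# Pick the value after "name", drop quotes and commas, sort,
# then strip everything up to the last slash (the project prefix).
repos=$(awk '/name/ {print $2}' REPOS.json | tr -d '",' | sort | sed 's:.*/::')
echo "$repos"
```

With this sample, the pipeline prints the bare repository names `api` and `web`, one per line.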

Once you have the required repositories, it's time to get the tags of each one:

curl -u <harbor_user>:<harbor_pswd> -X GET http://<harbor_host>/api/repositories/<REPO>/$repo/tags 
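The call above can be wrapped in a loop over the repository names. The sketch below is one possible way to do it, under assumptions: `HARBOR_URL`, `HARBOR_AUTH`, `list_tags`, and `dump_all_tags` are hypothetical names, and the tag names are assumed to appear as pretty-printed `"name"` keys, one per line, in the response. If the response also contains nested objects with their own `"name"` keys, a JSON parser such as jq would be the safer choice.

```shell
# Hypothetical defaults; override them from the environment.
HARBOR_URL=${HARBOR_URL:-http://harbor.example.com}
HARBOR_AUTH=${HARBOR_AUTH:-user:password}

# Print the tag names of one repository, one per line.
# Assumes pretty-printed JSON with one "name" key per line.
list_tags() {
  project=$1; repo=$2
  curl -s -u "$HARBOR_AUTH" -X GET \
    "$HARBOR_URL/api/repositories/$project/$repo/tags" |
    awk '/"name"/ {print $2}' | tr -d '",'
}

# For every repository name in REPOS_FILE, write its tags to <repo>.tags.
dump_all_tags() {
  project=$1; repos_file=$2
  while IFS= read -r repo; do
    list_tags "$project" "$repo" > "$repo.tags"
  done < "$repos_file"
}
```

For example, `dump_all_tags myproject repos.txt` would create one `.tags` file per repository listed in `repos.txt`.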

Since the container build naming follows a convention, I sort by build name in descending version order and print every build except the newest one, so the latest build is never deleted. You can change this number to suit your needs: to preserve the last 5 builds, use '6,$p'.

awk '/<pattern>/{print}'  <REPOS file> | sort -Vr | sed -n '2,$p'
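The retention rule can be parameterized so the number of builds to keep is not hard-coded. This is a minimal sketch, assuming GNU `sort` (for `-V`) and a plain text file with one tag per line; `list_deletable`, `TAGS.txt`, and the sample tag names are hypothetical.

```shell
# list_deletable TAGS_FILE PATTERN KEEP
# Prints every matching tag except the KEEP newest ones.
list_deletable() {
  tags_file=$1; pattern=$2; keep=$3
  awk -v pat="$pattern" '$0 ~ pat' "$tags_file" |
    sort -Vr |                        # newest version first
    sed -n "$((keep + 1)),\$p"        # skip the first KEEP lines
}

# Hypothetical sample tags; "latest" does not match the pattern.
printf '%s\n' build-1.0.9 build-1.0.10 build-1.0.2 latest > TAGS.txt

deletable=$(list_deletable TAGS.txt '^build-' 1)
echo "$deletable"
```

Here version sort places `build-1.0.10` above `build-1.0.9`, so with `KEEP` set to 1 the function lists `build-1.0.9` and `build-1.0.2` for deletion. A plain lexical sort would get that order wrong.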

To delete a tag, run this command:

curl -u <harbor_user>:<harbor_pswd> -X DELETE  http://<harbor_host>/api/repositories/<REPO>/$repo/tags/<tagname>

The output looks like the following (the request and response headers appear when curl is run with -v). If the operation succeeds, you will see status code 200.

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload Upload Total Spent Left Speed
100 1254 100 1254 0 0 4985 0 --:--:-- --:--:-- --:--:-- 4996

* About to connect() to xxxx port 80 (#0)
* Connected to <harbor host> port 80 (#0)
* Server auth using Basic with user 'admin'
> DELETE /api/repositories/
> Authorization: Basic <token>
> User-Agent: curl/7.29.0
> Host: xxxxxxx
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx
< Date: Sun, 28 Apr 2019 12:14:53 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 0
< Connection: keep-alive
< Set-Cookie: sid=1xxxxx; Path=/; HttpOnly
<
* Connection #0 to host xxxxx left intact
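In a script, it is easier to check the status code directly than to read the verbose output. The sketch below uses curl's `-o /dev/null` and `-w '%{http_code}'` options to capture only the HTTP status. `HARBOR_URL`, `HARBOR_AUTH`, and `delete_tag` are hypothetical names, and treating 200 as the only success code follows the output shown above.

```shell
# Hypothetical defaults; override them from the environment.
HARBOR_URL=${HARBOR_URL:-http://harbor.example.com}
HARBOR_AUTH=${HARBOR_AUTH:-user:password}

# delete_tag PROJECT REPO TAG
# Succeeds only when the server answers with HTTP 200.
delete_tag() {
  project=$1; repo=$2; tag=$3
  status=$(curl -s -o /dev/null -w '%{http_code}' -u "$HARBOR_AUTH" \
    -X DELETE "$HARBOR_URL/api/repositories/$project/$repo/tags/$tag")
  if [ "$status" = "200" ]; then
    echo "deleted $repo:$tag"
  else
    echo "failed to delete $repo:$tag (HTTP $status)" >&2
    return 1
  fi
}
```

A caller can then stop or log on failure, for example `delete_tag myproject web build-1.0.2 || exit 1`.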

Then run the garbage collector, which frees up space by removing blobs from the filesystem:

bin/registry garbage-collect /path/to/config.yml

To sum up, a shell script that chains these steps removes the tags from Harbor's repositories without any manual intervention.

You can add such a script to your Jenkins CI/CD pipeline or schedule it with cron, depending on your requirements.

I hope you found this article helpful. Thanks for reading!
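As an illustration, here is a minimal sketch of how the steps in this article might be chained. It is not the original script. It assumes pretty-printed JSON responses and GNU `sort`, and every name in it (`HARBOR_URL`, `HARBOR_AUTH`, `KEEP`, `DRY_RUN`, `api`, `names`, `cleanup_project`) is hypothetical. It defaults to a dry run that only prints what would be deleted.

```shell
#!/bin/sh
# Hypothetical defaults; override them from the environment.
HARBOR_URL=${HARBOR_URL:-http://harbor.example.com}
HARBOR_AUTH=${HARBOR_AUTH:-user:password}
KEEP=${KEEP:-5}        # number of newest builds to preserve per repository
DRY_RUN=${DRY_RUN:-1}  # 1 = only print, 0 = really delete

# api METHOD PATH: call the Harbor REST API; -f makes HTTP errors fail.
api() {
  curl -sf -u "$HARBOR_AUTH" -X "$1" "$HARBOR_URL/api/$2" < /dev/null
}

# Extract the values of "name" keys from pretty-printed JSON on stdin.
names() {
  awk '/"name"/ {print $2}' | tr -d '",'
}

# cleanup_project PROJECT_ID PROJECT_NAME TAG_PATTERN
cleanup_project() {
  project_id=$1; project=$2; pattern=$3
  api GET "repositories?project_id=$project_id" | names | sed 's:.*/::' | sort |
  while IFS= read -r repo; do
    api GET "repositories/$project/$repo/tags" | names |
      awk -v pat="$pattern" '$0 ~ pat' | sort -Vr | sed -n "$((KEEP + 1)),\$p" |
    while IFS= read -r tag; do
      if [ "$DRY_RUN" = 1 ]; then
        echo "would delete $repo:$tag"
      else
        api DELETE "repositories/$project/$repo/tags/$tag" &&
          echo "deleted $repo:$tag"
      fi
    done
  done
}
```

A call such as `cleanup_project 2 myproject '^build-'` lists the candidates. Setting `DRY_RUN=0` performs the deletions, after which the garbage collector still has to be run to reclaim disk space.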
