YAHT: Enable both keep and keep-since pruners in OpenShift Pipelines

Alexei Ivanov
Sopra Steria Norge
7 min read · Feb 9, 2024

Welcome to the YAHT club — a series where I write simple, doable how-to’s. The series is dedicated to cloud-native operations and development, but other topics might be covered as well. YAHT, as you might have guessed, stands for Yet Another How-To (with reference to YAML, which you will see a lot here). So, let us dive into it.

This how-to will cover a simple way to enable both the “keep” and “keep-since” pruners that are available with OpenShift Pipelines (Tekton), so that Pipeline resources are removed both after a certain period of time has passed and after a certain number of them has accumulated. Why is this useful? Imagine a large company with many different teams and clusters, where each team works at its own pace: some build, test and deploy rapidly and quite often, say, several times a day, whereas other teams might do the same once, or maybe twice, a month. Tidying up Pipeline resources in a manner that “covers” both types of teams is difficult using only one type of pruner (provided they work on the same cluster).

So how does this work? Well, according to the official Tekton documentation, only one type of pruner can be configured at a time.

But, there is a way to enable both of them without the need to write one yourself. Let us have a look.

First off, I assume that you have already installed the OpenShift Pipelines Operator and that Pipelines are fully operational, with either type of pruner configured.

Step 1

What you will need to do initially is go to your TektonConfig and modify it so that the Operator deploys the other type of pruner than the one you currently have (here I have type “keep” configured):
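For reference, the relevant part of my TektonConfig looked roughly like the following; the resource list and schedule are only examples, so adjust them to your own setup:

apiVersion: operator.tekton.dev/v1alpha1
kind: TektonConfig
metadata:
  name: config
spec:
  pruner:
    resources:
      - pipelinerun
      - taskrun
    keep: 3              # retain the 3 most recent runs per resource type
    schedule: "0 8 * * *"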

Change “keep” to “keep-since” and choose an appropriate retention time:
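After the change, the same section could look like this (keep-since is given in minutes, so the 1440 below keeps roughly one day's worth of runs and is only an example):

spec:
  pruner:
    resources:
      - pipelinerun
      - taskrun
    keep-since: 1440     # retention time in minutes (example value)
    schedule: "0 8 * * *"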

Step 2

Extract the newly created CronJob as a YAML file:

$ oc get cj tekton-resource-pruner-2rgth -o yaml > tekton-resource-pruner-custom.yaml

Now, remove the ownerReferences section from the YAML file:
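In the extracted file, the block to delete sits under metadata and looks something like the following; the exact owner kind, name, and uid will differ in your cluster, so treat this purely as an illustration of what to look for:

metadata:
  name: tekton-resource-pruner-2rgth
  namespace: openshift-pipelines
  ownerReferences:                                 # remove this whole block
    - apiVersion: operator.tekton.dev/v1alpha1     # owner details vary with Operator version
      kind: TektonConfig
      name: config
      uid: 00000000-0000-0000-0000-000000000000    # placeholder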

You can remove annotations, creationTimestamp, resourceVersion, uid, and status as well. Then change the name of the resource. The name itself is not important, as long as it differs from that of any existing CronJob in the openshift-pipelines namespace and conforms to Kubernetes naming conventions. Let us dub this one “tekton-resource-pruner-custom”.

Step 3

After completing steps 1 and 2, go back to your TektonConfig and revert the change, so that it again instructs the Operator to deploy the pruner type you had previously (in my case, back to “keep”).

You can now apply your custom pruner to your cluster:

$ oc apply -f tekton-resource-pruner-custom.yaml

Next steps

This might already work as you would expect, but there are several issues you will need to address. One of them is that the Operator tends to “clear out” all other CronJob resources in the openshift-pipelines namespace, so your newly created custom pruner is in danger of being automatically removed after a short period of time. Exactly what triggers the removal I do not know, as I have not delved into the issue.

Another thing to look out for is that the project list inside your custom pruner CronJob will not be updated. If you create a new project/namespace in the cluster and start building/testing/deploying, the resources generated by the Pipelines Operator will not be cleaned up by your custom pruner. This is because your custom pruner contains a list of namespaces that is essentially a “snapshot” of the cluster as it was when you created the pruner, so it has no knowledge of namespaces you create or delete afterward.

Both of these challenges can be dealt with, and here is how.

Enter OpenShift GitOps

Or, more accurately, Argo CD.

Argo CD is a GitOps tool used primarily to automatically update resources in a cluster according to their respective manifests stored in a Git repository.

You can easily deploy it as an Operator from the OpenShift OperatorHub inside your cluster (if you have not done so already).

Once deployed, you will need to instruct the Argo CD instance you will use (I used the default one in the openshift-gitops namespace) so that it is capable of managing your CronJob resource in the openshift-pipelines namespace. There are two ways of doing so that I know of:

1. Create a ClusterRole and ClusterRoleBinding with the necessary permissions (see the sketch right after this list).

2. Label the openshift-pipelines project with the following:

argocd.argoproj.io/managed-by: openshift-gitops # must be the namespace that contains your Argo CD instance
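For option 1, a sketch of the two resources could look like this. The subject below assumes the default instance in the openshift-gitops namespace, and the rules are deliberately limited to CronJobs; widen them if your Argo CD instance needs to manage more than that:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: argocd-cronjob-manager
rules:
  - apiGroups: ["batch"]
    resources: ["cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argocd-cronjob-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: argocd-cronjob-manager
subjects:
  - kind: ServiceAccount
    name: openshift-gitops-argocd-application-controller   # application controller of the default instance
    namespace: openshift-gitops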

The reason the ClusterRole is more powerful is summarised here: https://argocd-operator.readthedocs.io/en/latest/usage/custom_roles/

Some resources simply will not be synchronized unless proper permissions are given to the application controller ServiceAccount of your Argo CD instance.

Therefore, my preferred method is the ClusterRole and ClusterRoleBinding combination. Though not necessarily the safer of the two, it is the more effective one. Furthermore, it gives me the flexibility to adjust, expand, or reduce the cluster-wide permissions of my Argo CD instance.

After this, you can deploy an Argo CD application that will monitor a specific Git repo and synchronize the manifests it contains with your target namespace. Here is a sample:
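A minimal Application along these lines should do; the repo URL, path, and target revision are placeholders for your own values:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tekton-resource-pruner-custom
  namespace: openshift-gitops
spec:
  project: default
  source:
    repoURL: git@git.example.com:ops/pruner-manifests.git   # placeholder repo
    targetRevision: main
    path: .
  destination:
    server: https://kubernetes.default.svc
    namespace: openshift-pipelines
  syncPolicy:
    automated:
      prune: true
      selfHeal: true       # re-creates the CronJob if the Operator clears it out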

The image

Unless you already have one, create a container image that has both Git and jq in it, as well as openssh-clients (more on this later). This image will later be used to update the CronJob resource and store it in our Git repo. Here is the Containerfile I used:
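Something along the lines of the Containerfile below should do. I am assuming a UBI 9 minimal base here; any base image where git, jq, and openssh-clients are available will work just as well:

FROM registry.access.redhat.com/ubi9/ubi-minimal:latest

# git for cloning/pushing the manifest repo, jq for editing the JSON manifest,
# openssh-clients for SSH-key authentication towards the Git server
RUN microdnf install -y git jq openssh-clients && \
    microdnf clean all

# run as an arbitrary non-root UID, as OpenShift expects
USER 1001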

The Secret

By this point, I assume that you have a Git repository available to you, where you can store your manifests in either JSON or YAML format. When you have built and pushed the image to your image registry, you can create a Secret with the SSH key to access your Git repository, for instance:

$ oc create secret generic ssh-auth --type=kubernetes.io/ssh-auth --from-file=ssh-privatekey=.ssh/id_rsa --from-file=known_hosts=.ssh/known_hosts

The repo

All that needs to be done now is to upload your custom pruner (CronJob) manifest to the repository monitored by your Argo CD instance. Mine has, in outline, the shape sketched below.
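The image references, repo URL, ServiceAccount name, and shell script details are simplified placeholders; the main container should be the one you copied from the Operator-generated CronJob in step 2, and the point here is the overall structure and the three init containers:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: tekton-resource-pruner-custom
  namespace: openshift-pipelines
spec:
  schedule: "0 8 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: tekton-resource-pruner-custom   # bound to the ClusterRole shown further down
          restartPolicy: Never
          volumes:
            - name: workspace
              emptyDir: {}
            - name: ssh-auth
              secret:
                secretName: ssh-auth
          initContainers:
            # 1) fetch the current list of build namespaces
            - name: get-list
              image: registry.redhat.io/openshift4/ose-cli:latest   # any image with the oc client
              command: ["/bin/sh", "-c"]
              args:
                - oc get namespaces -o name | sed 's|namespace/||' | grep '^build-' > /workspace/namespaces.txt
              volumeMounts:
                - name: workspace
                  mountPath: /workspace
            # 2) clone the manifest repo, rewrite the namespace-dependent part with jq, push it back
            - name: update-repo
              image: image-registry.openshift-image-registry.svc:5000/openshift-pipelines/pruner-tools:latest   # the git+jq+openssh image built earlier
              env:
                - name: KEEP_SINCE
                  value: "1440"   # retention in minutes, kept here so it is easy to find and change
              command: ["/bin/sh", "-c"]
              args:
                - |
                  set -e
                  # ssh-add refuses world/group-readable keys, so take a private copy first
                  mkdir -p /tmp/.ssh && cp /ssh/ssh-privatekey /tmp/.ssh/id && chmod 0400 /tmp/.ssh/id
                  eval "$(ssh-agent -s)" && ssh-add /tmp/.ssh/id
                  export GIT_SSH_COMMAND="ssh -o UserKnownHostsFile=/ssh/known_hosts"
                  git clone git@git.example.com:ops/pruner-manifests.git /workspace/repo
                  cd /workspace/repo
                  # rebuild the prune command from the fresh namespace list; adjust the jq path to
                  # wherever the namespace-dependent part lives in your extracted manifest
                  CMD=$(awk -v since="$KEEP_SINCE" \
                    '{printf "tkn pipelinerun delete --keep-since=%s -n %s -f ; ", since, $0}' \
                    /workspace/namespaces.txt)
                  jq --arg cmd "$CMD" '.spec.jobTemplate.spec.template.spec.containers[0].args = [$cmd]' \
                    tekton-resource-pruner-custom.json > tmp.json && mv tmp.json tekton-resource-pruner-custom.json
                  git config user.email "pruner-bot@example.com" && git config user.name "pruner-bot"
                  git commit -am "Refresh pruner namespace list" || true   # no changes is fine
                  git push
              volumeMounts:
                - name: workspace
                  mountPath: /workspace
                - name: ssh-auth
                  mountPath: /ssh
                  readOnly: true
            # 3) apply the refreshed manifest directly instead of waiting for the next Argo CD sync
            - name: update-cronjob
              image: registry.redhat.io/openshift4/ose-cli:latest
              command: ["/bin/sh", "-c"]
              args:
                - oc apply -f /workspace/repo/tekton-resource-pruner-custom.json
              volumeMounts:
                - name: workspace
                  mountPath: /workspace
          containers:
            # the actual pruner, copied from the Operator-generated CronJob; shown only as a placeholder here
            - name: tekton-resource-pruner
              image: registry.redhat.io/openshift-pipelines/pipelines-cli-tkn-rhel8:latest   # adjust to the tkn image your Operator uses
              command: ["/bin/sh", "-c"]
              args:
                - tkn pipelinerun delete --keep-since=1440 -n build-example -f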

Let us walk through the CronJob manifest.

I have added three init containers to this CronJob to make sure the resource itself is kept up to date with the latest changes in the cluster (new namespaces added, deleted namespaces removed):

  • get-list: simply fetches the “list” of all the projects in the cluster.
  • update-repo: the script does several things — it fetches the repo, amends the manifest, then uploads the updated manifest back to the repo.
  • update-cronjob: it can take up to several minutes for Argo CD to synchronize the resource with the cluster, so we apply the latest changes to it directly.

Notice that I have added an environment variable to one of the init containers. This is purely for convenience — should the retention time need to change at some point, it will be easier to find where to change it using this approach.

Git requires authentication, and the preferred way of providing it here is with SSH keys. For this reason, you need to initialize the SSH agent inside the container where the Git operations take place. All of this requires the container image to contain the openssh-clients package. Mounting your key file into that container will allow it to read the key and use it to authenticate to your repo.

The reason for using init containers is that they are executed sequentially (one after another), run only once, and should one of them fail, the rest of the pod will not run. This is exactly what is needed for the execution of the update tasks.

The manifest itself is uploaded in JSON format to make it easier to change programmatically using jq.

In this case, I am keeping only namespaces named “build-something”, purely because of a policy that dictates Pipelines should run only in namespaces designated for builds (in this particular case, the Pipelines are essentially build Pipelines only). This might not be necessary in other cases, so you might want to drop the filtering and grab them all.

Some more auth…

We are not quite done yet. Since our CronJob will now list namespaces and change itself, we need to grant the ServiceAccount used to run it permission to do those things. We achieve this by creating separate ClusterRole and ClusterRoleBinding resources:

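Assuming the ServiceAccount referenced by the CronJob is called tekton-resource-pruner-custom and lives in openshift-pipelines, a sketch of the two resources could look like this (extend the rules if the same ServiceAccount also runs the pruning itself and therefore needs to delete PipelineRuns and TaskRuns across namespaces):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tekton-resource-pruner-custom
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list"]              # for the get-list init container
  - apiGroups: ["batch"]
    resources: ["cronjobs"]
    verbs: ["get", "patch", "update"]   # for the update-cronjob init container
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tekton-resource-pruner-custom
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tekton-resource-pruner-custom
subjects:
  - kind: ServiceAccount
    name: tekton-resource-pruner-custom   # the ServiceAccount referenced in the CronJob
    namespace: openshift-pipelines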

That should do the trick. Now your CronJob should run without issues.

To summarise this how-to: we created two CronJobs, one generated by the Operator and one derived from it. We also deployed an Argo CD application that ensures the custom CronJob will not be removed. Then we granted all of the components in this puzzle the permissions they need to function. Our custom pruner now runs alongside the Operator-generated one, so Pipeline resources are cleaned up based on both time and count.

Let me know if the solution worked for you, or if you have any comments.

Alexei Ivanov is a senior fullstack developer and cloud engineer with Sopra Steria based in Oslo, Norway.