Kubernetes and Worker Queues
A couple of months ago, my friends from the Eclipse OSS Review Toolkit (ORT) reached out to me for some guidance regarding Kubernetes. ORT allows you to set up a toolchain to manage OSS dependencies in your source code and catch OSS licensing conflicts early on, if you integrate it as part of your CI/CD toolchain.
They wanted to implement an on-demand scan service: basically, run an OSS scan from a trigger. ORT is mainly written in Kotlin, and in the Java ecosystem the paradigm is often to solve such challenges within the JVM process. So you would go with a job scheduler like Quartz, or a newer tool like JobRunr, to schedule async job executions. This can lead to applications in the Java ecosystem becoming quite overloaded with functionality and frameworks. Configuring a distributed job queue in Java can also be challenging with regard to state and reliability. But we can easily separate the different concerns of a web-queue-worker setup:
- there is a client that initiates a job
- there is a controller that schedules and tracks the execution of the job
- there is an executor that executes the job
Since they are running an OSS project, they would like to remain cloud-vendor neutral; a common denominator, though, was Kubernetes, since it is widely available nowadays. For this blog post, I am going to use Google Cloud Platform (GCP) services, but the solution approach should work on any Kubernetes (K8s) distribution.
If you want to skip ahead, you can head over to the ORT repo to check out the source code.
So I decided to build a simple proof of concept (POC) to show how this could be realised on K8s. Since I wanted to leverage the cloud-native ecosystem, I didn't have to reinvent the wheel here but rather could select a good existing implementation for the problem. My choice fell on Tekton, a cloud-native CI/CD framework. Tekton is built around several K8s Custom Resource Definitions (CRDs) with controller implementations. This means no extra database is needed for Tekton to work, since all state is held in the K8s cluster.
Tekton can easily be set up by following the installation instructions: you just need to run `kubectl apply --filename https://storage.googleapis.com/tekton-releases/pipeline/latest/release.yaml`. This installs the CRDs and controllers for Tekton. Tekton is an extensible framework, and you can create and install your own task types; for the ORT example, you will need to install the git-clone and gcs-upload tasks.
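Installing the two catalog tasks can look like this (the catalog paths and version numbers below are assumptions; check the Tekton catalog for the current ones):

```shell
# Install the git-clone and gcs-upload tasks from the Tekton catalog.
# The version directories (0.9, 0.3) are examples -- pick the latest available.
kubectl apply -f https://raw.githubusercontent.com/tektoncd/catalog/main/task/git-clone/0.9/git-clone.yaml
kubectl apply -f https://raw.githubusercontent.com/tektoncd/catalog/main/task/gcs-upload/0.3/gcs-upload.yaml
```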
After this, we can define our pipeline (see ort-pipeline.yaml as an example). A pipeline defines the steps that should be run, and which input and output workspaces (think file folders) the pipeline needs in order to read and store data. Once the pipeline is defined, you can simply run it by creating a PipelineRun resource (e.g. simple-maven.yaml).
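To give you an idea of the shape of these resources, here is a minimal sketch of a pipeline and a matching PipelineRun. The task names, workspace names, and parameters are illustrative, not the actual ORT definitions:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: ort-scan                 # illustrative name
spec:
  workspaces:
    - name: shared-data          # sources are cloned here and read by later tasks
  params:
    - name: repo-url
      type: string
  tasks:
    - name: fetch-source
      taskRef:
        name: git-clone          # catalog task installed earlier
      workspaces:
        - name: output
          workspace: shared-data
      params:
        - name: url
          value: $(params.repo-url)
    - name: analyze
      runAfter: ["fetch-source"] # ordering between the steps
      taskRef:
        name: ort-analyze        # hypothetical custom task
      workspaces:
        - name: source
          workspace: shared-data
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: ort-scan-run-    # each run gets a unique name
spec:
  pipelineRef:
    name: ort-scan
  params:
    - name: repo-url
      value: https://github.com/example/some-repo.git
  workspaces:
    - name: shared-data
      volumeClaimTemplate:       # a fresh volume backs the workspace per run
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Gi
```

Each PipelineRun is the "job" from the list of concerns above: the client creates it, the Tekton controller schedules and tracks it, and the task pods execute it.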
Since calling `kubectl apply` every time we want to execute a pipeline is a little cumbersome and not really production-ready, Tekton also provides Triggers and EventListeners. With them you can set up common patterns such as cron-based triggers, webhook-based triggers for pull requests and Git commits, or even CloudEvents-based listeners.
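An EventListener exposes an HTTP endpoint that turns incoming webhook payloads into PipelineRuns. A minimal sketch could look like this (the service account and the referenced binding and template names are assumptions, not resources from the ORT repo):

```yaml
apiVersion: triggers.tekton.dev/v1beta1
kind: EventListener
metadata:
  name: ort-listener
spec:
  serviceAccountName: tekton-triggers-sa  # assumed SA with permission to create PipelineRuns
  triggers:
    - name: webhook-trigger
      bindings:
        - ref: ort-binding      # TriggerBinding: maps webhook payload fields to params
      template:
        ref: ort-template       # TriggerTemplate: the PipelineRun to instantiate
```

Posting a webhook payload to the listener's service then creates a new PipelineRun, so the client only needs an HTTP call instead of `kubectl` access.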
In my opinion, if you are looking for a vendor-neutral, K8s-based CI/CD solution, you should give Tekton a try. But it can also be useful in many scenarios outside of CI/CD, since it basically lets you orchestrate tasks based on triggers, a need that many applications share. And compared to plain K8s Jobs, it provides more features for multi-step workflows. Of course, if you want to do workflow orchestration and are happy to go with a fully managed solution, you could give Google Cloud Workflows a try.