Streamlining CI/CD and ETL pipelines with Jenkins — part 1

Mike Palei
5 min read · Nov 26, 2023


The dynamic realm of DevOps/MLOps hosts three distinct factions: those who bear a grudge against Jenkins, those who can’t abide Airflow, and those, like me, who hold no discrimination and harbor disdain for both. However, in the immortal words of a famous bard, “Love attention is all you need.” Embracing this notion, I extend an invitation to embark on a journey with me, where we shall uncover the advantages of deploying organization pipelines in Jenkins. Together, we’ll explore the features that render it a formidable choice for streamlining CI/CD and ETL workflows.

Our journey commences in the prehistoric era (a mere couple of years ago, to be precise) when most of the logic was implemented in hardcoded bash scripts executed by Jenkins. Gradually, my team and I transitioned to using pipelines, but they remained hardcoded within Jenkins, with all triggering mechanisms configured through the GUI. We found ourselves still unredeemed, with a long walk of atonement ahead. It was during this critical juncture that we encountered the words of the gospel:

“Orchestration must be treated as code”

A simple yet powerful sentence indeed. Why is it that our business logic undergoes meticulous scrutiny with unit testing and pull requests, while the very component executing it remains vulnerable? Anyone can come and modify the orchestration logic at their discretion. It’s crucial to emphasize that we’re discussing production pipelines here. The potential risks of such unchecked modifications become all too apparent.

“One small step for man, one giant leap for mankind.”

So, rather than having a pipeline hardcoded as shown below
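(The original screenshot did not survive this export. Purely as a hypothetical reconstruction, a hardcoded pipeline of this kind might have looked something like the following, with the repository URL, credentials, and build steps all baked into the job definition and invisible to code review — all names here are illustrative:)

```groovy
// Hypothetical sketch of a pipeline hardcoded inside Jenkins itself.
pipeline {
    agent any
    stages {
        stage('Checkout') {
            steps {
                git url: 'https://github.com/some-org/some-repo.git',
                    branch: 'master',
                    credentialsId: 'git-creds'   // assumed credential id
            }
        }
        stage('Build') {
            steps {
                sh './build.sh'   // bash logic hidden behind the GUI
            }
        }
    }
}
```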

we transitioned to a more dynamic approach as illustrated in:
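(This screenshot is also missing from the export. As a rough sketch of the idea: the job definition in Jenkins only points at the repository, while the actual pipeline lives in a Jenkinsfile that is versioned and reviewed like any other code. In Job DSL terms — all names illustrative — that might look like:)

```groovy
// Job DSL sketch: the job merely references a Jenkinsfile stored in SCM.
pipelineJob('my-service-ci') {
    definition {
        cpsScm {
            scm {
                git {
                    remote { url('https://github.com/some-org/some-repo.git') }
                    branch('master')
                }
            }
            // The pipeline itself is code-reviewed in the repository
            scriptPath('Jenkinsfile.ci')
        }
    }
}
```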

Yet, we sensed that something crucial was still absent, prompting us to question our objectives. What did we genuinely desire to achieve? It became apparent that we had two predominant scenarios: CI/CD (to be covered in detail in post 2 of the series) and ETL (to be covered in detail in post 3 of the series). Consequently, we compiled a list of essential functionalities:

  1. Automatically recognize projects with a matching Jenkinsfile name pattern.
  2. Discover and build pull requests in these projects.
  3. Automatically execute the relevant portions of the pipeline on a push to a branch, on pull request creation, or on a merge into the master/main branch.

That is exactly where Jenkins Organization Folders come into play.

So how do you define one?

Defining Organization Folder

You’ll need to configure the following (see images below):

  • access to SCM (git)
  • behaviors
  • project discovery
Access to SCM
Behaviors
Project discovery
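If you want the folder definition itself under version control rather than clicking through the GUI, the Job DSL plugin exposes an organizationFolder block. A minimal sketch might look like the following — the organization name and credential id are illustrative, and the exact nested blocks available depend on your plugin versions:

```groovy
// Job DSL sketch of an organization folder, mirroring the GUI settings above.
organizationFolder('my-github-org') {
    description('CI/CD and ETL pipelines for the organization')
    organizations {
        github {
            repoOwner('my-github-org')              // SCM access
            apiUri('https://api.github.com')
            credentialsId('github-app-creds')       // assumed credential id
        }
    }
    // Project discovery: only repositories containing a Jenkinsfile.ci
    // become Jenkins projects.
    projectFactories {
        workflowMultiBranchProjectFactory {
            scriptPath('Jenkinsfile.ci')
        }
    }
    // Periodic rescan, in addition to webhook-driven events.
    triggers {
        periodicFolderTrigger { interval('1d') }
    }
}
```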

The last picture means that every repository in the organization specified in the SCM access section that contains a Jenkinsfile.ci (the .ci suffix is just our naming convention; you can obviously use a different one) will now be visible to Jenkins.

Once you’ve finished the configuration, the organization will be scanned and you will see something like this (the status icons next to the project names will obviously differ):

Projects in Organization folder

Now every project containing a Jenkinsfile.ci (or another name following your convention) will be discovered. Jenkins automatically rescans your organization periodically; however, if you need to rescan immediately, you can do it manually:
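Besides the "Scan Organization Now" button in the GUI, a rescan can also be kicked off over Jenkins' REST API. Assuming an API token and an organization folder named my-github-org (both placeholders for your own setup), something along these lines should work:

```shell
# Queue a rescan of the organization folder via the REST API.
# JENKINS_URL, USER, and API_TOKEN are placeholders for your setup.
curl -X POST "${JENKINS_URL}/job/my-github-org/build" \
     --user "${USER}:${API_TOKEN}"
```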

Now, let’s decide how the jobs within the projects will be triggered. Our main aim is to minimize GUI usage, ideally using it only for experimentation and debugging, not for production. There are essentially two options:

  • a pipeline triggered by a webhook on every push to Git.
  • a pipeline triggered by a cron schedule, with the triggering mechanism written directly inside the pipeline (for detailed insights, please refer to the second post in this series).

For the scope of this post, I’ll delve into the first option. To implement this, ensure your Jenkins instance has the GitHub Integration Plugin installed. Additionally, configure a webhook in Git and set up Jenkins to collaborate with Git, following guidelines such as those detailed in this article.
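For reference, with the GitHub plugin installed, the webhook configured on the GitHub side typically points at Jenkins' /github-webhook/ endpoint (substitute your own host):

```text
Payload URL:   https://<your-jenkins-host>/github-webhook/
Content type:  application/json
Events:        "Just the push event" is sufficient for githubPush()
```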

Here’s an example of what you might have in your Jenkinsfile.ci. This pipeline uses a buildDockerImages function defined in a shared library. It will be triggered on every push to Git, scan the project, build every Dockerfile it finds inside the project, and push the resulting images to Docker Hub (public or private).

#!/usr/bin/env groovy
@Library('jenkins_shared_libs') _

properties([
    buildDiscarder(
        logRotator(numToKeepStr: '10')
    ),
    pipelineTriggers([
        githubPush()
    ]),
    disableConcurrentBuilds(),
])

buildDockerImages()
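The real buildDockerImages step lives in our shared library and is covered in the next post. Purely as an illustration of what such a global-variable step can look like, here is a rough sketch — the registry, credential id, image naming scheme, and reliance on the Pipeline Utility Steps and Docker Pipeline plugins are all assumptions, not the actual implementation:

```groovy
// vars/buildDockerImages.groovy -- illustrative sketch only
def call() {
    node {
        stage('Checkout') {
            checkout scm
        }
        stage('Build and push images') {
            // findFiles comes from the Pipeline Utility Steps plugin
            def dockerfiles = findFiles(glob: '**/Dockerfile')
            docker.withRegistry('https://index.docker.io/v1/', 'dockerhub-creds') {
                dockerfiles.each { f ->
                    // Derive a build context and image name from the
                    // Dockerfile's directory (assumed naming convention)
                    def context = f.path == 'Dockerfile' ? '.' :
                                  f.path[0..<f.path.lastIndexOf('/')]
                    def name    = context == '.' ? env.JOB_BASE_NAME :
                                  context.tokenize('/').last()
                    def image = docker.build(
                        "my-dockerhub-org/${name}:${env.GIT_COMMIT.take(7)}",
                        "-f ${f.path} ${context}")
                    image.push()
                }
            }
        }
    }
}
```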

Now, if you open some project in your Jenkins organization folder (which, I repeat, will be discovered if it has a file with a matching name pattern) you will see something like:

In the next post of this series we shall familiarize ourselves with the concept of shared libraries — what they are used for and how they are set up — and take a look at a real pipeline that shows:

  • how triggering is defined within the pipeline code itself
  • how execution of certain pipeline stages may be conditioned (e.g. on the pipeline being triggered by cron and the branch being master)
  • how functions from shared libraries are used
