Efficient Data Transformation: Using DBT via Jenkins and Docker

Krishna Kumar
Published in BI3 Technologies
May 30, 2024

Introduction:

Transforming data is easier than ever with dbt, Jenkins, GitHub, and Docker working together. Learn how to make your data processes smoother and more efficient. Follow these simple steps to run dbt in Jenkins using Docker containers.

Step 1: Obtaining the Jenkins Docker Image

Acquire the Jenkins Docker image by executing the following command to fetch the latest version from the Docker Hub repository.

docker pull jenkins/jenkins

Step 2: Setting up Python Environment and Installing Required Packages

Create a Dockerfile in the project repository with the following contents.

FROM jenkins/jenkins:latest

USER root

# Install sudo
RUN apt-get update && apt-get install -y sudo

# Install Python, pip, venv support, and libffi headers (already running as root, so sudo is not needed here)
RUN apt-get update && apt-get install -y \
    python3 python3-pip python3-venv python3-requests libffi-dev

# Create the virtual environment (python3 -m venv creates the /venv directory itself;
# the environment is not activated, so later steps call its binaries by full path)
RUN python3 -m venv /venv

# Upgrade pip and install the required Python packages in the virtual environment
RUN /venv/bin/pip install --upgrade pip
RUN /venv/bin/pip install dbt-core==1.4.4 dbt-snowflake==1.4.4

# Switch back to the Jenkins user for security reasons
USER jenkins

Step 3: Constructing the Docker Image and Establishing a Container

To build the Docker image and run the container with port mapping, execute the following commands:

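# Build the Docker image (the trailing "." is the build context: the current directory)
docker build -t my_jenkins_dbt_image .
# Run the container
docker run -d -p 8080:8080 --name my_jenkins_dbt_container my_jenkins_dbt_image

# "docker build" - Builds an image from the Dockerfile.
# "docker run" - Creates and starts a new container from an image ("-d" runs it in the background, "-p 8080:8080" maps the Jenkins port to the host).
# "my_jenkins_dbt_container" - The container's name.
# "my_jenkins_dbt_image" - The image's name.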
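Once the container is up, a quick check can confirm that it is running and that dbt is installed in the virtual environment. The following is a minimal sketch that reuses the container name and the /venv path from the steps above; the helper function name is only illustrative.

```shell
# Hypothetical helper: checks the container from Step 3 and prints the dbt version.
# Assumes the container name used above and the /venv path from the Dockerfile.
check_dbt_container() {
    container="${1:-my_jenkins_dbt_container}"

    # Show the container only if it is currently running
    docker ps --filter "name=${container}" --format '{{.Names}}: {{.Status}}'

    # Run dbt from the virtual environment inside the container
    docker exec "${container}" /venv/bin/dbt --version
}

# Usage (requires Docker and the running container):
# check_dbt_container my_jenkins_dbt_container
```

If the second command fails, the image was most likely built before the dbt packages were added to the Dockerfile and needs to be rebuilt.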

Step 4: Accessing Docker GUI, Logging into Jenkins, and Obtaining InitialAdminPassword

Docker Desktop

On Windows, the status of running containers can be visualized using Docker’s graphical user interface (GUI).

Jenkins Initial Page

To access the Jenkins login page, open a web browser and navigate to localhost:8080, ensuring that the Jenkins container is currently running.

When clicking on the Jenkins container, the initialAdminPassword can be found in the log tab.

To see the Jenkins initial password in the Docker container
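The same password can also be read from the command line instead of the Docker Desktop log tab. This is a small sketch, assuming the container name from Step 3 and the default Jenkins home directory of the official image; the function name is hypothetical.

```shell
# Hypothetical helper: prints the Jenkins initial admin password from the container.
# /var/jenkins_home/secrets/initialAdminPassword is the default location in the official image.
jenkins_initial_password() {
    container="${1:-my_jenkins_dbt_container}"
    docker exec "${container}" cat /var/jenkins_home/secrets/initialAdminPassword
}

# Usage (requires Docker and the running container):
# jenkins_initial_password my_jenkins_dbt_container
```

Running docker logs my_jenkins_dbt_container shows the same value, since Jenkins prints it to the log on first startup.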

Step 5: Configuring Jenkins Plugin Installation and Admin User Creation

a. Choose the “Install suggested plugins” option to automatically install the recommended plugins for Jenkins.

b. Once the plugins are installed, users will be prompted to create an admin user. Enter the desired username, password, and other details.

c. Click “Save and Continue” to complete the Jenkins setup, then click “Start using Jenkins” to access the Jenkins dashboard.

Jenkins Setup

Step 6: Generating an SSH Key for GitHub Connection

To access a codebase stored in a private GitHub repository, generate an SSH key to establish a secure connection. Follow the steps below to generate the key:

In a terminal or command prompt, run the command below to generate the key files.

ssh-keygen -t ed25519 -C your_email@example.com

Note: Use the GitHub email address for key generation.
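For a scripted setup, the key can be written to an explicit path and the public half printed straight away. The sketch below is one way to do that; the function name, the default path, and the empty passphrase are assumptions for illustration, not part of the original steps.

```shell
# Hypothetical helper: creates an Ed25519 key pair at the given path and prints the public key.
generate_github_key() {
    email="$1"
    key_path="${2:-$HOME/.ssh/id_ed25519}"

    mkdir -p "$(dirname "${key_path}")"

    # -f sets the output file; -N "" sets an empty passphrase (an assumption for unattended use)
    ssh-keygen -q -t ed25519 -C "${email}" -f "${key_path}" -N ""

    # The .pub file is the public half of the key pair
    cat "${key_path}.pub"
}

# Usage:
# generate_github_key your_email@example.com "$HOME/.ssh/id_ed25519"
```

If a key already exists at the target path, ssh-keygen asks before overwriting it, so choosing a fresh file name avoids replacing a key that is still in use.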

In File Explorer, the .ssh folder in your user profile directory now contains the two generated files: id_ed25519 (the private key) and id_ed25519.pub (the public key).

SSH Key Files

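Open the “id_ed25519.pub” file in Notepad, where the public SSH key can be found. Copy it from there. The private key (“id_ed25519”) should never be pasted into GitHub.

Navigate the SSH Key Menu

In GitHub, go to Settings, open “SSH and GPG keys”, and click “New SSH key”. Paste the public key and save the changes to enable secure authentication with GitHub repositories using SSH.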
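After the key has been added to GitHub, the connection can be tested from the terminal. The following is a minimal sketch; the function name and default key path are assumptions, while ssh -T git@github.com is the standard way to test GitHub SSH access.

```shell
# Hypothetical helper: checks whether GitHub accepts the given private key.
verify_github_ssh() {
    key_path="${1:-$HOME/.ssh/id_ed25519}"

    # GitHub closes the session after authenticating, so ssh exits non-zero even
    # on success; the greeting text is what indicates a working key.
    output="$(ssh -T -i "${key_path}" -o IdentitiesOnly=yes git@github.com 2>&1)" || true

    # Print the greeting and return success only if authentication worked
    printf '%s\n' "${output}" | grep "successfully authenticated"
}

# Usage (requires network access to github.com):
# verify_github_ssh "$HOME/.ssh/id_ed25519"
```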

Git SSH Key Generation

Step 7: Configuring Jenkins to Connect with GitHub

a. Log in to Jenkins with the provided username and password.

b. Go to “Manage Jenkins” on the dashboard.

c. Select “System” from the menu and navigate to the “Git plugin” section.

d. Enter the Git user name and email address there. If the section is missing, install the Git plugin first from “Manage Jenkins” > “Plugins”.

Setting up the Git connection

e. In the “Security” settings, locate “Git Host Key Verification Configuration” and if connectivity issues arise, switch the setting to “Accept first connection”.

Git Host Key Settings in Jenkins

f. Save the changes to apply the updated configuration.
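As an alternative to “Accept first connection”, the GitHub host key can be recorded in a known_hosts file ahead of time so that the “Known hosts file” strategy works. The sketch below assumes it is run as the Jenkins user inside the container; the function name and default path are illustrative.

```shell
# Hypothetical helper: appends GitHub's Ed25519 host key to a known_hosts file.
add_github_host_key() {
    known_hosts="${1:-$HOME/.ssh/known_hosts}"

    mkdir -p "$(dirname "${known_hosts}")"

    # ssh-keyscan fetches the public host key that github.com presents
    ssh-keyscan -t ed25519 github.com >> "${known_hosts}"
}

# Usage (requires network access to github.com):
# add_github_host_key "$HOME/.ssh/known_hosts"
```

Comparing the recorded key against the fingerprints GitHub publishes in its documentation is worthwhile, because ssh-keyscan trusts whatever the network returns.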

Step 8: Establishing a Pipeline Job in Jenkins

In Jenkins, navigate to “New Item”, select “Pipeline” as the project type, and specify a name for the project.

Jenkins Dashboard
Creating the new pipeline project

To run the job on a schedule, select the “Build periodically” option and enter a cron expression.

Note: A cron string is a concise format used to schedule recurring tasks in Unix-like systems, consisting of five fields representing different time units.

Ref: Crontab.guru — The cron schedule expression generator

For example, the cron string 0 0 * * * represents a schedule that runs every day at midnight (00:00).
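To make the five fields concrete, the short sketch below splits a cron expression and labels each part. It uses the standard daily-at-midnight expression 0 0 * * * as its example; the helper name is hypothetical and the script only illustrates the format, it is not part of Jenkins.

```shell
# Hypothetical helper: labels the five fields of a cron expression.
describe_cron() {
    # Disable globbing so "*" is not expanded to file names during word splitting
    set -f
    set -- $1
    set +f

    if [ "$#" -ne 5 ]; then
        echo "expected 5 fields, got $#" >&2
        return 1
    fi

    echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
}

describe_cron "0 0 * * *"
# prints: minute=0 hour=0 day-of-month=* month=* day-of-week=*
```

Jenkins also accepts H in a field (for example H 0 * * *) to spread jobs across the hour rather than starting them all at exactly the same minute.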

Before creating the pipeline script, add credentials for the SSH key connection.

a. Navigate to Manage Jenkins -> Credentials -> Global domain -> Add Credentials.

b. Choose the credential type as “SSH Username with private key”.

Jenkins Credential Manager

c. Open the ‘id_ed25519’ file (the one without the ‘.pub’ extension) in the ‘.ssh’ folder to copy the private SSH key.

d. Provide the SSH username associated with the Git account or the system you’re connecting to and paste the corresponding private key.

e. Once all necessary details are entered, save the credential.

f. Jenkins will securely store this information for SSH authentication in Jenkins jobs and pipelines.

Add credentials with a private SSH key

Note: The ID entered here is the value the pipeline script uses to reference this credential.

Step 9: Integrating the Pipeline Script into the Job Configuration

After creating the Jenkins pipeline project, go to its configuration page -> Advanced Project Options -> Pipeline script, and enter the pipeline script there.

Jenkins Build
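The pipeline script itself is not reproduced here. As a rough sketch, the shell steps that a pipeline stage might run against the checked-out repository could look like the following. The project directory, the profiles directory, the DBT_BIN variable, and the function name are all assumptions for illustration; only the /venv path comes from the Dockerfile in Step 2.

```shell
# Hypothetical sketch of the shell steps a pipeline stage could run.
run_dbt_job() {
    project_dir="${1:-.}"                 # assumed: dbt project at the workspace root
    profiles_dir="${2:-$project_dir}"     # assumed: profiles.yml kept beside the project
    dbt_bin="${DBT_BIN:-/venv/bin/dbt}"   # defaults to the virtual environment from Step 2

    # Stop at the first failing step so Jenkins marks the build as failed
    "${dbt_bin}" deps --project-dir "${project_dir}" --profiles-dir "${profiles_dir}" || return 1
    "${dbt_bin}" run  --project-dir "${project_dir}" --profiles-dir "${profiles_dir}" || return 1
    "${dbt_bin}" test --project-dir "${project_dir}" --profiles-dir "${profiles_dir}" || return 1
}

# Usage inside the Jenkins workspace:
# run_dbt_job "$WORKSPACE"
```

Calling dbt by its full path avoids having to activate the virtual environment in every build step.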

Technologies Utilized:

  1. Jenkins: A popular tool for automating tasks, like building and deploying software.
  2. dbt (data build tool): A tool designed for organizing and transforming data, making it easier for data engineers to create, test, and use analytics models.
  3. Docker: A platform that packages software and its dependencies into containers, making it easier to deploy and run applications consistently.
  4. GitHub: A website where people can store and share their code, making it easy to collaborate on software projects.

Conclusion:

Combining dbt, Jenkins, GitHub, and Docker gives a repeatable way to run data transformations. Docker keeps the runtime environment consistent, Jenkins schedules and automates the dbt runs, GitHub provides version control and collaboration, and SSH key authentication secures access to the repository. Together they reduce the manual steps in a data engineering workflow and make transformation jobs easier to maintain.

About Us:
Bi3 has been recognized for being one of the fastest-growing companies in Australia. Our team has delivered substantial and complex projects for some of the largest organizations around the globe and we’re quickly building a brand that is well-known for superior delivery.

Website: https://bi3technologies.com/

Follow us on,
LinkedIn:
https://www.linkedin.com/company/bi3technologies
Instagram:
https://www.instagram.com/bi3technologies/
Twitter:
https://twitter.com/Bi3Technologies
Personal LinkedIn:
https://www.linkedin.com/in/krishna-kumar-b8a30a210
