The Fastest Way to Build Data Pipelines?
A guide to efficient data engineering using Kestra blueprints to kickstart your next workflow
Let’s face it — learning something new isn’t as straightforward as it’s often portrayed online. It takes time and dedication, and both have become unicorn-rare as job complexity keeps increasing.
The best thing a company can do is to show you exactly how to use their products. Some have documentation, others opt for interactive tutorials, and some simplify things even further with deliberate and reusable code examples — or blueprints, for short.
That’s exactly the direction Kestra took. They provide more than 100 blueprints you can use out of the box or with slight modifications. Of course, we won’t cover all of them today, but you’ll get the general idea after seeing three of them.
Let’s dig in!
What are Kestra Blueprints and Why Should You Care?
Kestra’s 0.10.0 release, among other things, introduces blueprints. Think of these as a catalog of ready-to-use code examples designed to help you learn the tool in the shortest time frame possible, and also drastically reduce the time needed to develop the workflow.
Some blueprints work out of the box, while others need a bit of tweaking. That’s understandable, since the tweakable ones only require you to change database/server connection strings or other API credentials.
Once you install Kestra locally, you’ll be able to access this catalog of blueprints from the left navigation sidebar.
You’ll be able to filter them based on various tags, such as programming language or technology. It’s likely you’ll find what you’re looking for, but if you don’t, feel free to suggest a new one via the following issue template.
You now know what Kestra blueprints are and why they’re useful — but how do you actually go about using them? That’s what we’ll go over next.
Blueprint Example 1: How to Install Python Packages Before Running a Python Script
A lot of new data orchestration platform users come from a Python background, so it makes sense to explain exactly how Kestra handles Python scripts and additional Python dependencies.
This blueprint is very simple, as you can see from the image below:
It only has one Python script task, which uses the requests module to make an API request.
The problem is, this library doesn’t ship with the default Python installation, so you’ll have to install it manually. This blueprint shows you that the correct way to approach this is with the beforeCommands property:
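Based on that description, the flow looks roughly like the sketch below. The id, namespace, and exact task type are illustrative assumptions; check the blueprint itself for the exact code matching your Kestra release:

```yaml
id: python_api_request
namespace: dev

tasks:
  - id: api_request
    type: io.kestra.plugin.scripts.python.Script
    # every command listed here runs before the script itself,
    # making it the natural place for pip install calls
    beforeCommands:
      - pip install requests
    script: |
      import requests

      # hit the public GitHub API and print the JSON response
      response = requests.get("https://api.github.com")
      print(response.json())
```

The key part is beforeCommands: since it runs before the script, the requests library is already installed by the time the import line executes.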
You can typically find every explanation you’ll need in the About section below the source code.
Now scroll to the top of the blueprint and hit the big purple “Use” button. It will copy the blueprint code into a new Kestra flow:
All this Python snippet does is make a request to the GitHub API and print the response.
From here, you can change the code if needed, or just save and run it:
This blueprint required no modification, since it’s not communicating with your servers/databases, nor does it require any authentication key.
For that reason, the flow execution succeeded, which was indicated by a green bar in the Gantt view. You can also switch to the Logs tab to see the output:
And that’s it — your first blueprint with Kestra! Let’s see what else is available.
Blueprint Example 2: Run R Script in a Docker Container and Generate Downloadable Files
Kestra is not a Python-specific data orchestration platform. It can run other programming languages as well, such as R.
To demonstrate, let’s go over to the R blueprint tag and click on the only blueprint currently available:
In a nutshell, this one will create a working directory, modify a dataset that’s built into R, and save it in both CSV and Parquet file formats. In addition, this blueprint also shows you how to make these files downloadable from the Kestra UI, but more on that in a bit.
Once again, you can scroll down to the About section for further clarifications:
As with the previous blueprint, click on the “Use” button to create a new flow from it. This one also doesn’t communicate with any servers or databases, so there’s no need to change connection parameters or credentials:
You can now save the Kestra flow and run it. It has a single working directory task that contains two tasks of its own, as you can see from the Gantt view:
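Pieced together from the description, the flow’s structure might look something like the sketch below. The task types and the outputFiles property are assumptions based on Kestra’s scripts plugin; the actual blueprint also writes a Parquet file, which typically requires installing the arrow package first:

```yaml
id: r_generate_files
namespace: dev

tasks:
  - id: workdir
    # a working directory task lets child tasks share the same filesystem
    type: io.kestra.core.tasks.flows.WorkingDirectory
    tasks:
      - id: generate_files
        type: io.kestra.plugin.scripts.r.Script
        # files matching these patterns are captured by Kestra
        outputFiles:
          - "*.csv"
        script: |
          # modify a dataset built into R and save it to disk
          df <- mtcars
          df$model <- rownames(df)
          write.csv(df, "mtcars.csv", row.names = FALSE)
```

Anything matched by outputFiles is captured by Kestra and surfaced in the Outputs tab, which is what makes the generated files downloadable from the UI.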
The Kestra flow execution was successful, which means we should be able to download the CSV/Parquet files from the UI.
These are available in the Outputs tab:
You can either download the file or preview it in the Kestra UI — this is what it contains:
Let’s explore a slightly more complex blueprint next.
Blueprint Example 3: Load a CSV File into a Postgres Table
We’ll now take a look at a blueprint that requires some modification before you can use it. To be more precise, the blueprint will show you how to load a downloaded CSV file into a Postgres database:
The blueprint shows you how to first download a CSV file from GitHub, then how to create a table in Postgres, and finally how to copy the CSV data into the table and read it back.
It’s a decent chunk of code, but the logic is fairly straightforward:
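In outline, the flow might look like the following sketch. The CSV URL, table columns, and connection values are placeholders, and the task types are assumptions based on Kestra’s HTTP and Postgres plugins:

```yaml
id: csv_to_postgres
namespace: dev

tasks:
  # step 1: download the CSV file over HTTP
  - id: download_csv
    type: io.kestra.plugin.fs.http.Download
    uri: https://example.com/countries.csv  # placeholder URL

  # step 2: create the destination table
  - id: create_table
    type: io.kestra.plugin.jdbc.postgresql.Query
    url: jdbc:postgresql://localhost:5432/kestra  # placeholder connection
    username: kestra
    password: <your-password>
    sql: |
      CREATE TABLE IF NOT EXISTS country_referential (
        code VARCHAR(10),   -- illustrative columns
        name VARCHAR(255)
      );

  # step 3: bulk-load the downloaded file into the table
  - id: copy_csv
    type: io.kestra.plugin.jdbc.postgresql.CopyIn
    url: jdbc:postgresql://localhost:5432/kestra
    username: kestra
    password: <your-password>
    table: country_referential
    from: "{{ outputs.download_csv.uri }}"
    format: CSV
    header: true
```

Note how the download task’s output URI is passed to the copy task through Kestra’s templating syntax, so the file never needs a hardcoded local path.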
To prove a point, let’s use the blueprint as-is and run it. You already know how to do that.
You’ll see an error in the create_table task because Kestra can’t communicate with the Postgres database listed in the flow code:
Let’s work on fixing it next.
Blueprints are Just That — Blueprints
It’s important you understand that blueprints are oftentimes not meant to be used straight out of the box. You’ll need to modify code portions that include API keys and connection strings.
The good part is that Kestra uses Postgres behind the scenes, which means you can copy its connection string and credentials and paste them into the flow code:
Keep in mind there are two additional places in the flow code where you’ll need to replace the credentials.
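Each Postgres task in the flow holds its own connection block, so the same three properties have to be updated in every one of them. The values below are placeholders; use whatever your Kestra docker-compose file defines:

```yaml
url: jdbc:postgresql://postgres:5432/kestra  # host as seen from inside Docker
username: kestra
password: <your-password>
```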
After replacing them, Kestra shouldn’t have any issues communicating with the Postgres database running in a Docker container:
As per the Gantt chart, it looks like everything finished successfully, but we can’t know for sure before taking some additional steps.
How to Test If Data was Saved to Postgres
You now have a Postgres database running in a Docker container. This means you can access its tables by first logging into the container shell, and then logging into Postgres.
The first step is to access the container shell. Make sure you know what the container is called, and then run the following command:
docker exec -it <container-name> bash
You should see something similar on your end:
Once here, connect to the local Postgres database as the kestra user. The database we’ll use is also called kestra.
psql -h localhost -U kestra -d kestra -p 5432
You should see the following screen:
And from here, simply run an SQL command to select records from the table:
SELECT * FROM country_referential;
Here’s the output:
It looks like the data was saved successfully.
This last blueprint example involved more manual work, but that’s something you’ll have to get used to. They’re only blueprints, after all, and not complete code snippets tailored to your specific needs.
Kestra Blueprints — Final Words
And there you have it — three concrete examples of Kestra blueprints. You have to admit that these simplify the learning process and also show you the optimal way of writing and organizing your flows.
We recommend going through more of them to get an idea of how Kestra works, and why some things are organized in a certain way.
You might also find the following articles useful:
- Airflow vs. Prefect vs. Kestra — What is The Best Data Orchestration Platform in 2023?
- Airflow vs. Prefect vs. Kestra — Which is Best for Building Advanced Data Pipelines?
If you opt for the Enterprise Edition of Kestra, you’ll be able to create internal blueprints. Their job is to help you share commonly used workflows and reusable components across your organization.
Loved the article? Become a Medium member to continue learning without limits. I’ll receive a portion of your membership fee if you use the following link, with no extra cost to you.