The Fastest Way to Build Data Pipelines?

A guide to efficient data engineering using Kestra blueprints to kickstart your next workflow

Dario Radečić
Geek Culture
8 min read · Aug 30, 2023


Photo by Sigmund on Unsplash

Let’s face it: learning something new isn’t as straightforward as it’s often portrayed online. It takes time and dedication, and both have become unicorn-rare as jobs grow more complex.

The best thing a company can do is to show you exactly how to use their products. Some have documentation, others opt for interactive tutorials, and some simplify things even further with deliberate and reusable code examples — or blueprints, for short.

That’s exactly the direction Kestra went with. They provide you with more than 100 blueprints you can use out of the box or with slight modifications. Of course, we won’t cover all of them today, but you’ll get the general idea after seeing three of them.

Let’s dig in!

What are Kestra Blueprints and Why Should You Care?

Kestra’s 0.10.0 release, among other things, introduces blueprints. Think of these as a catalog of ready-to-use code examples designed to help you learn the tool in the shortest time frame possible, and also drastically reduce the time needed to develop the workflow.

Image 1 — Kestra blueprints page (image by author)

Some blueprints work out of the box, while others need a bit of tweaking. That’s understandable, since the tweakable ones typically only require you to change database/server connection strings or API credentials.

Once you install Kestra locally, you’ll be able to access this catalog of blueprints from the left navigation sidebar.

You’ll be able to filter them based on various tags, such as programming language or technology. It’s likely you’ll find what you’re looking for, but if you don’t, feel free to suggest a new one via the following issue template.

You now know what Kestra blueprints are and why they’re useful — but how do you actually go about using them? That’s what we’ll go over next.

Blueprint Example 1: How to Install Python Packages Before Running a Python Script

A lot of new data orchestration platform users come from a Python background. It makes sense to go over and explain exactly how Kestra handles Python and additional Python dependencies.

This blueprint is very simple, as you can see from the image below:

Image 2 — Example blueprint contents (image by author)

It only has one Python script task that uses the requests module to make an API request.

The problem is, this library doesn’t ship with the default Python installation, so you’ll have to install it manually. The blueprint shows that the correct way to do this is with the beforeCommands property:

Image 3 — Blueprint description (image by author)

You can typically find every explanation you’ll need in the About section below the source code.

Now scroll to the top of the blueprint and hit the big purple “Use” button. It will copy the blueprint code into a new Kestra flow:

Image 4 — Using the blueprint (image by author)

All this Python snippet does is make a request to the GitHub API and print the response.
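If you can't make out the code in the screenshot, here's a minimal sketch of what a script like this could look like. It's not the blueprint's exact source: the endpoint URL, the function name, and the `http_get` parameter are assumptions made for illustration. The one thing it shares with the blueprint is its reliance on `requests`, which is why the package has to be installed before the script runs.

```python
# Assumed endpoint: the article only says "the GitHub API".
GITHUB_API_URL = "https://api.github.com"


def fetch_github_api(url=GITHUB_API_URL, http_get=None):
    """Request `url` and return the decoded JSON body.

    `http_get` is a hypothetical hook that lets you swap in a fake
    HTTP client for testing; by default it falls back to requests.get.
    """
    if http_get is None:
        # Third-party package: this import fails unless `requests`
        # was installed beforehand (e.g. via beforeCommands).
        import requests

        http_get = requests.get

    response = http_get(url, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of hiding them
    return response.json()
```

Calling `print(fetch_github_api())` on a machine with `requests` installed and internet access would print the parsed response, which is roughly what ends up in the task logs.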

From here, you can change the code if needed, or just save and run it:

Image 5 — Running the blueprint and examining task flow run (image by author)

This blueprint required no modification, since it’s not communicating with your servers/databases, nor does it require any authentication key.

For that reason, the flow execution succeeded, which was indicated by a green bar in the Gantt view. You can also switch to the Logs tab to see the output:

Image 6 — Blueprint output logs (image by author)

And that’s it — your first blueprint with Kestra! Let’s see what else is available.

Blueprint Example 2: Run R Script in a Docker Container and Generate Downloadable Files

Kestra is not a Python-specific data orchestration platform. It can run other programming languages as well, such as R.

To demonstrate, let’s go over to the R blueprint tag and click on the only blueprint currently available:

Image 7 — Available R blueprints (image by author)

In a nutshell, this one will create a working directory, modify a dataset that’s built into R, and save it in both CSV and Parquet file formats. In addition, this blueprint also shows you how to make these files downloadable from the Kestra UI, but more on that in a bit.

Image 8 — Flow of the R Docker blueprint (image by author)

Once again, you can scroll down to the About section for further clarifications:

Image 9 — R blueprint description (image by author)

As with the previous blueprint, click on the “Use” button to create a new flow from it. This one also doesn’t communicate with any servers or databases, so there’s no need to change connection parameters or credentials:

Image 10 — R blueprint code (image by author)

You can now save the Kestra flow and run it. It has a single working directory task that has two tasks in itself, as you can see from the Gantt view:

Image 11 — R blueprint execution flow (image by author)

The Kestra flow execution was successful, which means we should be able to download the CSV/Parquet files from the UI.

These are available in the Outputs tab:

Image 12 — Resulting downloadable files (image by author)

You can either download the file or preview it using the Kestra UI. This is what it contains:

Image 13 — Contents of the downloaded file (image by author)

Let’s explore a slightly more complex blueprint next.

Blueprint Example 3: Load a CSV File into a Postgres Table

We’ll now take a look at a blueprint that requires some modification before you can use it. To be more precise, the blueprint will show you how to load a downloaded CSV file into a Postgres database:

Image 14 — Execution flow of the Postgres blueprint (image by author)

The blueprint shows you how to first download a CSV file from GitHub, then how to create a table in Postgres, and finally how to copy the CSV file and read from the table.

It’s a decent chunk of code, but the logic is fairly straightforward:

Image 15 — Flow code for the Postgres blueprint (image by author)
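To make the logic concrete without a running database, here's a rough Python sketch of the same sequence: get a CSV, create a table, load the rows, read them back. It deliberately swaps Postgres for the in-memory SQLite database from Python's standard library, and it uses a tiny inline CSV instead of the file downloaded from GitHub. The column names and sample rows are invented; only the table name comes from the blueprint.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the CSV the blueprint downloads from GitHub;
# the column names and rows are invented for illustration.
SAMPLE_CSV = "country_id,country_name\n1,France\n2,Croatia\n3,Japan\n"


def load_csv_into_table(conn, csv_text, table="country_referential"):
    """Create the table, insert the CSV rows, and return the row count."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first line holds the column names
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)

    # Create the target table (every column is stored as text here).
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    # Load the remaining CSV rows; Postgres would typically use COPY instead.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()

    # Read back from the table to confirm the load.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Postgres
row_count = load_csv_into_table(conn, SAMPLE_CSV)
rows = conn.execute("SELECT * FROM country_referential").fetchall()
print(row_count, rows)
```

The real flow does each of these steps as a separate Kestra task against Postgres, so treat this purely as a mental model of the order of operations.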

To prove a point, let’s use the blueprint as-is and run it. You already know how to do that.

You’ll see an error in the create_table task because Kestra can’t communicate with the Postgres database listed in the flow code:

Image 16 — Postgres error (image by author)

Let’s work on fixing it next.

Blueprints are Just That — Blueprints

It’s important you understand that blueprints are oftentimes not meant to be used straight out of the box. You’ll need to modify code portions that include API keys and connection strings.

The good part — Kestra uses Postgres behind the scenes, which means you can copy their connection string and credentials and paste them into the flow code:

Image 17 — Postgres configuration changes (image by author)

Keep in mind there are two additional places in the flow code where you’ll need to replace the credentials.

After replacing them, Kestra shouldn’t have any issues communicating with the Postgres database running in a Docker container:

Image 18 — Running the flow again (image by author)

As per the Gantt chart, it looks like everything finished successfully, but we can’t know for sure before taking some additional steps.

How to Test If Data was Saved to Postgres

You now have a Postgres database running in a Docker container. This means you can access its tables by first logging into the container shell, and then logging into Postgres.

The first step is to access the container shell. Make sure you know what the container is called, and then run the following command:

docker exec -it <container-name> bash

You should see something similar on your end:

Image 19 — Bash connection to a Docker container (image by author)

Once here, connect to the local Postgres database with the kestra user. The database we’ll use is also called kestra.

psql -h localhost -U kestra -d kestra -p 5432

You should see the following screen:

Image 20 — Logging into the Postgres database (image by author)

And from here, simply run an SQL command to select records from the table:

SELECT * FROM country_referential;

Here’s the output:

Image 21 — Querying the Postgres database (image by author)

It looks like the data was saved successfully.
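If you'd rather not log in step by step every time, the three commands above can be folded into a single non-interactive call, since psql accepts a query through its `-c` flag. Here's a small Python sketch that builds such a command. The function name and the example container name are hypothetical, while the user, database, port, and table name are the ones used above.

```python
import shlex


def build_verify_command(container_name, table="country_referential"):
    """Return the argv list for a one-shot `docker exec ... psql -c` check."""
    query = f"SELECT * FROM {table};"
    return [
        "docker", "exec", container_name,
        "psql",
        "-h", "localhost",
        "-U", "kestra",   # same user as in the interactive session
        "-d", "kestra",   # same database
        "-p", "5432",
        "-c", query,      # run this query and exit
    ]


# "my-postgres-container" is a placeholder; use your actual container name.
command = build_verify_command("my-postgres-container")
print(shlex.join(command))  # shell-safe version you can paste into a terminal
```

You could hand the list to `subprocess.run(command, check=True)` to execute it from Python. If psql asks for a password on your setup, stick with the interactive `docker exec -it` approach shown earlier.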

This last blueprint example involved more manual work, but that’s something to get used to. They’re only blueprints after all, and not full code snippets tailored to your specific needs.

Kestra Blueprints — Final Words

And there you have it — three concrete examples of Kestra blueprints. You have to admit that these simplify the learning process and also show you the optimal way of writing and organizing your flows.

We recommend going through more of them to get an idea of how Kestra works, and why some things are organized in a certain way.


If you opt for the Enterprise Edition of Kestra, you’ll be able to create internal blueprints. Their job is to help you share commonly used workflows and reusable components across your organization.
