Intro to Kestra: Open-Source Orchestration and Scheduling Platform

Published in Geek Culture · 5 min read · Oct 27, 2022

Companies build data pipelines to prepare data, extract insights, and distribute findings to internal and external parties. The ability to handle large and varied data has become a critical factor in running a successful business. Building and managing data flows, however, is not simple. You have to plan and schedule the extraction of data from disparate sources. When you finally persist raw or transformed data into a data warehouse, you have to factor in transformation, modeling, and aggregation. It can easily become a complicated task.

To operate data orchestration and scheduling reliably, businesses need a flexible and easy-to-use platform. Let’s try out Kestra, an open-source data orchestrator, to see whether and how it can streamline data flows and development processes, and to find out its key benefits and features.

Two main concepts of Kestra

The building blocks of Kestra are simple by design, which helps new users quickly grasp the fundamental concepts of the platform and how the software works. Kestra has two main concepts, and each consists of smaller components.

Flow

A flow is a simple list of tasks grouped by namespace. It stores all the actions that occur in the current flow. A flow contains the following components.

Task: A task is an action in a flow. A task can take inputs, execute a job, or generate an output. There are two types of tasks in Kestra.
Flowable Task: A flowable task handles workflow states and starts new tasks. Flowable tasks are responsible for the logic of the flow, which allows you to build complex workflows with branching and parallel tasks.
Runnable Task: A runnable task is an actual computational job. It can be any type of job, such as running a script, calling an API, operating on a file system, querying a database, and many more.
Namespace: A namespace is similar to a folder in a file system. It is used to organize flows in a hierarchical structure.
Input: Inputs are parameters that are sent to a flow. You can pass various parameters in a string, an integer, a file, or more.
Revision: When you modify a flow, a new revision is generated. The revision is an incremental number that gets updated after each change.
Listeners: Listeners are a type of task that can listen to the current flow and can execute tasks outside the current flow.
Triggers: Triggers start a flow in response to external events.
Templates: Templates are lists of tasks that can be referred to across different flows.
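
As a rough illustration of how these pieces relate, the components above can be modeled in a few lines of Python. This is a toy sketch, not Kestra's API or storage format: the class names `Task` and `Flow`, the `update_tasks` method, and the sample namespace, inputs, and task types are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One action in a flow (hypothetical model, not a Kestra class)."""
    id: str
    type: str  # placeholder label such as "script" or "query"


@dataclass
class Flow:
    """A list of tasks grouped under a namespace, with a revision counter."""
    id: str
    namespace: str
    tasks: list = field(default_factory=list)
    inputs: dict = field(default_factory=dict)  # input name -> type name
    revision: int = 1

    def update_tasks(self, tasks):
        """Replace the task list; every modification bumps the revision."""
        self.tasks = list(tasks)
        self.revision += 1


flow = Flow(
    id="daily-report",
    namespace="company.analytics",  # reads like a folder path
    tasks=[Task("extract", "query")],
    inputs={"report_date": "string"},
)
flow.update_tasks(flow.tasks + [Task("publish", "script")])
print(flow.namespace, flow.id, "revision", flow.revision)
```

The point of the sketch is only the relationship between the parts: a flow lives in a namespace, holds an ordered list of tasks, declares its inputs, and gets a higher revision number each time it changes.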

Execution

The second main concept is the execution. An execution is a run of a flow that is either in progress or already completed.

Task Run: A task run is a task that is running or has already run, linked to its state and outputs (see below).
Attempts: A single task run can have one or more attempts. Usually, a task run has only one attempt. However, when a task fails, retries can add further attempts.
Outputs: A task run can produce output data that can be passed onto other tasks.
Metrics: A task can create metrics data that can be used to track task status.
State: A state is used to define the status of a task run or execution.
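
To make the relationship between task runs, attempts, outputs, and states more concrete, here is a minimal Python sketch. It is not how Kestra implements executions: the `State` values, the `TaskRun` and `Attempt` classes, and the `run_task` helper with its `max_attempts` parameter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    """Illustrative states only; not Kestra's full state list."""
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


@dataclass
class Attempt:
    number: int
    state: State


@dataclass
class TaskRun:
    task_id: str
    state: State = State.RUNNING
    attempts: list = field(default_factory=list)
    outputs: dict = field(default_factory=dict)


def run_task(task_id, job, max_attempts=3):
    """Run `job` until it succeeds or `max_attempts` is reached."""
    run = TaskRun(task_id)
    for number in range(1, max_attempts + 1):
        try:
            run.outputs = job()
        except Exception:
            # A failed attempt is recorded, then the loop retries.
            run.attempts.append(Attempt(number, State.FAILED))
            continue
        run.attempts.append(Attempt(number, State.SUCCESS))
        run.state = State.SUCCESS
        return run
    run.state = State.FAILED
    return run


calls = {"count": 0}


def flaky_job():
    """Fails on the first call, then returns an output."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("temporary failure")
    return {"rows": 42}


task_run = run_task("load", flaky_job)
print(task_run.state.name, len(task_run.attempts), task_run.outputs)
```

Here the single task run ends up with two attempts: the first one failed, and the retry succeeded and produced the outputs that a later task could consume.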

Flexible flow definition

In Kestra, a flow can be defined in a YAML file, which can be used to reproduce the flow in a different environment. YAML is a common file format for Infrastructure-as-Code tools, and it is straightforward to use. This makes a big difference because you can put all changes to a flow under source control and collaborate as a team when building flows. Within the YAML definition, you can control how a flow runs by applying different kinds of logic:

Sequential: you can run tasks in order.
Parallel: you can execute tasks simultaneously.
Loop: you can repeat a task multiple times based on the outputs of previous tasks.
Switch: you can branch out of the flow based on the current state.
Pause: you can pause after a task.
Trigger: you can start a flow based on events or when another flow ends.
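
In Kestra these behaviors are declared in the flow's YAML file. The Python below only mimics the control logic of four of them (sequential, parallel, loop, and switch) so the differences are easy to see. The function names `sequential`, `parallel`, `each`, and `switch` and the sample tasks are assumptions made for this sketch, not Kestra identifiers.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(tasks):
    """Run tasks one after another and collect their results in order."""
    return [task() for task in tasks]


def parallel(tasks):
    """Run tasks at the same time on a thread pool; results keep input order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def each(values, task):
    """Loop: run the same task once per value from an earlier output."""
    return [task(value) for value in values]


def switch(value, cases, default):
    """Branch: run the task registered for `value`, else the default."""
    return cases.get(value, default)()


# An earlier task lists files; later tasks depend on that output.
files = sequential([lambda: ["a.csv", "b.csv"]])[0]
sizes = each(files, lambda name: len(name))
both = parallel([lambda: "cleaned", lambda: "archived"])
route = switch(
    "small" if sum(sizes) < 100 else "large",
    {"small": lambda: "load directly"},
    default=lambda: "load in batches",
)
print(files, sizes, both, route)
```

The loop and the switch both consume the output of an earlier step, which is the same idea as passing task outputs between tasks in a flow.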

Extensive Kestra plugins

Kestra plugins make it highly versatile and allow you to interact with diverse systems, databases, services, and software. This is one of the most essential factors when you evaluate job orchestration tools because orchestration software is not built to perform, for example, heavy data processing or compute-intensive jobs on its own. Instead, it allocates such jobs to external resources.

When you want to transform data and persist it into storage, you need plugins that can connect to and run queries in relational databases, NoSQL stores, data warehouses, or even cloud file systems. Kestra has wide coverage in that domain, including all popular relational and non-relational databases. Plus, it can communicate with popular data warehouse solutions such as Redshift, Snowflake, and BigQuery, and with cloud file storage like AWS S3, Google Cloud Storage, Google Drive, and many more.

If you want to run orchestrated DevOps tasks, you can do that too. You can launch Docker images, control Kubernetes, and run custom code in Python, Bash, and Node for advanced pipelines. You can access other modern stacks such as dbt, Soda, Singer, and Debezium. On top of all the supported plugins, you can build your own plugins!

Intuitive user interface

Kestra comes with an intuitive and full-fledged user interface. In the web-based UI, users can edit configurations, run flows, and monitor all historical and current executions in real time.

Easy DevOps

Kestra supports Terraform integration. With it, you can deploy a flow and reproduce the same flow across different environments such as dev, UAT, staging, or production. Because flows, tasks, and flow logic are defined in YAML, team members can collaborate at scale and deploy through CI/CD.

Conclusion

We covered the major concepts and features of Kestra. Large enterprises including Tencent, BMW, Huawei, Leroy Merlin, and more use the platform. Because Kestra is open-source software, you can try it out and check whether it fits your use cases without worrying about a trial period. Another advantage of an open-source tool is that you can deploy it on any cloud service, on-premises, or even on your local machine for testing. Kestra provides a container image for an easy start.

If you want to try it, check out the Kestra GitHub repository:

GitHub - kestra-io/kestra (github.com): Kestra is an infinitely scalable orchestration and scheduling platform, creating, running, scheduling, and monitoring…