Customizing Elyra pipelines
The Elyra pipeline editor in JupyterLab provides users with the means to assemble pipelines from Jupyter notebooks, Python scripts, R scripts and pre-packaged code using a GUI, without the need to write code.
You can run these pipelines from the pipeline editor or the command line interface in remote environments, where Kubeflow Pipelines v1.x or Apache Airflow v1.x is deployed.
When you run a pipeline, Elyra generates the required artifacts for the target runtime environment and triggers their execution. In Elyra version 3.13, we’ve extended the pipeline editor for Kubeflow Pipelines to optionally expose these artifacts as Python source code, which you can customize as desired. Customization enables you to add functionality to pipelines that the pipeline editor doesn’t natively support, as I’ll demonstrate later in the post.
The pipeline editor for Apache Airflow also supports exporting to Python source code.
In this blog post I’ll focus on customizing pipelines for Kubeflow Pipelines, but many of the outlined steps also apply to Apache Airflow.
Exporting a pipeline
You can export a pipeline to Python source code (or other formats supported by the runtime environment) using the pipeline editor’s export wizard or the command line interface.
Exporting a pipeline using the export wizard
To launch the wizard, open the pipeline in the pipeline editor and click the “Export Pipeline” button.
Export requires two selections. The first selection is the runtime configuration, which contains information about the target runtime environment. The second selection is the export format.
For Kubeflow Pipelines v1.x you can choose from two export formats:
- The Python domain-specific language (DSL) format uses Python packages provided by the Kubeflow Pipelines SDK to programmatically create and run workflows.
- The YAML-formatted static configuration file is an engine-specific workflow definition. Elyra supports Kubeflow Pipelines with Argo or Tekton. You can import this file using the Pipelines UI in the Kubeflow Central Dashboard.
Elyra does not support exporting to Kubeflow Pipelines v2.x intermediate representation (IR) YAML.
Customizing the generated Python source code
The generated source code is organized into three parts: component definitions, the pipeline definition, and a main function.
Component definitions
Utilizing component catalogs, Elyra can load component definitions from local and remote repositories, such as the local file system, web resources, Artifactory, or the Machine Learning Exchange. During export, component definitions are embedded as text, making it easier to share the code and its component dependencies.
component_def_8e4384f422a5f611d741ceb4 = """
name: Download File
description: Downloads a file from a public HTTP/S URL using a GET request.
inputs:
- {name: URL, type: String, optional: false, description: 'File URL'}
outputs:
- {name: downloaded file, type: String, description: 'Content of the downloaded file.'}
implementation:
  container:
    image: quay.io/elyra/kfp-tutorial-download-file-component@sha256:499c...
    command: [
      python3,
      # Path of the program inside the container
      /pipelines/component/src/download-file.py,
      --file-url,
      {inputValue: URL},
      --downloaded-file-path,
      {outputPath: downloaded file}
    ]
"""
factory_8e4384f422a088e4814024df79 = (
    kfp.components.load_component_from_text(
        component_def_8e4384f422a5f611d741ceb4
    )
)
Each component definition is loaded using a factory function. The component_def_ and factory_ variable names are suffixed with an ID to create unique references that are used in the pipeline definition.
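To make the embedding idea concrete, here is a minimal sketch of how a definition could be written into Python source as text under an ID-suffixed variable name. The render_component_source helper and the random ID are hypothetical and for illustration only; they are not Elyra's actual code generator, and the definition text is an abbreviated stand-in.

```python
import ast
import uuid


def render_component_source(definition_text: str, component_id: str) -> str:
    """Return Python source that embeds a component definition as text.

    Hypothetical helper for illustration; not part of Elyra or the KFP SDK.
    """
    variable_name = f"component_def_{component_id}"
    # repr() yields a valid Python string literal, so quotes and newlines
    # inside the definition are escaped safely.
    return f"{variable_name} = {definition_text!r}\n"


# Abbreviated stand-in for a component definition.
definition = "name: Download File\ninputs:\n- {name: URL, type: String}\n"
# A random hex ID keeps the generated variable names unique per component.
component_id = uuid.uuid4().hex[:24]
source = render_component_source(definition, component_id)

# Parse and execute the generated source to confirm the text round-trips.
ast.parse(source)
namespace = {}
exec(source, namespace)
assert namespace[f"component_def_{component_id}"] == definition
```

Because the definition travels inside the source file, the exported code can be shared without also shipping the catalog it came from.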
Pipeline definition
In the pipeline definition, the component factory functions are used to create Kubeflow Pipelines tasks, passing the user-provided inputs as parameters.
@kfp.dsl.pipeline(name="demo")
def generated_pipeline():
    # Task for node 'Download monthly data'
    task_774bcb05_a32c_4c = factory_8e4384f422a088e4814024df79(
        url="https://.../data.csv",
    )
    task_774bcb05_a32c_4c.set_display_name("Download monthly data")
    task_774bcb05_a32c_4c.add_pod_label("load-data", "sales-data")
A task variable definition is followed by one or more method invocations. These methods apply the properties you’ve specified in the pipeline editor, such as the custom node name and Kubernetes resources (volumes, metadata, secrets, etc.).
You can customize the pipeline definition as desired. For example, by adding a few lines of code to the above code snippet, you can parametrize a task:
@kfp.dsl.pipeline(name="modified_demo")
def generated_pipeline(
    customizable_url: str = "https://..."):
    # Task for node 'Download monthly data'
    task_774bcb05_a32c_4c = factory_8e4384f422a088e4814024df79(
        url=customizable_url,
    )
This change exposes the task’s file download URL in the Kubeflow Pipelines UI, enabling you to specify a custom value for each run, without having to modify the pipeline in the pipeline editor.
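The Kubeflow Pipelines SDK treats the arguments of the decorated pipeline function as pipeline parameters, and their default values as the defaults offered for a run. The sketch below uses only the standard library to show what the customized signature exposes. It is an illustration, not the SDK's own code: the decorator and task body are left out, and list_parameters is a hypothetical helper.

```python
import inspect


# Stand-in for the customized pipeline function from the snippet above; the
# @kfp.dsl.pipeline decorator and the task body are omitted so that the
# sketch runs without the Kubeflow Pipelines SDK installed.
def generated_pipeline(customizable_url: str = "https://..."):
    pass


def list_parameters(func):
    """Return {parameter name: default value} for a function's arguments."""
    return {
        name: parameter.default
        for name, parameter in inspect.signature(func).parameters.items()
    }


pipeline_parameters = list_parameters(generated_pipeline)
print(pipeline_parameters)  # {'customizable_url': 'https://...'}
```

A quick check like this can help confirm, before compiling, that every value you intend to override per run is actually an argument of the pipeline function.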
Elyra should natively support pipeline parameters for Kubeflow Pipelines in version 3.14. We’ll cover this in the next blog post.
main function
The main function invokes the Kubeflow Pipelines compiler for Argo or Tekton, as defined in your runtime environment configuration.
if __name__ == "__main__":
    kfp.compiler.Compiler().compile(
        pipeline_func=generated_pipeline,
        package_path=Path(__file__).with_suffix(".yaml").name,
    )
You can compile the (customized) source code by executing it in a terminal window.
~/customize/workspace $ ls
demo.pipeline demo.py
~/customize/workspace $ python demo.py
~/customize/workspace $ ls
demo.pipeline demo.py demo.yaml
The output is the workflow-engine-specific, YAML-formatted static configuration file, which you can then upload using the Kubeflow Pipelines UI, as mentioned earlier.
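The name demo.yaml follows from the package_path expression in the main function. The sketch below reproduces that expression with the path from the terminal session; it uses PurePosixPath and a hard-coded string only so that it runs anywhere, whereas the generated code uses Path(__file__).

```python
from pathlib import PurePosixPath

# Path taken from the terminal session above; inside demo.py the value
# comes from __file__ instead of a hard-coded string.
script_path = PurePosixPath("~/customize/workspace/demo.py")

# with_suffix() swaps ".py" for ".yaml"; .name then drops the directory part.
package_path = script_path.with_suffix(".yaml").name
print(package_path)  # demo.yaml
```

Because only the file name is passed to the compiler, the YAML file is written relative to the current working directory, which is why the session above runs the script from the directory that contains it.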
Summary
So far in 2022, the Elyra community has published nine minor releases, each delivering user-requested enhancements. An easy way to stay up to date on new features is the releases page. If you are using a recent Elyra version (3.11 or later), you can access this page from the JupyterLab launcher.
If you are already using Elyra, feel free to reach out to us on one of our community channels and let us know what you’d like to see next.
If you’ve missed our earlier Elyra blog posts, check out the complete list on our publication page.
Thanks for reading! Until next time.