Customizing Elyra pipelines

Photograph of wildflowers in California, taken in an open space preserve
Spring in California, photo by the author

The Elyra pipeline editor in JupyterLab provides users with the means to assemble pipelines from Jupyter notebooks, Python scripts, R scripts and pre-packaged code using a GUI, without the need to write code.

Screenshot of the Visual Pipeline Editor, showing the component palette on the left and the canvas on the right.
Assembling a pipeline from Jupyter notebooks, scripts and pre-packaged components

You can run these pipelines from the pipeline editor or the command line interface in remote environments, where Kubeflow Pipelines v1.x or Apache Airflow v1.x is deployed.

Submitting a pipeline run from the editor

When you run a pipeline, Elyra generates the required artifacts for the target runtime environment and triggers their execution. In Elyra version 3.13, we’ve extended the pipeline editor for Kubeflow Pipelines to optionally expose these artifacts as Python source code, which you can customize as desired. Customization enables you to add functionality to pipelines that the pipeline editor doesn’t natively support, as I’ll demonstrate later in the post.

The pipeline editor for Apache Airflow also supports exporting to Python source code.

In this blog post I’ll focus on customizing pipelines for Kubeflow Pipelines, but many of the outlined steps also apply to Apache Airflow.

Exporting a pipeline

You can export a pipeline to Python source code (or other formats supported by the runtime environment) using the pipeline editor’s export wizard or the command line interface.

Exporting a pipeline using the export wizard

To launch the wizard, open the pipeline in the pipeline editor and click the “Export Pipeline” button.

Screenshot of the pipeline editor and the “Export Pipeline” button highlighted.
Launching the pipeline export wizard from the pipeline editor

Export requires two selections: the runtime configuration, which identifies the target runtime environment, and the export format.

For Kubeflow Pipelines v1.x you can choose from two export formats:

  • The Python domain-specific language (DSL) format uses the Python packages provided by the Kubeflow Pipelines SDK to programmatically create and run workflows.
  • The YAML-formatted static configuration file is an engine-specific workflow definition. Elyra supports Kubeflow Pipelines with Argo or Tekton. You can import this file using the Pipelines UI in the Kubeflow Central Dashboard.

Elyra does not support exporting to Kubeflow Pipelines v2.x intermediate representation (IR) YAML.

Customizing the generated Python source code

The generated source code is organized into three parts: component definitions, the pipeline definition, and a main function.

Component definitions

Utilizing component catalogs, Elyra can load component definitions from local and remote repositories, such as the local file system, web resources, Artifactory, or the Machine Learning Exchange. During export, component definitions are embedded as text, making it easier to share the code and its component dependencies.

component_def_8e4384f422a5f611d741ceb4 = """
name: Download File
description: Downloads a file from a public HTTP/S URL using a GET request.
inputs:
- {name: URL, type: String, optional: false, description: 'File URL'}
outputs:
- {name: downloaded file, type: String, description: 'Content of the downloaded file.'}
implementation:
  container:
    image: quay.io/elyra/kfp-tutorial-download-file-component@sha256:499c...
    command: [
      python3,
      # Path of the program inside the container
      /pipelines/component/src/download-file.py,
      --file-url,
      {inputValue: URL},
      --downloaded-file-path,
      {outputPath: downloaded file}
    ]
"""

factory_8e4384f422a088e4814024df79 = (
    kfp.components.load_component_from_text(
        component_def_8e4384f422a5f611d741ceb4
    )
)

Each component definition is loaded using a factory function. The component_def_ and factory_ variable names are suffixed with an id to create unique references that are used in the pipeline definition.
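The suffix generation is internal to Elyra, but the naming pattern can be sketched with the standard library, for instance by deriving a short, stable hexadecimal id from a node identifier. The hashing scheme below is purely illustrative and is not Elyra's actual implementation:

```python
import hashlib

def make_var_names(node_id: str) -> tuple[str, str]:
    """Derive unique, valid Python variable names for a pipeline node.

    Illustrative only: hashes the node id and keeps a short hex suffix,
    mirroring the component_def_/factory_ naming pattern seen above.
    """
    suffix = hashlib.sha256(node_id.encode("utf-8")).hexdigest()[:24]
    return f"component_def_{suffix}", f"factory_{suffix}"

# The same node id always yields the same pair of names, and different
# node ids yield different names (barring hash collisions).
component_var, factory_var = make_var_names("Download File")
print(component_var, factory_var)
```

Because the suffix is derived deterministically, regenerating the code for an unchanged pipeline produces stable variable names.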

Pipeline definition

In the pipeline definition, the component factory functions are used to create Kubeflow Pipelines tasks, passing the user-provided inputs as parameters.

@kfp.dsl.pipeline(name="demo")
def generated_pipeline():

    # Task for node 'Download monthly data'
    task_774bcb05_a32c_4c = factory_8e4384f422a088e4814024df79(
        url="https://.../data.csv",
    )

    task_774bcb05_a32c_4c.set_display_name("Download monthly data")
    task_774bcb05_a32c_4c.add_pod_label("load-data", "sales-data")

A task variable definition is followed by one or more method invocations. These methods apply properties you’ve specified in the pipeline editor, such as the custom node name and Kubernetes resources (volumes, metadata, secrets, etc.).
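Other per-task settings from the Kubeflow Pipelines v1 SDK can be applied the same way when you customize the exported code. A sketch of a fragment you might add below a generated task definition; the values are illustrative, and the methods are standard kfp v1 task methods:

```python
# Illustrative customizations appended after the generated task
# definition; the retry count and resource limits are made-up values.
task_774bcb05_a32c_4c.set_retry(3)             # retry the task up to 3 times
task_774bcb05_a32c_4c.set_memory_limit("1Gi")  # cap container memory
task_774bcb05_a32c_4c.set_cpu_limit("1")       # cap container CPU
```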

You can customize the pipeline definition as desired. For example, by adding a few lines of code to the above code snippet, you can parametrize a task:

@kfp.dsl.pipeline(name="modified_demo")
def generated_pipeline(
        customizable_url: str = "https://..."):

    # Task for node 'Download monthly data'
    task_774bcb05_a32c_4c = factory_8e4384f422a088e4814024df79(
        url=customizable_url,
    )

This change exposes the task’s file download URL in the Kubeflow Pipelines UI, enabling you to specify a custom value for each run, without having to modify the pipeline in the pipeline editor.

Screen capture of the Kubeflow Pipelines UI, highlighting ability to change task parameters in the pipeline run wizard.
Customizing pipeline run parameters in the Kubeflow Pipelines UI

Elyra should natively support pipeline parameters for Kubeflow Pipelines in version 3.14. We’ll cover this in the next blog post.

main function

The main function invokes the Kubeflow Pipelines compiler for Argo or Tekton, as defined in your runtime environment configuration.

if __name__ == "__main__":
    kfp.compiler.Compiler().compile(
        pipeline_func=generated_pipeline,
        package_path=Path(__file__).with_suffix(".yaml").name,
    )

You can compile the (customized) source code by executing it in a terminal window.

~/customize/workspace $ ls          
demo.pipeline demo.py
~/customize/workspace $ python demo.py
~/customize/workspace $ ls
demo.pipeline demo.py demo.yaml

The output is the workflow engine specific YAML-formatted static configuration file, which you can then upload using the Kubeflow Pipelines UI, as mentioned earlier.

Summary

Year to date (in 2022), the Elyra community has published nine minor releases, each delivering user-requested enhancements. An easy way to stay up to date on new features is the releases page. If you are using a recent Elyra version (3.11 or later), you can access this page from the JupyterLab launcher.

Screen capture of the Elyra category in the JupyterLab launcher, which includes a tile that links to the release summary
Learn about new features in your installed release

If you are already using Elyra feel free to reach out to us on one of our community channels and let us know what you’d like to see next.

If you’ve missed our earlier Elyra blog posts, check out the complete list on our publication page.

Thanks for reading! Until next time.
