Customizing Workflow orchestrator for ML and Data pipelines

Published in cverse-ai, Dec 12, 2019

This blog has been by far the hardest for me to write. In part, the challenge stems from trying to sum up months' worth of experiences in just a few paragraphs. Even more challenging, this post marks the end of my internship here with Couture.ai. To sum it up, it’s been a summer of growth.

My work here involved understanding ML workflows and creating production-worthy solutions to actual problems. I was involved in customizing the workflow orchestrator to make managing data pipelines more user-friendly. So my first task was to understand the Workflow Orchestrator and how it works.

Couture Workflow Orchestrator is a platform to programmatically author, schedule and monitor workflows. In my experience it is one of the best workflow management systems, because it keeps a workflow simple and organized by letting us divide it into small task units that are usually, though not always, independent. These units are easy to organize and easy to schedule. The entire workflow can be expressed as a DAG (directed acyclic graph) in the workflow orchestrator. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
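To make the DAG idea concrete, here is a minimal sketch using only Python's standard library. It is not the orchestrator's own API: the task names and the `execution_order` helper are hypothetical, chosen just to show how dependencies between task units determine a valid run order.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before that task can start.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "report": {"clean"},
    "publish": {"train", "report"},
}


def execution_order(deps):
    """Return one valid run order for the tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    that is, if the graph is not actually acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Note that "train" and "report" do not depend on each other, so either may come first; a scheduler is free to run such independent tasks in parallel.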

My project here can be divided into three parts, all of which aim to enhance the user experience and make it easier to manage pipelines from the interface:

a.) Functionality to perform CRUD operations on any config file from the UI, in addition to the existing command-line method.

b.) Upload and manage DAGs and code artifacts from the UI.

c.) A DAG code editor that allows the user to -

i.) Edit a DAG file and review the changes before saving them.

ii.) Drag and drop code snippets from a collection of available modules into the existing code, wherever desired, from the UI. In the future this can be extended further towards developing an Auto ML platform.

To develop an interface for managing config files from UI

Configuration files of any type can be edited from the command line. But that’s too much effort, right? You have to find the exact path where the file is stored, open it in an editor and then make the modifications. Imagine if this could be done directly from the UI with a few clicks, without knowing the path. Well, this is exactly what this interface does.
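As a rough illustration of what such create, update and delete operations might look like behind the UI, here is a minimal sketch built on Python's `configparser`. It assumes an INI-style file; the section and option names and the helper functions are hypothetical, and the actual interface may handle other formats differently.

```python
import configparser
import io


def load_config(text):
    """Parse INI-style text into a ConfigParser (assumed file format)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def upsert_option(parser, section, option, value):
    """Create the option if it is missing, otherwise overwrite its value."""
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, option, value)


def delete_option(parser, section, option):
    """Remove an option; return True only if it existed."""
    return parser.has_section(section) and parser.remove_option(section, option)


def dump_config(parser):
    """Serialize the config back to text, ready to be written to disk."""
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical config content and edits.
sample = "[spark]\nmaster = yarn\nexecutor_memory = 2g\n"
config = load_config(sample)
upsert_option(config, "spark", "executor_memory", "4g")  # update
upsert_option(config, "spark", "jar", "etl.jar")         # create
delete_option(config, "spark", "master")                 # delete
updated_text = dump_config(config)
print(updated_text)
```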

If the option ‘jar’ is present in the config file, the user can view and select the jar files from a **dropdown**. Jars that are already configured appear pre-selected in the dropdown.
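One way the dropdown entries could be derived is sketched below. It assumes the ‘jar’ option stores a comma-separated list of file names, which the post does not specify; `jar_dropdown` and the jar names are hypothetical.

```python
def jar_dropdown(available_jars, configured_value):
    """Build dropdown entries, marking jars already named in the config.

    Assumes `configured_value` is a comma-separated list of jar names.
    """
    selected = {name.strip() for name in configured_value.split(",") if name.strip()}
    return [
        {"name": jar, "selected": jar in selected}
        for jar in sorted(available_jars)
    ]


# Hypothetical jars on disk and the current value of the 'jar' option.
choices = jar_dropdown(["utils.jar", "etl.jar", "ml.jar"], "etl.jar, utils.jar")
for choice in choices:
    print(choice)
```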

This can be extended to Hadoop configuration files as well.

The page that lists the config files shows the following for each file:

a.) Last modification time of the file.

b.) Size of the file.

c.) A download option for the file.
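The modification time and size can be read straight from the file system. The sketch below is only an illustration: `list_files`, the `*.conf` pattern and the temporary folder are assumptions, not the orchestrator's implementation.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def list_files(folder, pattern="*"):
    """Return name, size and last-modified time for each matching file."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file():
            info = path.stat()
            rows.append({
                "name": path.name,
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime).isoformat(timespec="seconds"),
            })
    return rows


# Demonstrate on a throwaway folder with one hypothetical config file.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "spark.conf").write_bytes(b"master = yarn\n")
    listing = list_files(folder, "*.conf")

print(listing)
```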

Clicking a specific file opens a page that shows all the options inside the file and lets the user edit or remove each of them.

DAG Code Editor

The entire workflow can be expressed as a DAG (directed acyclic graph) in the workflow orchestrator. A DAG can be thought of as a container that holds tasks and their dependencies and sets the context for when and how those tasks should be executed.

Thus, it’s pretty clear that DAGs are among the most integral components of any workflow orchestrator. A user will often want to modify a DAG, and it would be cumbersome to search for that DAG inside the orchestrator's DAG_FOLDER and then make the edits there.

To spare the user this hassle, the DAG code editor allows DAGs to be edited in the browser itself. And it does not just allow editing: it also lets the user review the changes before saving them.

Let’s see how it works:

First, a page opens that lists all the DAGs along with their last modified time, size and download option.

If the user edits the code and clicks the review button, they are taken to the ‘review code’ page.
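A review step like this is essentially a diff between the saved file and the edited text. The sketch below uses Python's `difflib` to show the idea; the function name, file name and sample code are hypothetical, and the real review page may render changes differently.

```python
import difflib


def review_changes(original, edited, filename="dag.py"):
    """Return a unified diff of the saved code against the edited code."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        edited.splitlines(keepends=True),
        fromfile=f"{filename} (saved)",
        tofile=f"{filename} (edited)",
    )
    return "".join(diff)


# Hypothetical before and after versions of a DAG file.
saved = "schedule = 'daily'\nretries = 1\n"
edited = "schedule = 'hourly'\nretries = 1\n"
diff_text = review_changes(saved, edited)
print(diff_text)
```

An empty diff means there is nothing to review, so the save step could be skipped in that case.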

This internship gave me an opportunity to think about where today's development work could lead in the future, and this led to providing the user with a list of existing code snippets (tasks). The user can add a task wherever it is required with just a click, right at the position of the cursor.
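Inserting a snippet at the cursor comes down to splicing text at a line and column. Here is a minimal sketch under the assumption that the editor reports a 1-based line and a 0-based column; `insert_at_cursor` and the sample code are hypothetical.

```python
def insert_at_cursor(code, line, column, snippet):
    """Insert `snippet` into `code` at the cursor position.

    `line` is 1-based and `column` is 0-based (an assumed convention).
    A column past the end of the line is clamped to the line's end.
    """
    lines = code.split("\n")
    if not 1 <= line <= len(lines):
        raise ValueError(f"line {line} is outside the document")
    target = lines[line - 1]
    column = max(0, min(column, len(target)))
    lines[line - 1] = target[:column] + snippet + target[column:]
    return "\n".join(lines)


# Hypothetical DAG code with the cursor on the empty second line.
code = "start = 1\n\nend = 2\n"
result = insert_at_cursor(code, line=2, column=0, snippet="task = 'clean'")
print(result)
```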

To make editing DAG code more customizable, users can also add their own new tasks or code snippets.
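A user-extensible snippet collection could be as simple as a named mapping that refuses to silently overwrite existing entries. The class below is a hypothetical sketch of that idea, not the editor's actual storage.

```python
class SnippetLibrary:
    """Named code snippets: built-in ones plus any the user adds."""

    def __init__(self, builtin):
        self._snippets = dict(builtin)

    def add(self, name, code):
        """Register a user snippet, rejecting blank or duplicate names."""
        name = name.strip()
        if not name:
            raise ValueError("snippet name must not be empty")
        if name in self._snippets:
            raise KeyError(f"snippet {name!r} already exists")
        self._snippets[name] = code

    def names(self):
        """Return the snippet names in the order a picker might list them."""
        return sorted(self._snippets)

    def get(self, name):
        return self._snippets[name]


# Hypothetical built-in snippet plus one added by the user.
library = SnippetLibrary({"log_start": "print('task started')"})
library.add("log_end", "print('task finished')")
print(library.names())
```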

One of the most important takeaways from the internship was to be dynamic, learning and implementing things on the go. Many of the tasks given to me during this internship involved a tech stack that was completely new to me, and no doubt I faced many blockers in the process. But overcoming them was equally fun. The process instilled in me the confidence to tackle challenges by myself.

I am really thankful to all my mentors, who guided me in different aspects such as improving code performance, debugging, and writing clean, modular code. I am definitely taking away a lot of learnings to build upon.

Written by Krati Agarwal
