Canvas — A new take on Workflow Design

Avish Vijay
Published in Peak
4 min read · Jan 10, 2020

When we (the Peak Product Team) were first challenged with revamping the Workflow feature in Peak's AI System, the first thought that came to mind was to design a conventional canvas on which data scientists could build their workflows.

Don’t go with the first idea that comes to your mind.

Introduction

Before getting into the complications, let me set the context. Peak's AI System aspires to be the central system of intelligence for a business, one that can push out AI solutions in an automated manner. These solutions could be an API, a predictive metric dashboard, or an actionable feature such as sending out a campaign to AI-generated customer segments.

Where workflow fits in the picture

Put crisply, a workflow is an end-to-end machine learning pipeline that is complex and scalable at the same time, built to solve a machine learning use case.
The pipeline should cover the entire journey from raw data to predictive/prescriptive data.

The Customer Operations team builds workflows so that these solutions can be pushed out regularly by a particular trigger, which can be a schedule, an API call, or an event.

History

We had a very simplified version of workflows where data scientists could build their pipelines, but with a lot of constraints: linear-only execution and decentralized jobs, to mention a few.

We figured out that we needed a more powerful workflow builder suited to building AI solutions.

The foundation of our Solution

The open-source project Argo Workflows solved the use cases we required and was the right fit to become the foundation of our solution. We spent an ample amount of time doing a proof of concept to validate just that.

Argo provides an open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition). (ref: https://argoproj.github.io/argo/)
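To give a feel for what Argo consumes, here is the classic "hello world" Workflow manifest from the Argo documentation, expressed as a Python dict for readability (Argo itself accepts it as YAML or JSON through the Workflow CRD):

```python
import json

# The canonical Argo "hello world" Workflow: one template running a container.
manifest = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "hello-world-"},
    "spec": {
        "entrypoint": "whalesay",
        "templates": [
            {
                "name": "whalesay",
                "container": {
                    "image": "docker/whalesay",
                    "command": ["cowsay"],
                    "args": ["hello world"],
                },
            }
        ],
    },
}

# Serialised to JSON, this is what gets submitted to the Kubernetes API.
print(json.dumps(manifest, indent=2))
```

Because a Workflow is just a Kubernetes custom resource, everything we build on top of it ultimately compiles down to a manifest like this.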

First Idea — Conventional Canvas

We ran an analysis of the different workflow builders in use and eventually found that, even when they are visually appealing, they are pretty hard to use and learn. The reason is that they break the primary workflow-building principle: a workflow builder is not a painting canvas. Workflows are planned first and then built, because they drive the business at a bigger level.

Cut the showoff

A builder can offer millions of types of triggers and steps to make a workflow, but hey, the user already knows which ones they need. Why overwhelm users and waste precious screen estate? We decided against it and simplified the canvas so that all widgets are added through the Floating Action Button at the bottom right.

Using Custom Docker Images

Machine learning projects need different kinds of system environments in different phases, and also demand stability and scalability once deployed in production. We allowed data scientists to build and use their own Docker images, in which the user can define an environment that has everything needed to run: code, tools, and resources.
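A minimal sketch of the idea, with a hypothetical step model: each step carries its own image reference, so the environment travels with the step rather than being fixed by the platform (the registry path and tag below are illustrative, not real):

```python
from dataclasses import dataclass, field

# Hypothetical model: a workflow step pinned to a user-built Docker image.
@dataclass
class WorkflowStep:
    name: str
    image: str                       # custom image built by the data scientist
    command: list = field(default_factory=list)

train = WorkflowStep(
    name="train-model",
    image="registry.example.com/ds/train-env:1.2",  # hypothetical registry/tag
    command=["python", "train.py"],
)
```

Pinning an explicit tag per step is what gives the stability in production: rerunning the workflow reruns the exact same environment.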

Select your own Infrastructure

Usually, data cleaning jobs require fewer resources than machine learning jobs. As with images, every step requires its own instance, and data scientists should be allowed to scale resources to the requirements of the workflow. Instance types comprise varying combinations of CPU, memory, storage, and networking capacity, and give the flexibility to choose the appropriate mix of resources for each workflow.
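One way to sketch this per-step sizing, assuming Kubernetes-style resource requests (`cpu` in cores or millicores, `memory` in Mi/Gi) since the steps run as containers on Kubernetes:

```python
# Hypothetical helper: build a Kubernetes-style resources block for a step.
def container_resources(cpu: str, memory: str) -> dict:
    """Return a resource-request block, e.g. {"requests": {"cpu": "4", ...}}."""
    return {"requests": {"cpu": cpu, "memory": memory}}

# A lightweight cleaning step versus a heavier training step.
cleaning = container_resources(cpu="500m", memory="512Mi")
training = container_resources(cpu="4", memory="16Gi")
```

The scheduler can then place each step on an instance that matches its request, rather than sizing the whole workflow for its hungriest step.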

Don’t overdo connector lines

We observed that the conventional canvas gave super flexibility in dragging or moving connector lines, and this again breaks the same rule. Mostly, which step runs next is already determined by the business logic, and these connector lines just add extra complication for the user and increase the chance of a slip, especially when changing things in the middle of a workflow, i.e. dependent steps.
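The alternative is to let each step simply name the steps it depends on, and have the engine derive the connections and execution order. A minimal sketch using Python's standard-library topological sorter (the step names are illustrative):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each step maps to the set of steps it depends on; no lines to drag.
dependencies = {
    "ingest": set(),
    "clean": {"ingest"},
    "train": {"clean"},
    "evaluate": {"train"},
}

# The engine derives a valid execution order from the declared dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['ingest', 'clean', 'train', 'evaluate']
```

Editing the middle of a workflow then means editing one dependency declaration, not re-drawing lines, so there is far less room for a slip.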

Cloning the step

In machine learning workflows, a step is often repeated multiple times, yet our analysis found that few workflow builders offer an option to clone a step; the poor user has to refill all the redundant information again. Cloning a step saves time and effort and gives an edge in operational efficiency.
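Conceptually, cloning is just a deep copy of the step's configuration, after which the user edits only what differs. A sketch with a hypothetical step represented as a dict:

```python
import copy

# Hypothetical step configuration; image tag and fields are illustrative.
step = {
    "name": "train-xgb",
    "image": "ds/train-env:1.2",
    "resources": {"cpu": "4", "memory": "16Gi"},
}

# Deep copy so nested fields (like resources) are independent of the original.
clone = copy.deepcopy(step)
clone["name"] = "train-xgb-v2"  # tweak only what changes
```

A shallow copy would not be enough here: the clone and the original would share the nested `resources` dict, so editing one would silently edit the other.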

Conclusion

Machine learning workflows play an important role in improving efficiency for data science and implementation teams, and are an essential step in deploying customer solutions to production. We have amended the concept of the conventional canvas and workflows to fulfil Peak's business needs.
