Automating the AI Lifecycle with IBM Watson Studio Pipelines

Yair Schiff
IBM Data Science in Practice
5 min read · Mar 8, 2021
[Cover image: a cliff face with boulders on top]
Photo by Gottfried Fjeldså on Unsplash

Written by Yair Schiff and Rafał Bigaj

A tragic tale from Greek mythology: Sisyphus, who had just completed his Master’s degree in Data Science, had tricked the gods one too many times. As punishment, they condemned him to an eternity of repetitive tasks. Every month Sisyphus was forced to retrain, compare, and deploy a new machine learning model on refreshed data from his company. If only he had access to the IBM Watson® Studio Pipelines beta release, he could have saved himself an eternity of manually repeating AI lifecycle tasks and forever changed the course of Greek mythology.

Although not delivered in time for poor Sisyphus, the IBM Watson Machine Learning and IBM Research teams are proud to announce our latest tool for helping clients manage the AI lifecycle: Watson Studio Pipelines. This new offering allows users to create repeatable and scheduled flows that automate notebook, data refinery, and machine learning pipelines: from data ingestion to model training, testing, and deployment. With an intuitive user interface, Watson Studio Pipelines exposes all of the state-of-the-art data science tools available in Watson Studio and allows users to combine them into automation flows, creating continuous integration / continuous delivery (CI/CD) pipelines for AI.

Watson Studio Pipelines is built on Kubeflow Pipelines with the Tekton runtime and is fully integrated into the Watson Studio platform, allowing users to combine tools including:

  • Notebooks
  • Data refinery flows
  • AutoAI experiments
  • Web service / online deployments
  • Batch deployments
  • Import and export of Project / Space assets

into repeatable and automated pipelines.

Sign up for access to the beta release of Watson Studio Pipelines and experience the power of AI lifecycle automation for yourself.

Automation in Action

To best understand the benefits of Watson Studio Pipelines, we’ll walk through an example that highlights the most important features of this new offering.

Let’s put ourselves in the shoes of a data scientist working at a bank. To really get into the role, you can read more about the dataset used below and find instructions for download. As part of the bank’s data science team, we’ve been tasked with determining the most effective marketing techniques for incentivizing different customers to open deposit accounts.

After several weeks of diligent engineering and late nights, our model is finally ready and packaged in a notebook. The next morning while presenting our work, a colleague asks if we’ve ever used IBM Watson AutoAI. We open Watson Studio and are amazed that, after a few clicks through a seamless user interface, we have another model that relies on a completely different algorithm than the one we used but produces similar accuracy. (If you’re new to AutoAI, learn more by watching our demos and try the product out for yourself with our hands-on lab.)

We present both models to the marketing executives to resounding applause! Fast forward to the next month, when we receive an email asking how the refreshed model’s accuracy looks on the new dataset that was just made available. And so we rinse and repeat, stepping through all the manual steps of data preparation, model training, evaluation, and finally deployment. At this point, you’re probably thinking the old roll-the-rock-up-the-hill gig wasn’t all that bad after all.

Or maybe there’s a better way: enter Watson Studio Pipelines! With this latest offering you have the power to customize and automate this entire process.

The Watson Studio Pipelines user interface is built around nodes that represent critical actions of the AI lifecycle. We start by adding a node to connect to our dataset.

[Animation: the user opens the left-hand navigation pane, selects the “Copy asset” node, and drags it onto the flow canvas.]
We start by adding a “Copy asset” node to connect to our dataset

Pointing this Copy asset node to a Cloud Object Storage location lets us automatically use the refreshed dataset that the marketing team makes available every month.

[Screenshot: pointing the “Copy asset” node to a Cloud Object Storage location.]
Pointing the “Copy asset” node to Cloud Object Storage automatically uses the most up-to-date dataset
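Conceptually, the Copy asset node just resolves a storage location each time the pipeline runs. As a rough sketch, if the marketing team dropped each month’s data under a dated prefix (the bucket layout and file names here are illustrative assumptions, not the actual convention), the “refreshed dataset” the node picks up would be something like:

```python
from datetime import date

def monthly_dataset_key(run_date: date, prefix: str = "bank-marketing") -> str:
    """Build the object key for a hypothetical monthly data drop.

    The prefix and naming scheme are assumptions for illustration; in
    practice the Copy asset node is simply pointed at whatever Cloud
    Object Storage path the marketing team publishes to.
    """
    return f"{prefix}/{run_date.year}-{run_date.month:02d}/customers.csv"

print(monthly_dataset_key(date(2021, 3, 8)))
# bank-marketing/2021-03/customers.csv
```

Because the key is derived from the run date, every scheduled execution resolves to that month’s file with no manual changes.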

We then add an execution node that will run the notebook with our custom pipeline to train a model. Impressed by AutoAI’s performance on our data, we give it a crack at the new data as well.

[Animation: the user opens the left-hand navigation pane, selects the “Run notebook” and “Run AutoAI experiment” nodes, and drags them onto the flow canvas.]
Adding Run nodes allows us to execute notebooks and AutoAI experiments
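The notebook behind the Run notebook node holds our custom training pipeline. A minimal stand-in might look like the sketch below; the column names and the tiny synthetic sample are illustrative, not the real bank marketing data, and the real notebook would of course load the dataset delivered by the Copy asset node instead.

```python
# Minimal stand-in for the training notebook: a scikit-learn pipeline
# fit on a tiny synthetic sample of bank-marketing-style features.
# Columns and values are made up for illustration.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "age":      [25, 47, 35, 61, 29, 52],
    "balance":  [1200, 300, 4500, 800, 150, 2600],
    "duration": [90, 310, 120, 600, 45, 410],
    "deposit":  [0, 1, 0, 1, 0, 1],   # 1 = customer opened a deposit account
})
X, y = df.drop(columns="deposit"), df["deposit"]

model = Pipeline([
    ("scale", StandardScaler()),      # normalize features before fitting
    ("clf", LogisticRegression()),
])
model.fit(X, y)
print(f"training accuracy: {model.score(X, y):.2f}")
```

Whatever the notebook produces, the pipeline treats it as just another node whose outputs downstream nodes can consume.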

By customizing the Run notebook node, we can pass environment variables that will be used to apply our code to the new dataset. We can either manually define these variables or set them to reference outputs from upstream nodes.

[Screenshot: passing environment variables to notebook execution nodes from upstream nodes.]
We can reference outputs from upstream nodes as environment variables in notebook execution
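On the notebook side, values configured on the Run notebook node arrive as ordinary environment variables. The variable names below (DATASET_PATH, MODEL_NAME) are illustrative assumptions, not names the product mandates:

```python
# Inside the notebook, read the values the pipeline passed in.
# Defaults let the same notebook still run standalone during development.
import os

dataset_path = os.environ.get("DATASET_PATH", "data/customers.csv")
model_name = os.environ.get("MODEL_NAME", "bank-marketing-model")

print(f"training {model_name} on {dataset_path}")
```

This is what lets one notebook serve every monthly run: the code never changes, only the values injected by the pipeline do.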

We then add another notebook execution node that selects between our pipeline and the one produced by AutoAI: may the best model win!

[Animation: the user connects the existing “Run notebook” and “Run AutoAI experiment” nodes to a new notebook execution node containing custom model-selection code.]
With new data, we re-run the comparison between the AutoAI-generated pipeline and our custom one
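The selection notebook’s core logic amounts to scoring both candidates on the same held-out data and keeping the winner. A sketch, with accuracy as the assumed metric (the real comparison criterion is up to whatever the notebook’s author codes):

```python
# "May the best model win": score each candidate on shared holdout data
# and return the name of the highest scorer. Accuracy via .score() is an
# assumed metric for illustration.
def select_best_model(candidates, X_holdout, y_holdout):
    """candidates: dict mapping name -> fitted model with a .score() method."""
    scores = {name: m.score(X_holdout, y_holdout) for name, m in candidates.items()}
    winner = max(scores, key=scores.get)
    return winner, scores

# Tiny stubs standing in for the two fitted models.
class _Stub:
    def __init__(self, acc):
        self.acc = acc
    def score(self, X, y):
        return self.acc

winner, scores = select_best_model(
    {"custom_notebook": _Stub(0.87), "autoai": _Stub(0.89)}, None, None
)
print(winner, scores)
# autoai {'custom_notebook': 0.87, 'autoai': 0.89}
```

Since it runs on each month’s refreshed data, a different model can win each month without anyone intervening.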

Our last task is to deploy the best model as a web service, where our colleagues can test it for themselves. Fortunately, as Watson Studio Pipelines are fully integrated into Watson Studio, we have access to all the latest and greatest deployment offerings, and we can easily expose our model as a REST API that will score payloads online.
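Once deployed as an online web service, the model scores JSON payloads over REST. The payload below follows the Watson Machine Learning online-scoring shape (fields plus rows of values); the endpoint URL and token are placeholders, not real values:

```python
# Build a scoring payload for the online deployment. The structure
# (input_data -> fields/values) follows the Watson Machine Learning
# online-scoring format; feature names are carried over from the
# illustrative example above.
def build_scoring_payload(fields, rows):
    return {"input_data": [{"fields": fields, "values": rows}]}

payload = build_scoring_payload(
    ["age", "balance", "duration"],
    [[35, 4500, 120], [61, 800, 600]],
)

# Sending it would look roughly like this (placeholders, do not run as-is):
# import requests
# resp = requests.post(
#     f"{DEPLOYMENT_URL}/predictions",                 # hypothetical endpoint
#     json=payload,
#     headers={"Authorization": f"Bearer {TOKEN}"},    # hypothetical token
# )

print(payload["input_data"][0]["fields"])
```

Colleagues can then hit the same endpoint with their own customer records to sanity-check the month’s winner.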

We also export the winning model from this month’s dataset to a Cloud Object Storage location where it can be picked up and deployed in production by our operations teams¹.

[Screenshot: exporting the winning model to an online web service and a Cloud Object Storage location to be picked up by other teams.]
We export our winning model for others to test and deploy

All that’s left to do is schedule our pipeline to run on the new data every month. Next stop: automation station!

[Screenshot: creating a scheduled execution job to automate this flow every month.]
As a final step, we set up a scheduled execution of our pipeline to automate this entire process every month
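The scheduler UI handles the recurrence for us, but the rule it encodes is simply “run on the first of every month.” For intuition, computing that next run date is a one-liner away:

```python
# Compute the first day of the month after `today` -- the cadence our
# scheduled pipeline execution follows.
from datetime import date

def next_monthly_run(today: date) -> date:
    """First day of the month following `today`, rolling over at December."""
    year, month = (today.year + 1, 1) if today.month == 12 else (today.year, today.month + 1)
    return date(year, month, 1)

print(next_monthly_run(date(2021, 3, 8)))   # 2021-04-01
```

From here on, the pipeline picks up the fresh data, retrains, compares, and redeploys on its own, every month.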

In this example use case, we showed how, with just a few clicks of setup through an intuitive user interface, you can create a fully automated continuous delivery AI pipeline. To experience this AI lifecycle automation for yourself, sign up for the Watson Studio Pipelines beta access list.

Happy modeling!

[1] Read more about how the production operations team can also benefit from lifecycle automation.
