Breaking the Magicians’ code with IBM Watson Studio’s AutoAI Notebooks
It’s one of the most famous codes of conduct: “Magicians never reveal their secrets.” But there are rare moments when an audience is so wowed, so struck by the ‘what just happened’ feeling that it would be almost cruel not to let them in on the secret.
At IBM Watson® Machine Learning, we couldn’t help but show you how we pulled the rabbit pipeline out of the magical data hat, and we are proud to announce the General Availability of the AutoAI Notebooks feature.
With this feature, we are inviting AutoAI users, mystified by its machine learning magic, to peek behind the award-winning user interface and interact directly with the APIs and code that power the AutoAI engine. The AutoAI Notebooks consist of automatically generated Python code and API calls that offer users a guided tour of all the automated steps AutoAI executes in creating state-of-the-art machine learning pipelines. Exposing the APIs and steps gives users the ability to programmatically interact with and customize their AutoAI pipelines and experiments.
For those readers new to Watson Machine Learning, AutoAI is an automated machine learning tool that is fully integrated within Watson Studio on Cloud Pak for Data™. AutoAI does in minutes what would typically take hours to days for whole teams of data scientists. This includes data preparation, model development, feature engineering, and hyperparameter optimization. Take a moment to learn more about IBM’s AutoAI at work: two real-world applications.
Visit Watson Studio today to get started and experience this product for yourself!
Hop on the Magic School Bus
At IBM Watson Studio, we know that if data scientists were Kareem Abdul-Jabbar, then Notebooks would be their Sky Hook. So if you’ll indulge the ’90s kid in us for a moment, we’d love to do our best Ms. Frizzle impression and give you a tour of the AutoAI Notebooks that were recently made generally available, highlighting some of their key features and customization options.
Let’s start with some sample data. In this experiment we’ll be using a dataset with various automobile information to predict insurance riskiness categories. Read more about the dataset and download instructions.
Our first step is to add this data to a Project and run an AutoAI experiment:
Pipeline Notebooks
Here’s the part where you say, “but wait, how did they do that?” Well, let us show you. The first type of Notebook that is surfaced to users can be found in the pipeline leaderboard:
For each pipeline generated by AutoAI, we provide users with a step-by-step guide to the code that ingests the data, splits it into training and test sets, and builds, trains, and scores the pipeline. This enables the curious data scientist to see what made their winning pipeline (or any other generated pipeline, for that matter) work. With access to the source code, users can also customize the pipelines to their needs.
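To make those steps concrete, here is a minimal sketch of the same ingest, split, train, and score sequence. It is not the code AutoAI generates: it uses only the Python standard library, a made-up two-feature dataset, hypothetical labels ("low_risk", "high_risk"), and a simple nearest-centroid classifier standing in for a real pipeline. The function names are ours, chosen for illustration.

```python
import random
from collections import Counter


def make_toy_rows(n=200, seed=0):
    """Hypothetical stand-in for data ingestion: returns (features, label) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.choice(["low_risk", "high_risk"])
        center = 1.0 if label == "low_risk" else 3.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train_rows):
    """'Train' by averaging the feature vectors of each label."""
    sums, counts = {}, Counter()
    for features, label in train_rows:
        running = sums.setdefault(label, [0.0] * len(features))
        sums[label] = [s + x for s, x in zip(running, features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(center):
        return sum((x - c) ** 2 for x, c in zip(features, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, rows):
    """Score: fraction of rows whose predicted label matches the true label."""
    return sum(predict(centroids, f) == y for f, y in rows) / len(rows)


rows = make_toy_rows()                             # ingest
train_rows, test_rows = train_test_split(rows)     # split
model = fit_centroids(train_rows)                  # build and train
test_accuracy = accuracy(model, test_rows)         # score
print(f"held-out accuracy: {test_accuracy:.2f}")
```

A generated pipeline Notebook follows the same overall shape, but with your project's data and the estimators AutoAI actually selected, so treat this only as a map of the stages.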
Experiment Notebooks
But we didn’t want to stop there. Once the data for an experiment has been preprocessed, we also give users insight into the entire AutoAI process with a Notebook that guides them through the full experiment!
This notebook has several important sections. The first section exposes Python code and APIs for reviewing the experiment that was just launched. In this section, users can load details about the experiment and view summary tables and graphs of the generated pipelines.
Next is a section that allows users to inspect any of the pipelines from the experiment, with the fantastic pipeline visualization built on the Lale package, giving users a much more tangible feeling of what comprises each pipeline generated by AutoAI. Users can also click on any node in the pipeline visualization to read documentation about that node’s transformer:
To take full advantage of Watson Studio, we also provide instructions and code for integrating this experiment into other data science lifecycle offerings. For example, in the image below, we display how to deploy a pipeline as a REST API that returns predictions online:
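Once a pipeline is deployed as an online REST API, a client scores new rows by posting JSON to the deployment. As a rough sketch of what such a request body can look like, the helper below arranges rows into the "input_data" / "fields" / "values" layout used by Watson Machine Learning online scoring. The helper name and the column names are hypothetical, the example builds the body without sending it, and you should confirm the exact payload against your own deployment's documentation.

```python
import json


def build_scoring_payload(records):
    """Arrange a list of row dicts into a fields/values scoring payload.

    Hypothetical helper: assumes every record has the same keys.
    """
    if not records:
        raise ValueError("at least one record is required")
    fields = list(records[0].keys())
    values = [[record[name] for name in fields] for record in records]
    return {"input_data": [{"fields": fields, "values": values}]}


# Illustrative rows only; these column names are placeholders, not the
# actual schema of the automobile dataset used in this post.
records = [
    {"make": "example_make", "horsepower": 111, "num_doors": 2},
    {"make": "another_make", "horsepower": 154, "num_doors": 4},
]

payload = build_scoring_payload(records)
body = json.dumps(payload)  # the string a client would send in the POST body
print(body)
```

Keeping the payload construction in a small function like this makes it easy to reuse the same code from a Notebook, a batch script, or an application that calls the deployment.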
Finally, with the last section in the notebook, users can run the entire AutoAI experiment, from data ingestion to optimal pipeline selection, using a single API call. This gives users the ability to integrate AutoAI directly into their other workflows:
Both of the Notebooks described in this post make the power of AutoAI much more accessible to users who want further customization, the ability to share their work, and a way to integrate source code into their other projects.
To experience AutoAI and Notebooks, visit Watson Studio.
Happy modeling!