Watson AutoAI Series: Part 2 — Classification Model

Aishwarya Srinivasan
IBM Data Science in Practice
Apr 13, 2020

In this module of the AutoAI series, we will go over building rapid prototypes on a sample dataset and extracting reproducible Python code for the resulting pipeline.

If you are new to AutoAI, I recommend reading my introductory blog post on AutoAI first.

Let’s start with what rapid prototyping means. It is the process of automatically ingesting, cleaning, and transforming data, and then building models with hyperparameter optimization.
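To make these stages concrete, here is a minimal scikit-learn sketch of them: cleaning (median imputation), transformation (scaling), modeling (logistic regression), and hyperparameter optimization (a grid search over `C`). It assumes scikit-learn and NumPy are installed, and the synthetic data from `make_classification` is a stand-in for an ingested dataset. It illustrates the concept only and is not the pipeline AutoAI generates.

```python
# Minimal rapid-prototyping sketch: clean -> transform -> model -> tune.
# Synthetic data stands in for an ingested CSV (assumption for illustration).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject ~5% missing values to clean up

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # cleaning
    ("scale", StandardScaler()),                   # transformation
    ("model", LogisticRegression(max_iter=1000)),  # estimator
])

# Hyperparameter optimization over the regularization strength C.
search = GridSearchCV(
    pipeline,
    param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test ROC AUC:", round(search.score(X_test, y_test), 3))
```

Wrapping every step in one `Pipeline` means the imputer and scaler are refit inside each cross-validation fold, so the grid search does not leak information from the validation folds.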

In IBM’s AutoAI, the terminology is the same as in the open-source machine learning world, for example: ML pipelines, estimators, hyperparameter optimization, and model evaluation metrics. AutoAI currently supports data in CSV (comma-separated values) format, and the data must include the labeled target column(s).
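Because AutoAI expects a CSV file with a labeled target column, it can be useful to check locally that the file parses and that the target column is present before uploading. The sketch below uses only Python's standard library; the column names, the inline sample, and the `load_labeled_csv` helper are hypothetical, chosen to resemble a small bank marketing table.

```python
import csv
import io

# Tiny inline sample standing in for a CSV file; column names are hypothetical.
SAMPLE_CSV = """age,job,balance,y
30,technician,1200,no
45,management,3400,yes
"""


def load_labeled_csv(text, target):
    """Parse CSV text and split each row into a feature dict and a target label."""
    reader = csv.DictReader(io.StringIO(text))
    # fieldnames reads the header row; fail early if the target column is missing.
    if target not in (reader.fieldnames or []):
        raise ValueError(f"target column {target!r} not found in {reader.fieldnames}")
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(target))  # remove the label from the feature dict
        features.append(row)
    return features, labels


features, labels = load_labeled_csv(SAMPLE_CSV, target="y")
print(labels)       # ['no', 'yes']
print(features[0])  # {'age': '30', 'job': 'technician', 'balance': '1200'}
```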

Below is an example of a rapid prototyping pipeline:

Let’s now look at a classification use case from financial services marketing: predicting whether a banking client will subscribe to a term deposit after a marketing campaign phone call. This is a straightforward binary classification task.

Here is a quick view of the features in the sample data we have:

Now, just to see what the numbers look like, let’s view a subset of the sample data:

Let’s start by creating a project in IBM Watson Studio.

Here, we enter the details of the project.

Click “Add to Project” and choose AutoAI Experiment.

The AutoAI experiment page is where you upload the dataset as a CSV file.

We add the bank.csv file, which contains the data. This is what you then see on the dashboard.

Now select the target column. The prediction type and evaluation metric are selected automatically, but you can change them in the experiment settings if you wish to use a different metric or modeling algorithm.
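The post does not describe how AutoAI makes this automatic choice, so the following is a purely hypothetical heuristic showing how a prediction type and default metric could be derived from the target values: two distinct labels suggest binary classification, many distinct numeric values suggest regression, and anything else is treated as multiclass. The function names, the `max_classes` threshold, and the metric names are illustrative assumptions, not AutoAI's behavior.

```python
def _is_number(value):
    """Return True if the value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def infer_task(labels, max_classes=20):
    """Hypothetical heuristic: guess (prediction type, default metric) from target values."""
    values = set(labels)
    if len(values) == 2:
        return "binary classification", "roc_auc"
    # Many distinct numeric values look like a continuous target.
    if all(_is_number(v) for v in values) and len(values) > max_classes:
        return "regression", "rmse"
    return "multiclass classification", "accuracy"


print(infer_task(["no", "yes", "no"]))  # ('binary classification', 'roc_auc')
```

For the term deposit data, a yes/no target would land in the binary branch, which matches the task described above.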

Now click “Run experiment”. AutoAI runs all the steps of pre-processing, model selection, feature engineering, and hyperparameter tuning, producing multiple pipelines while optimizing the chosen evaluation metric.
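AutoAI uses its own search strategy. As a rough, simplified analogue of the model selection step, the sketch below scores a few candidate estimators with 5-fold cross-validated ROC AUC and ranks them, similar in spirit to the ranked list of pipelines AutoAI produces. It assumes scikit-learn is installed and uses synthetic data; the candidate list is an arbitrary choice for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (stand-in for a real dataset).
X, y = make_classification(n_samples=400, n_features=10, random_state=1)

# Arbitrary candidate estimators chosen for illustration.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "gradient_boosting": GradientBoostingClassifier(random_state=1),
}

# Score each candidate with mean 5-fold ROC AUC and sort best-first.
leaderboard = sorted(
    (
        (name, cross_val_score(est, X, y, cv=5, scoring="roc_auc").mean())
        for name, est in candidates.items()
    ),
    key=lambda item: item[1],
    reverse=True,
)

for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: {score:.3f}")
```

Feature engineering and per-candidate hyperparameter tuning are left out to keep the example short.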

The evaluated pipelines are shown on the dashboard, ranked by the chosen metric. The models produced by AutoAI can either be deployed on Watson Machine Learning (useful for business analysts who wish to operationalize the model) or be exported as Python notebooks containing the complete code for how the feature engineering and hyperparameter optimization were done.

The exported notebook would look like this.

You can now get hands-on with creating and running an AutoAI experiment on your own dataset.

I hope you found this blog post useful. I invite you to leave your comments and thoughts below.


LinkedIn Top Voice 2020- Data Science || MS Data Science - Columbia University || IBM- Data Science Elite || Unicorn in Data Science || Scikit-Learn Contributor