NeurIPS 2020 — The AutoAI Lifecycle
IBM Research gave a workshop at NeurIPS 2020 covering both AutoAI and automated machine learning (AutoML). It offered a comprehensive view of the differences between the two terms, as well as context on their backgrounds and future trends.
First, what is AutoAI? We can refer to IBM Researcher Lisa Amini’s abstract from the presentation.
“Automated Machine Learning (Auto ML) refers broadly to technologies that automate the process of generating, evaluating, and selecting an ML pipeline optimized for a specific dataset. Techniques tackle both traditional ML pipelines with data pre-processing, feature engineering & selection, algorithm selection and hyper parameter optimization, and neural architecture search (NAS) for deep learning models. While current capabilities in Auto ML enable users to complete these steps in a few mouse clicks or lines of code, it still automates only a small portion of the data scientist and ML engineer’s workloads. In this talk, I will focus on recent advances that will have a dramatic impact on driving automation across the entire AI/ML lifecycle, from data discovery and curation; to advanced model building with business and fairness constraints; to automation to monitor models in deployment, recognizing deficiencies and recommending corrections. Automation for this end-to-end AI/ML lifecycle is sometimes referred to as AutoAI.”
IBM Watson Studio with AutoAI grew out of IBM Research into the product space as an offering designed for both inexperienced data scientists and experienced teams of data scientists. It allows those teams and ML engineers to rapidly create, test, and deploy machine learning models. Throughout the lifecycle, models must be explainable, monitored, and updated when their performance falters. AutoAI enables all of these functionalities with a thoughtfully curated platform that includes dashboards, interactive visualizations, and step-by-step guidance on model searches.
Data Science Lifecycle
Let’s first define what we mean by the data science lifecycle and build some agreed-upon terminology. As with any data science pipeline, AutoAI begins with the data. After users have uploaded their data, they can check its quality with a variety of metrics, such as label purity, data homogeneity, and class parity. Users can then remediate any issues with a variety of tools and techniques available within AutoAI.
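The exact metrics are computed inside AutoAI, but a rough sense of what a completeness or class-parity check involves can be sketched with pandas. The file name, column names, and 10% threshold below are purely illustrative assumptions, not part of the product.

```python
import pandas as pd

# Hypothetical training data; "label" is assumed to be the target column.
df = pd.read_csv("training_data.csv")

# Completeness: fraction of missing values per column.
missing_fraction = df.isna().mean().sort_values(ascending=False)
print("Missing-value fraction per column:\n", missing_fraction)

# Class parity: how evenly the target classes are represented.
class_counts = df["label"].value_counts(normalize=True)
print("Class distribution:\n", class_counts)

# Flag a potential imbalance if the rarest class falls below 10% (arbitrary threshold).
if class_counts.min() < 0.10:
    print("Warning: the minority class is under-represented; consider remediation.")
```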
Imbalanced data is a very common occurrence in data sets, and AutoAI offers two strategies for resolving imbalances. One of these is the common approach of oversampling, while the other is a constraint-based automatic machine learning technique to remedy the imbalances.
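AutoAI’s exact remediation is internal to the product, but plain random oversampling can be illustrated with the imbalanced-learn library; the dataset here is synthetic and only meant to show the mechanics.

```python
from collections import Counter

from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification

# Build a deliberately imbalanced toy dataset (roughly 95% / 5%).
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
print("Before:", Counter(y))

# Randomly duplicate minority-class rows until the classes are balanced.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)
print("After: ", Counter(y_res))
```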
The next step before the pipeline search begins is to select targets, algorithms, and metrics for the configuration. Stakeholders often have specific fairness and business constraints. Fairness constraints such as parity difference and error rate difference are available as targets, as are business constraints including memory overhead, inference time per query, and false positive/negative minimization.
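As one concrete (and simplified) reading of “parity difference”: statistical parity difference compares the favorable-outcome rate between an unprivileged and a privileged group. The group labels and predictions below are invented for illustration and are not AutoAI’s internal computation.

```python
import numpy as np

# Hypothetical model predictions (1 = favorable outcome) and a protected attribute.
y_pred = np.array([1, 0, 1, 1, 1, 1, 0, 0, 1, 1])
group = np.array(["A", "A", "A", "B", "B", "B", "A", "B", "A", "B"])  # A = unprivileged

# Statistical parity difference: P(y_pred = 1 | A) - P(y_pred = 1 | B).
rate_a = y_pred[group == "A"].mean()
rate_b = y_pred[group == "B"].mean()
parity_difference = rate_a - rate_b
print(f"Statistical parity difference: {parity_difference:+.2f}")  # 0 means parity
```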
We can choose from a variety of optimization metrics. For a binary classification problem, a data scientist will often optimize for metrics such as area under the ROC curve (ROC AUC), accuracy, and log loss, and all three of these, along with many other metrics, are available options. Finally, the user can choose which algorithms to include or exclude for the type of problem they are solving.
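For reference, these binary-classification metrics are the same ones scikit-learn exposes, so the objective values can be reproduced on a held-out set; the toy model below is just a stand-in, not anything AutoAI generates.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

print("ROC AUC :", roc_auc_score(y_te, proba))
print("Accuracy:", accuracy_score(y_te, proba >= 0.5))
print("Log loss:", log_loss(y_te, proba))
```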
Based on the data scientist’s selections, AutoAI then searches the configuration space for the best pipeline. It uses a novel optimization scheme that IBM researchers presented at AAAI 2020, built on the alternating direction method of multipliers (ADMM) framework, which breaks the larger problem down into smaller, simpler subproblems. Using this framework, AutoAI automatically selects feature sets and learning algorithms with their associated best hyperparameter configurations to find the optimal model given the search parameters.
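The ADMM formulation itself is in the AAAI 2020 paper; as a much simpler stand-in for the same idea, joint algorithm selection and hyperparameter optimization (often called CASH) can be sketched in scikit-learn by searching over both the estimator choice and its hyperparameters. This is not IBM’s ADMM algorithm, only an illustration of the kind of search space it decomposes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)

# One pipeline slot ("clf") is searched over two algorithms,
# each with its own hyperparameter grid.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [RandomForestClassifier(random_state=0)], "clf__n_estimators": [50, 200]},
]

search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=3).fit(X, y)
print(search.best_params_)
```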
After AutoAI’s optimization process has run, the human user can then examine the pipelines using visualizations such as a sunburst chart for peeling through the layers of algorithms and pipelines that AutoAI tested. Additionally, a novel visualization allows the user to compare different pipelines in terms of both their metrics and the optimization trajectory.
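The sunburst view is built into the AutoAI UI; to give a feel for what such a view encodes, here is a hedged sketch with Plotly Express over a made-up table of pipeline results (the algorithms, pipeline names, and scores are all invented).

```python
import pandas as pd
import plotly.express as px

# Hypothetical summary of pipelines explored by an automated search.
results = pd.DataFrame({
    "algorithm": ["XGBoost", "XGBoost", "LogisticRegression", "RandomForest"],
    "pipeline": ["P1", "P2", "P3", "P4"],
    "roc_auc": [0.91, 0.89, 0.84, 0.87],
})

# Inner ring: algorithm family; outer ring: individual pipelines, sized by score.
fig = px.sunburst(results, path=["algorithm", "pipeline"], values="roc_auc")
fig.show()
```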
In one of the final steps, we can download the best pipeline to a Jupyter notebook. Supported by Lale, an open source semi-automated machine learning project from IBM researchers, and powered by AutoAI’s Python API, the generated Python code is compatible with any scikit-learn pipeline, making it understandable to almost any data scientist. The user can examine the code in detail, run predictions, and alter the pipeline within the notebook. Once the user is satisfied the pipeline is optimal, it can be deployed anywhere, including within IBM’s Watson Machine Learning service.
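The notebook AutoAI actually generates uses Lale’s operators; the scikit-learn sketch below only illustrates the kind of object you end up with, since the generated pipeline is scikit-learn compatible, and how you might run predictions or swap a step. The preprocessing and estimator choices here are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)

# Stand-in for a pipeline exported from an automated search.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", GradientBoostingClassifier(random_state=0)),
])
pipeline.fit(X, y)
print(pipeline.predict(X[:5]))  # run predictions inside the notebook

# Alter the pipeline: swap the final estimator and refit.
pipeline.set_params(model=LogisticRegression(max_iter=1000)).fit(X, y)
```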
After initial deployment, however, the often tedious but fundamental work of keeping a model relevant underlies the success of any use case, and this is where automating as much of the monotonous work as possible pays off, freeing human judgment and computational power for where they matter most. Explainability and monitoring are key to the human decisions to retrain and redeploy a model, and these processes are simplified by AutoAI on Watson Studio and Watson Machine Learning.
From the leaderboard on AutoAI, a user can drill down into the details of a model, look at global deviations between predictions and actual outcomes, and then go further into each data point to examine the deviation locally. Examining pertinent positives lets users explain which features add strength to the prediction, while pertinent negatives identify the features whose minimal absence allows the current prediction to hold.
These local examinations give us detailed explainability for individual predictions of black-box models. Again, though, it takes a human SME to tell the story of the data. In one use case, a model was built to predict energy use at an office building, and there were two weekdays where the actual energy usage resembled that of a weekend. An SME looked at the results and pointed out that the two weekdays were holidays, so the model could be adjusted to treat holidays as a new variable in the temporal classification field.
AutoAI also empowers data scientists to quickly monitor both data drift and model performance drift, with a combined score and individual scores available to view.
Users can view detailed information on which individual features contribute to the data drift and how each of them scores across training, validation, and deployment.
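Watson OpenScale computes its own drift metrics; as a rough illustration of what a per-feature data-drift check involves, one can compare the training and production distributions of a feature, for example with a two-sample Kolmogorov–Smirnov test. The data and the 0.01 cutoff below are assumptions for the sketch.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical feature values at training time versus in production.
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
production_feature = rng.normal(loc=0.4, scale=1.0, size=5000)  # shifted distribution

stat, p_value = ks_2samp(training_feature, production_feature)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.3g}")
if p_value < 0.01:
    print("Feature distribution has likely drifted; consider retraining.")
```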
These monitoring features are built upon the metrics already available in IBM Watson Open Scale, and now these can be used to determine when to retrain and redeploy our models.
The final aspect of the data science pipeline where AutoAI provides incredible value for the user is in creating interpretability for the behavior of deep learning models through the use of Boolean rules. The user can compare the qualitative differences of several models simultaneously before a deployment or redeployment. Visualizations display how similar or different models are in terms of semantics and accuracy. Users can select one cell and look in greater depth to see how a rule is unique or whether it overlaps with the other model’s rules via a Venn diagram visualization or a bar plot. The determination of these rules is a complex process in its own right: researchers did not want to make the rules too long or too short, yet the rules had to stay true to the behavior of the model. Balancing that concern with the need for a human-interpretable output gives organizations powerful new tools for examining any model before they place it in production.
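As a hedged illustration of the rule-overlap comparison: once two models have each been distilled into sets of Boolean rules, the Venn-style view boils down to set intersection and difference. The rules below are invented and much simpler than what rule-extraction research actually produces.

```python
# Hypothetical Boolean rule sets distilled from two candidate models.
rules_model_a = {
    "income > 50k AND age >= 30",
    "owns_home == True",
    "credit_history == 'good' AND debt_ratio < 0.4",
}
rules_model_b = {
    "income > 50k AND age >= 30",
    "credit_history == 'good' AND debt_ratio < 0.4",
    "employment_years > 5",
}

shared = rules_model_a & rules_model_b
only_a = rules_model_a - rules_model_b
only_b = rules_model_b - rules_model_a

print(f"Shared rules ({len(shared)}):", *shared, sep="\n  ")
print(f"Unique to model A ({len(only_a)}):", *only_a, sep="\n  ")
print(f"Unique to model B ({len(only_b)}):", *only_b, sep="\n  ")
```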
AutoML promises to revolutionize machine learning. By shortening the end-to-end AI lifecycle, AutoAI and other tools like it promise to decrease the time needed for AI development, speed up model experiments, and improve model quality. AutoAI also empowers more organizations to adopt AI as part of their enterprise workflows by providing clear and accessible interfaces that can be used by subject matter experts rather than machine learning experts. Possibly most important of all, AutoAI gives greater transparency into model development and feature strength, allowing for understandable and explainable models that are key to regulated industries and consumer trust.
Credits & Thanks:
Thanks to co-author Jana Thompson
Credit to Lisa Amini and Horst Samulowitz for sharing their presentation materials and their original content.