Right on time(series): Introducing Watson Studio’s AutoAI Time Series

Yair Schiff
IBM Data Science in Practice
Mar 1, 2021 · 6 min read

Introducing AutoAI Time Series

Time series data are ubiquitous in business, industrial, and data science applications, but working with them is often harder than working with standard tabular data. Take, for example, tracking economic activity trends, forecasting sales at a retail location, or predicting the course a disease might take: if a dataset contains a temporal component, more advanced and tailored data handling, statistical, and machine learning modeling techniques are required. Given both this prevalence and this complexity, the IBM Watson® Machine Learning team is proud to introduce the general availability release of AutoAI Time Series.

Combining the rigorously proven and tested algorithms from IBM Research and the award-winning user interface of AutoAI with time series capabilities, this release further extends Watson Machine Learning’s best-in-class data science and machine learning offerings. The new time series capability is seamlessly integrated into the intuitive AutoAI workflow, and, as part of the IBM Watson Studio platform, users continue to have state-of-the-art tools for managing the entire data science lifecycle, from data pre-processing to model deployment and monitoring.

AutoAI is an automated machine learning tool that is fully integrated within Cloud Pak for Data and Watson Studio. AutoAI does in minutes what would typically take hours to days for whole teams of data scientists, including data preparation, model development, feature engineering, and hyperparameter optimization. Take a moment to learn more about IBM’s AutoAI at work: two real-world applications.

If you are interested in trying out the new AutoAI Time Series, get started with Watson Studio today!

Let’s take it out for a spin

The best way to introduce AutoAI Time Series is by walking through a real-world application that highlights the benefits and features of this cutting-edge offering. In this example, we will use a publicly available time series dataset consisting of electricity usage by various clients. The natural business application here is for an electricity generation company to predict demand and plan production accordingly. Read more about the dataset and download instructions.

As with other Watson Machine Learning offerings, we begin by adding an experiment to a project (for those AutoAI aficionados, many of these initial setup steps will be familiar, but keep an eye out for the brand new time series capabilities below).

AutoAI Time Series experiment setup in Watson Machine Learning

AutoAI immediately surfaces the new feature and asks if this should be a time series forecast experiment. Why, yes. Thank you, AutoAI.

We select the timestamp and prediction columns and we’re off to the races.

Running an experiment is as easy as adding a dataset and selecting the timestamp and prediction columns

But wait, there’s more! Navigating to the Experiment settings prior to running the experiment exposes some of the bells and whistles of AutoAI Time Series.

Experiment settings give users control over important features, such as the number of backtests and the Gap Length

For example, validating a time series prediction is all about seeing how well the model’s forecast agrees with the ground truth. The experiment settings let users customize the number of “backtests” to run as validation. Each bar represents a validation step, or backtest, in which the model is re-trained on the green portion of the training set and evaluated against the purple ground truth. In addition to the standard training-holdout validation, this provides greater insight into how our pipelines perform over time.

The settings also allow customization of the gap (“Gap Length”) between training and evaluation data during each of the backtests. The Gap Length is the number of time steps to skip from the end of the training data of the backtest to its validation data. This enables users to see model validation results on earlier time periods of the data without increasing the number of backtests.
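To make the backtest and Gap Length settings concrete, here is a minimal sketch (not AutoAI’s internal code) that uses scikit-learn’s TimeSeriesSplit to generate rolling train/validation splits with a configurable gap, which is the same splitting idea these settings control:

```python
# Illustrative only: rolling backtests with a gap, sketched with scikit-learn.
# The names n_backtests and gap_length mirror the AutoAI settings;
# they are not AutoAI parameters themselves.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

series = np.arange(100)          # stand-in for a univariate time series
n_backtests = 4                  # analogous to "number of backtests"
gap_length = 5                   # analogous to "Gap Length"

splitter = TimeSeriesSplit(n_splits=n_backtests, test_size=10, gap=gap_length)
for i, (train_idx, val_idx) in enumerate(splitter.split(series), start=1):
    print(f"Backtest {i}: train on t=0..{train_idx[-1]}, "
          f"validate on t={val_idx[0]}..{val_idx[-1]}")
```

Each printed split corresponds to one bar in the settings screen: the training portion grows over time, and the gap keeps a few time steps between the training and validation data.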

Experiment settings for the Lookback window and the number of time steps to predict into the future; AutoAI automatically detects an ideal Lookback window candidate

An important hyperparameter in time series prediction is the “Lookback window”: the number of past time steps used when predicting the future. Thankfully, AutoAI has our back on this decision too. Combining automatic seasonal period detection with advanced signal processing methods, AutoAI chooses an ideal Lookback window candidate for our dataset. Users can manually override this parameter if needed.

Users can also select the number of time steps into the future that the model will predict by changing the “Forecast window.”
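To see what these two windows mean in practice, here is a minimal sketch (not AutoAI’s implementation) of how a Lookback window and Forecast window turn a raw series into a supervised learning problem:

```python
# Illustrative only: turn a 1-D series into (X, y) pairs where each row of X
# holds `lookback` past values and each row of y holds the next `forecast` values.
import numpy as np

def make_windows(series, lookback, forecast):
    X, y = [], []
    for t in range(lookback, len(series) - forecast + 1):
        X.append(series[t - lookback:t])   # the Lookback window
        y.append(series[t:t + forecast])   # the Forecast window
    return np.array(X), np.array(y)

usage = np.sin(np.linspace(0, 20, 200))    # stand-in for electricity usage
X, y = make_windows(usage, lookback=24, forecast=12)
print(X.shape, y.shape)                    # (165, 24) (165, 12)
```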

Finally, as most data science practitioners can attest, missing data can often be a significant pain point when building machine learning models. If only there were some way to test different missing data imputation methods and select the one that best fits our… oh wait, AutoAI has that feature too!

Missing value imputation is seamlessly integrated into the AutoAI workflow: users choose data imputation methods and set a maximum threshold for missing values

Users can select several imputation approaches to apply to the missing data, and the one that produces the lowest mean absolute error is selected (more on this below).
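As a rough illustration of that selection logic (AutoAI does this automatically; the code below is only a sketch of the idea), one can hide some known values, apply a few candidate imputation methods, and keep the one with the lowest mean absolute error:

```python
# Illustrative only: score candidate imputation methods by MAE on values hidden on purpose.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
clean = pd.Series(np.sin(np.linspace(0, 12, 300)) + rng.normal(0, 0.05, 300))

mask = rng.random(len(clean)) < 0.10   # hide 10% of the known values for scoring
holey = clean.copy()
holey[mask] = np.nan

candidates = {
    "forward fill": holey.ffill(),
    "linear interpolation": holey.interpolate(method="linear"),
    "series mean": holey.fillna(holey.mean()),
}

scores = {name: (imputed[mask] - clean[mask]).abs().mean()
          for name, imputed in candidates.items()}
print(scores, "-> best:", min(scores, key=scores.get))
```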

With the experiment settings finalized, all that’s left to do is sit back and watch AutoAI work its magic.

AutoAI automates the entire data science pipeline: reading in the data, splitting training and validation sets, and selecting and testing algorithms

AutoAI reads in our dataset, splits it into training and validation sets, selects the top algorithms (the number of which is also configurable), applies feature engineering, and outputs a leaderboard of the top performing pipelines. Algorithm selection uses the novel T-Daub methodology, an efficient incremental data allocation approach that devotes more training time and data only to the most promising pipelines.
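The full T-Daub algorithm is described in IBM Research’s work; as a heavily simplified illustration of the incremental-allocation idea only (not the actual T-Daub implementation), a selection loop might train every candidate on a small slice of the data, rank the results, and grant larger slices only to the leaders:

```python
# Heavily simplified sketch of incremental data allocation (not IBM's T-Daub):
# all candidates train on a small slice first; only the most promising ones
# receive larger slices of the training data.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 24))                    # stand-in lookback features
y = 0.8 * X[:, -1] + rng.normal(0, 0.1, 2000)      # stand-in next-step target
X_train, y_train, X_val, y_val = X[:-200], y[:-200], X[-200:], y[-200:]

candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
}

for n_rows in (200, 600, len(X_train)):            # progressively larger slices
    scores = {name: mean_absolute_error(
                  y_val, model.fit(X_train[:n_rows], y_train[:n_rows]).predict(X_val))
              for name, model in candidates.items()}
    ranked = sorted(scores, key=scores.get)
    print(f"{n_rows} rows -> {ranked}")
    candidates = {name: candidates[name] for name in ranked[:2]}  # drop the weakest
```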

We can also see how our data imputation selections performed by opening the Data details pane:

Reviewing data imputation results to see the selected methodology and the percentage of data that was imputed

Once training and validation are done, the time series backtests are performed on the top pipelines; their results can be viewed by selecting a pipeline in the leaderboard.

Pipeline leaderboard results: performance from each backtest and actual vs. predicted trends over time
Selecting a pipeline from the leaderboard lets users see how it performed on validation and backtests

Finally, because time series support is available within AutoAI and the broader Watson Studio platform, users have access to the full suite of solutions for the entire data science lifecycle. Once the desired pipeline is selected, it can be saved and deployed either as a REST API that returns predictions online or as batch scoring jobs.
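As an illustration of what scoring an online deployment over REST might look like (the endpoint URL, deployment ID, token, field names, and values below are all placeholders; the exact scoring URL and payload schema are shown on the deployment’s details page in Watson Studio):

```python
# Illustrative sketch only: every identifier below is a placeholder.
import requests

DEPLOYMENT_URL = ("https://us-south.ml.cloud.ibm.com/ml/v4/deployments/"
                  "<deployment-id>/predictions?version=2021-03-01")
IAM_TOKEN = "<bearer token obtained from IBM Cloud IAM>"

payload = {
    "input_data": [{
        "fields": ["timestamp", "usage"],            # hypothetical column names
        "values": [["2021-02-28 23:00", 1542.0]],    # most recent observations
    }]
}

response = requests.post(
    DEPLOYMENT_URL,
    json=payload,
    headers={"Authorization": f"Bearer {IAM_TOKEN}"},
)
print(response.json())                               # forecasted values
```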

With just a few clicks through an intuitive user interface, AutoAI Time Series automates and optimizes a difficult prediction problem that would otherwise take significant engineering efforts from teams of data scientists. To experience AutoAI Time Series for yourself, get started with IBM Watson Studio today.

For more information about Watson Machine Learning and AutoAI, visit the links below:

Happy modeling!
