Improve AutoAI time series forecasts with supporting features using IBM Cloud Pak for Data as a Service

James Xu
7 min readJul 18, 2022


Observing and analyzing time series data is common across business domains. Time series data often reveals valuable patterns and provides insights that drive business decisions and optimization. Time series analysis and forecasting combine statistical analysis, data mining, and machine learning to cut through the fog of the future and give you visibility into what is coming.

In the general case, time series analysis and forecasting involve handling time series data, selecting and tuning an appropriate algorithm, evaluating the results, and deploying the model for forecasting. Some knowledge of data analysis and time series modeling is required to deliver a project with the expected quality.

AutoAI in Watson Studio now provides a graphical tool so you can handle your time series data in an easy-to-use, interactive way. Create an AutoAI time series experiment to get a rapid start on your data modeling. The tool also offers advanced features that help you tune your analysis or work with the data transformations in a Python notebook.

On IBM Cloud Pak for Data as a Service, AutoAI experiments support the ability to extend and enhance analysis of time series data with supporting features, also known as exogenous features. Specifying supporting features adds context to the problem being solved and can contribute to the analysis of your prediction column, improve the prediction, and result in greater model accuracy. For example, in a time series model that predicts energy consumption for the coming week, considering the expected temperature can make your forecast more precise.

Some of the UI changes implemented for supporting features include:

  • 11 new algorithms that allow supporting features, listed under ‘Algorithm to include’
  • A ‘Supporting feature’ list for selecting and deselecting supporting features
  • A ‘Leverage future values of supporting features’ option that lets you consider future values at prediction time. For example, if you are forecasting expected sales for a clothing item, information you can provide about future discounts and promotions improves the quality of the forecast.
  • A new label (SUP) that identifies supporting-feature algorithms in the Pipeline leaderboard
  • A new deployment UI for requesting predictions with future values for supporting features

Analyzing time series data

Let’s take a closer look at the process for analyzing time series data in AutoAI, including how dynamic visualizations show the step-by-step progress of training the experiment to forecast new values. In this post, we focus on the time series analysis process in an AutoAI experiment, especially the case with supporting features, and we walk through the graphical analysis progress step by step.

Import your time series data into your project

To start time series analysis in your project, click “New asset” and choose “AutoAI”.

Creating a new AutoAI experiment in New asset page

After you create an AutoAI experiment, you will be prompted to ‘Add data sources’.

Adding data sources

You can select an existing data file from the project, or browse for one from your local machine.

As an example, we use air pollution data that records a daily pollution index along with other potentially influencing (supporting) features over the five years from 2010 to 2014.
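Before importing the file, it can be useful to see how the columns split into the prediction column, the date/time column, and the supporting-feature candidates. The snippet below sketches that with Python's `csv` module; the sample rows are invented and the column names are illustrative of air-pollution data like that described above.

```python
# Peek at the columns of a small inline sample (rows are invented;
# column names are illustrative of the air-pollution data described above).
import csv
from io import StringIO

sample = StringIO("""\
date,pollution,dew,temp,press,wnd_spd,snow,rain
2010-01-02,129.0,-16,-4.0,1020.0,1.79,0,0
2010-01-03,148.0,-15,-7.0,1014.0,2.68,0,0
""")

reader = csv.DictReader(sample)
rows = list(reader)
print("columns:", reader.fieldnames)

# 'pollution' is the prediction column, 'date' the date/time column, and the
# remaining columns are candidates for supporting (exogenous) features.
supporting = [c for c in reader.fieldnames if c not in ("date", "pollution")]
print("candidate supporting features:", supporting)
```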

Preview of air pollution data

We select ‘pollution’ as the prediction column and ‘date’ as the date/time column. When AutoAI analyzes the data, the tool selects the other columns by default to use as supporting features. Of course, you can manually update the selections in ‘Experiment settings.’

Selecting prediction and date/time columns

In ‘Experiment settings’, several new algorithms that allow supporting features are available and are marked in the ‘Allows supporting features’ column. All the algorithms selected here enter the competition and are ranked according to evaluation metrics. Later, the top algorithms and pipelines are promoted to your Pipeline leaderboard for review of the experiment results.

Some new algorithms to allow supporting features

Looking at the ‘Data source’ tab, you can view all the columns that are selected by default as supporting features in the ‘Supporting feature’ list. You can customize the list to suit your business needs.

Note the option for ‘Leveraging future values for Supporting features.’ This is for cases where the future value of a feature is knowable and can be provided to strengthen the forecast. In our sample case of forecasting daily pollution statistics, the weather-related data columns are supporting features that influence the pollution values. Because weather data for the time window being forecast is knowable and available, we can provide those values to support and improve the pollution forecast. If your scenario is similar, enable this option to take the future values of supporting features into account. You then supply those future values when you deploy the model and request predictions on new data.

Supporting feature selection

With the configuration complete, save the settings and click ‘Run experiment’ to kick off your time series analysis. As you can see, setting up and configuring a time series model is simple, and you can get from training to forecasting quickly.

Follow the analysis progress in the graphical UI

After you confirm and run your experiment, you are taken to a graphical summary dashboard where you can monitor the progress of the analysis and training, then view the results.

AutoAI running in progress

At the experiment level, you can review an experiment summary with two views for monitoring progress. In the ‘Progress map’ view, a tree structure shows the whole data analysis process, from the input data through post-evaluation with backtesting for validation.

Progress map in Experiment summary

In the ‘Relationship map’ view, you can explore how the recommended algorithms produce pipelines with the selected feature transformations, moving from the inner circle representing the data to the outer circles with insights on algorithms, pipelines, and feature transformations.

Relationship map in Experiment summary

Review the recommended pipeline with insights

As the training progresses, completed pipelines are added to a pipeline leaderboard, with some pipelines listed under the Experiment summary view. This is the list of winning pipelines, ranked according to the ‘Optimized metric’ configured in the experiment settings for the time series problem.

As you review the pipeline leaderboard, note the SUP tag in the Enhancement column that indicates that a pipeline includes supporting features.

Click the name of a pipeline to view pipeline details.

The ‘Model evaluation’ view displays a sequence chart comparing actual and predicted values, which shows how the pipeline performs on the holdout (testing) period. The green dotted line at the right-hand side of the sequence chart shows the forecast values for the future range.

Additional evaluation measures are there for your reference to support explainability of the process and results.
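As a generic sketch of what such evaluation measures compute (not AutoAI's exact implementation), here are two common time series error metrics applied to a toy holdout window; the actual and predicted values are invented.

```python
# Two common time series evaluation measures, computed on a toy holdout
# window. This is a generic sketch, not AutoAI's exact implementation.
def mae(actual, predicted):
    """Mean absolute error over the holdout period."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent."""
    terms = [
        2 * abs(p - a) / (abs(a) + abs(p))
        for a, p in zip(actual, predicted)
        if (abs(a) + abs(p)) > 0
    ]
    return 100 * sum(terms) / len(terms)

actual = [129.0, 148.0, 159.0, 181.0, 138.0]     # holdout observations
predicted = [135.0, 140.0, 162.0, 170.0, 145.0]  # pipeline forecasts

print(f"MAE:   {mae(actual, predicted):.2f}")
print(f"sMAPE: {smape(actual, predicted):.2f}%")
```

Lower values of either metric indicate a pipeline whose forecasts track the holdout period more closely.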

Model evaluation in Pipeline details

If you want to check how the pipeline performs on each backtesting and holdout period that you specified, ‘Predictions over time’ provides a dynamic chart you can manipulate to examine values and performance.

Prediction over time in Pipeline details
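The backtesting idea behind those per-period charts can be sketched in a few lines: the model is repeatedly trained on an expanding history and evaluated on the next window, with the final window serving as the holdout. The window sizes below are illustrative, not AutoAI's defaults.

```python
# Sketch of rolling-origin backtesting: train on an expanding history,
# evaluate on the next window. Window sizes are illustrative only.
def backtest_splits(n_points, n_backtests, window):
    """Yield (test_start, test_end) windows; training uses [0, test_start)."""
    for i in range(n_backtests, 0, -1):
        test_end = n_points - (i - 1) * window
        yield (test_end - window, test_end)

# 100 observations, 3 evaluation windows of 10 points each
for test_start, test_end in backtest_splits(100, 3, 10):
    print(f"train on [0, {test_start}), evaluate on [{test_start}, {test_end})")
```

Each evaluation window produces its own error measurement, which is what the per-backtest views in ‘Predictions over time’ let you inspect.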

Forecast with supporting features

After you explore the results, you can save the best-performing pipeline to your project as a model that you can use to generate forecasts. Promote the model to a deployment space, where you can create an online deployment, as shown in the next image. The ‘API reference’ tab guides you through online prediction with code snippets. On the ‘Test’ tab, you can run a quick forecast by entering new data on the fly as a table or as JSON.

Score online deployment with supporting features

From the ‘Test’ tab, you can also upload a CSV file to provide the input values for the predictions. If your model is designed to leverage future values of supporting features, you can supply those future values as part of the input payload.
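As a rough sketch of what such an input payload can look like, the snippet below builds a scoring request with a hypothetical `supporting_features` block carrying known future weather values. The exact field names and schema for your deployment are shown on the ‘API reference’ tab; treat everything here as illustrative.

```python
# Sketch of a scoring payload with future supporting-feature values.
# The exact schema for your deployment is shown on the 'API reference' tab;
# the ids, fields, and values below are illustrative assumptions.
import json

payload = {
    "input_data": [
        {
            # recent history of the prediction column
            "id": "observations",
            "fields": ["pollution"],
            "values": [[129.0], [148.0], [159.0]],
        },
        {
            # known future values of the supporting features, one row per
            # step in the forecast window
            "id": "supporting_features",
            "fields": ["temp", "press", "wnd_spd"],
            "values": [[-4.0, 1020.0, 1.79], [-7.0, 1014.0, 2.68]],
        },
    ]
}

print(json.dumps(payload, indent=2))
# POST this JSON to your deployment's predictions endpoint with an IAM
# bearer token, per the code snippets on the 'API reference' tab.
```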

Click “Predict” to view the forecast for the next steps in the time series. That’s it! You have now seen how simple it is to create a time series experiment that considers supporting features to produce a reliable forecast.

Summary

In this article, we went through the key steps of automatic time series analysis in AutoAI in IBM Cloud Pak for Data as a Service, with a special focus on the new UI options for supporting features. We also reviewed how automatic configuration analysis, the ability to manually tune your experiment, and an interactive progress summary can help you understand the logic AutoAI builds into the experiment. The ability to explore pipeline details can further deepen your understanding of your time series data and business patterns. Let AutoAI give you the freedom to model your time series data with ease and confidence.

For guidance on using AutoAI to create time series experiments in Cloud Pak for Data as a Service, please see our documentation.

Acknowledgement

Many thanks to Julianne Forgo and Jihui Yang for their help in reviewing and revising this blog.
