Simplifying time series forecasting with Google Vertex AI AutoML Platform

Shiv Saxena
10 min read · Sep 30, 2022


Forecasting business key performance indicators (KPIs) is a prominent use case across several industries. It helps companies reduce risk, make better financial decisions that increase profit margins, improve resource allocation, and create more opportunities for growth. However, generating accurate forecasts at scale poses several challenges. In this blog, I will cover applications of forecasting in different industries, common challenges, and how to solve them using state-of-the-art machine learning capabilities on Google Cloud Platform.

Forecasting use cases across industries:

Let’s start by looking at use cases of time series forecasting:

  1. Retail and CPG companies need to quickly and accurately anticipate customer demand to ensure they can deliver the right products at the right price and the right time.
  2. Power distribution companies forecast short-term, mid-term, and long-term load requirements for capacity planning, efficient power procurement, selling of excess power, and so on. Renewable power generation companies forecast power generation based on factors such as weather, location, wind speed, and other climatic conditions.
  3. If you run call center operations, you may want to predict call volume to staff the right number of representatives; a hotel chain may predict occupancy for the next season; a hospital may forecast bed occupancy.
  4. Automotive companies forecast vehicle sales to plan their supply chain operations, logistics, and production, and to design marketing campaign strategies. This requires taking into account multiple influencing factors, such as festivals, fuel prices, and inflation, to create comprehensive forecasting models.
  5. Manufacturers forecast demand to optimize production planning for each manufacturing line, lower logistics costs, and maintain optimal stock levels, thereby reducing the risk of over- and under-allocation of inventory.

Challenges in building efficient forecasting models:

Now that we know some critical use cases of forecasting, let’s look at the typical challenges companies face in operationalizing the forecasting process.

  1. Building a quality forecasting model requires significant manual effort and deep ML expertise to experiment with multiple algorithms, feature engineering, and hyper-parameter tuning. Such expertise may not be broadly available, which can limit the benefits of applying ML to time series forecasting use cases.
  2. Real-world time series data often suffers from missing values and high intermittency (i.e., a high fraction of the time series values are zero). Some forecasting tasks have no historical data available at all and suffer from the cold-start problem, for example, predicting the sales of a new product.
  3. Retail organizations may have tens of thousands of SKUs. On-premises or Excel-based forecasting models limit an organization’s ability to train and deploy time series forecasting models on large multivariate datasets at a grain of minutes, hours, or days.

Due to the above factors, many companies still rely on manual forecasts and/or heavily modify their business plans based on previous years’ data, gut feel, and industry experience.

So, how do we solve these challenges? Yes, you guessed it: Google Cloud offers Vertex AI Forecasting to make time series forecasting simpler for any organization.

Vertex AI — AutoML Forecasting

AutoML Forecasting is a fully managed service that takes care of the heavy lifting of data analysis, feature engineering, model selection, hyper-parameter tuning, and evaluation without any manual intervention. It’s a no-code, web UI-based offering that empowers users of all skill levels to quickly train accurate and sophisticated forecasting models without worrying about the underlying infrastructure. A key part of Vertex AI Forecast is the model architecture search, in which the service automatically evaluates hundreds of different model architectures and finds the best-performing setups for the given dataset.

With the hierarchical forecasting capabilities of AutoML Forecast, companies can generate highly accurate forecasts at multiple levels, such as individual SKU, store, and geography, minimizing the challenges created by organizational silos. You can ingest large volumes of structured and unstructured data and include many relevant demand drivers, such as weather, product reviews, macroeconomic indicators, competitor actions, commodity prices, and freight charges, to get accurate forecasts. Moreover, the explainability feature of Vertex AI Forecast shows how each of these drivers contributes to the forecast, helping decision makers understand what drives demand so they can take corrective action early.

Now that you have an overview of Vertex AI AutoML forecasting, let’s give it a try and build a sales forecasting model without writing any code.

Build a Sales Forecast model for an Automobile company:

Sales forecasts in the automobile industry depend on multiple factors such as transmission type, fuel price, festivals, promotion strategy, inflation rate, quality of service, and more. This often calls for a hierarchical forecasting model that generates forecasts at the geography, dealer, and model level. For this demonstration, however, I’ve created a simple dummy dataset that includes historical sales data for different vehicles. Let me first explain the columns in my dataset:

sale_date: a mandatory timestamp column with consistent granularity (hourly, daily, weekly, etc.) across all rows in the dataset.

vehicle_model: the series identifier. I’m training a model to forecast sales for three vehicle models and use the values 0, 1, and 2 to represent them.

is_festival: an external variable (covariate) that can influence the value of the target variable. You may add more covariates as needed.

units_sold: the target value that I want the model to predict.
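To make the schema concrete, here are a few illustrative rows of such a dataset (dummy values):

```
sale_date,vehicle_model,is_festival,units_sold
2021-01-01,0,1,52
2021-01-01,1,1,34
2021-01-01,2,1,18
2021-01-02,0,0,41
2021-01-02,1,0,27
2021-01-02,2,0,12
```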

Let’s jump to the GCP console and start building.

1: Bucket Creation — Create Cloud Storage buckets for the forecasting input and output data.

Navigate to GCP Console → Cloud Storage and create two buckets, salesforecast_input and salesforecast_output, in the us-central1 region. Remember that bucket names are globally unique, so you may need to change these names to make them unique.
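If you prefer scripting this step, here is a minimal sketch using the google-cloud-storage Python client; the project ID is a placeholder, and the bucket names must be adjusted if they are already taken:

```python
from google.cloud import storage

client = storage.Client(project="your-project-id")  # placeholder project ID
for name in ("salesforecast_input", "salesforecast_output"):
    # Create each bucket in the same region used in the console steps.
    client.create_bucket(name, location="us-central1")
```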

2: Dataset Creation — Navigate to GCP Console → Vertex AI → Datasets → Create dataset, and create a dataset that will be used for model training.

  • Provide a name for the dataset, such as “sales_forecast”.
  • Choose ‘Tabular’ as the data type and ‘Forecasting’ as the objective.
  • Select a region and hit CREATE.

On the next screen, you map the dataset to its data source. This can be CSV files stored on your computer, files stored in Cloud Storage, or a table from BigQuery. In my case, I’ll choose ‘Upload CSV files from your computer’ and select Vehicle_Sales.csv.

Provide the Cloud Storage bucket ‘salesforecast_input’ that you created in the previous step.

Hit ‘Continue’ to proceed.
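For reference, a hedged sketch of the equivalent step with the Vertex AI Python SDK (google-cloud-aiplatform), assuming Vehicle_Sales.csv is already in the input bucket:

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")  # placeholder project ID

# Create a time series (forecasting) dataset from the uploaded CSV.
dataset = aiplatform.TimeSeriesDataset.create(
    display_name="sales_forecast",
    gcs_source=["gs://salesforecast_input/Vehicle_Sales.csv"],
)
```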

3: On the next screen, ‘Analyze’, AutoML Forecasting shows the dataset properties. You can optionally generate statistics by clicking the ‘Generate Statistics’ button on the right side of the screen to get information such as the number of missing and distinct values in your dataset.

4: Training the Forecasting Model — Choose vehicle_model in the ‘Series identifier column’ dropdown and sale_date in the ‘Timestamp column’ dropdown, then click the ‘TRAIN NEW MODEL’ button.

Configuring the model training job is a four-step process. On the first screen, no change is required; continue with the default settings and hit ‘Continue’ to proceed.

On the second screen, specify the model name and, optionally, a model description.

Select units_sold as the target column. The target column is the value that the model will forecast.

I already specified the series identifier and timestamp columns on the dataset creation screen, so no change is needed here.

Choose ‘Daily’ as the data granularity. It can be minutes, hourly, daily, weekly, monthly, or yearly depending on your use case.

You can optionally select a holiday region to enable holiday-effect modeling. I selected the ‘India’ region.

Next, set the forecast horizon and context window. The forecast horizon determines how far into the future the model forecasts the target value for each row of prediction data. I specified 15 as the forecast horizon so that the model predicts the daily forecast for the next 15 days. The context window sets how far back the model looks during training (and for forecasts) for predictive patterns. I chose 45 as the context window, so the model uses the previous 45 days of history to predict the next 15. You can refer to the best practices related to these terms here. Hit ‘Continue’ to proceed.

On the next screen, set whether each column will be available at forecast time. If you select ‘Available’, you must provide the value of that column for each point in the input prediction dataset (this is explained in a later section of the blog). Since we know in advance whether a given date is a holiday, I selected ‘Available’ for the is_festival column. Hover over the ‘?’ to learn more about these settings.

On the same screen, click ‘ADVANCED OPTIONS’ to explore advanced configuration options such as the weight column, optimization objective, and hierarchical forecasting. I’ll stay with the default settings and hit ‘Continue’.

On the next page, enter the number of node hours that the platform will use to train the model. The cost of training your model in AutoML depends on the node hours you specify, and AutoML recommends node hours based on the row count of your dataset. I chose the minimum of 1 node hour, as my dataset has only 1,095 rows. Click the ‘START TRAINING’ button to initiate the training.
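If you would rather configure the same training job in code, here is a hedged sketch with the Vertex AI Python SDK. The values mirror the UI choices above; the holiday region code and the optimization objective are my assumptions:

```python
# Configure an AutoML forecasting training job (a sketch, not the exact UI defaults).
job = aiplatform.AutoMLForecastingTrainingJob(
    display_name="sales_forecast",
    optimization_objective="minimize-rmse",  # assumed default objective
)

model = job.run(
    dataset=dataset,                         # the TimeSeriesDataset created earlier
    target_column="units_sold",
    time_column="sale_date",
    time_series_identifier_column="vehicle_model",
    available_at_forecast_columns=["sale_date", "is_festival"],
    unavailable_at_forecast_columns=[],
    forecast_horizon=15,                     # predict the next 15 days
    context_window=45,                       # look back 45 days for patterns
    data_granularity_unit="day",
    data_granularity_count=1,
    holiday_regions=["IN"],                  # assumed code for the India holiday region
    budget_milli_node_hours=1000,            # 1 node hour
    model_display_name="sales_forecast",
)
```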

Model training may take hours depending on the size of the dataset. You can check the training status by clicking the training icon in the left panel. AutoML sends an email notification when the training status changes, to let you know whether training has completed or failed. On successful completion, the status changes to ‘Finished’.

5: Model evaluation — Next, evaluate the model using different metrics. Click on the model name ‘sales_forecast’ to open the model detail page. The metrics on the ‘Evaluate’ tab provide quantitative measurements of how the model performed on the test dataset. Click the ‘?’ sign to learn about these metrics.
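The same metrics can also be pulled programmatically; a minimal sketch (the exact metric field names vary by model type):

```python
# Fetch the evaluation that AutoML computed on the test split.
evaluation = model.list_model_evaluations()[0]
print(evaluation.metrics)  # e.g. MAE, MAPE, RMSE and related metrics
```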

Another powerful feature of AutoML Forecasting is feature attribution. It tells you how much each feature in the data contributed to the predicted result. You can use this information to verify that the model is behaving as expected, recognize bias in your models, and get ideas for improving your model and your training data. In my run, the is_festival feature contributed 20% to the predicted results.

6: Generating Forecasts — It’s time to generate forecasts. The first step is to create the input data for batch prediction that the model will use to create forecasts. It is recommended to use the same format for your input data as you used for training. Provide historical data for each time series to forecast; for the most accurate forecasts, the amount of history should equal the context window set during model training. I created a CSV file with 45 days of historical data for all three time series (0, 1, and 2) and left the units_sold column blank for the dates to be predicted. Recall that I used a context window of 45 days and a forecast horizon of 15 days.

I uploaded this CSV file to the ‘salesforecast_input’ storage bucket created earlier. On the model details page, click the ‘BATCH PREDICT’ tab and provide the inputs. The batch prediction job reads input data from the source path and writes the forecast output to the destination path in Cloud Storage. You may alternatively choose BigQuery for input and output data.
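A hedged sketch of the same batch prediction job via the SDK; the input file name is my assumption:

```python
# Run batch prediction: read the CSV with 45 days of context per series
# plus 15 future rows (units_sold left blank), and write forecasts to GCS.
batch_job = model.batch_predict(
    job_display_name="Forecast_daily",
    gcs_source="gs://salesforecast_input/prediction_input.csv",  # assumed file name
    gcs_destination_prefix="gs://salesforecast_output",
    instances_format="csv",
    predictions_format="csv",
)
batch_job.wait()  # block until the job finishes
```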

Click ‘CREATE’ to start the batch prediction job. The job starts in the ‘Running’ state, and the batch prediction process may take several minutes. You will receive an email notification when the job completes.

After the job completes, click on the batch prediction job ‘Forecast_daily’ to view its details. A Cloud Storage link is provided next to the export location; click on it to view the forecast results.

The output file shows the vehicle sales predictions for the next 15 days for each of the vehicle models. You will also notice that the sales forecast on the festival date is higher, indicating that the model has learned the impact of the is_festival variable on vehicle sales.
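One quick way to inspect the exported forecasts is to load them with pandas. This assumes the gcsfs package is installed so pandas can read gs:// paths; the output file path below is illustrative, so check the export location for the files the job actually wrote:

```python
import pandas as pd

# Load one of the prediction files written by the batch job (assumed path).
df = pd.read_csv("gs://salesforecast_output/prediction-sales_forecast/predictions_1.csv")
print(df.head(15))  # 15 forecasted days per vehicle model
```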

Summary: In this blog, I explained how easily companies can train and deploy accurate time series forecasting models with a low/no-code approach using the Vertex AI AutoML Forecast service. Automation of complex ML tasks, manageability, automatic selection of model architecture, hierarchical forecasting, and feature attribution are some of the key features that help organizations produce high-quality forecasting models that are easy to manage, even without advanced data science expertise.


Shiv Saxena

Data Analytics specialist at Google. This blog is based on my field experience and the views are solely mine.