Exploring ML Tools — Amazon Forecast

Bhavin Tandel
Published in Explore ML
5 min read · May 21, 2020

This post focuses on the forecasting service offered by Amazon Web Services called Amazon Forecast. Companies have long been forecasting business outcomes, from financial market forecasts to retail product demand. Traditionally, statistical methods and advanced mathematics were used to predict future outcomes; with the advancement of machine learning and the availability of big data, however, we can now use deep learning techniques to make more accurate predictions.

Forecast Workflow

Introduction

At its core, forecasting is the method of predicting the future based on past and present data. It consists of a model to which we feed historical time-based data y, and which returns future values of y. For example, if we have a model to predict the future value of a stock, we feed it the stock's historical data and get back the predicted value.

Amazon Forecast is a managed service which provides a platform for users to run forecasting on their data without the need to maintain complex ML infrastructure. Its flagship algorithm is DeepAR+, a supervised algorithm for forecasting one-dimensional time series using recurrent neural networks. The workflow revolves around datasets, which are used to train predictors and generate forecasts.

How it Works?

It consists of:

Dataset and Dataset Groups

  • You create a dataset group and choose a forecasting domain (Retail / Inventory Planning / Custom / …) based on your use case. The complete list can be found here.
Dataset Group Creation, Target Time Series Dataset Creation, Dataset import job
  • Each dataset group can have three datasets, one of each type:

* target time series(required)

* related time series(optional)

* item metadata (optional), used only when the chosen algorithm is DeepAR+

  • You will have to select the frequency of your data. For example, an energy meter might take a reading every 30 minutes.
  • The following columns are mandatory in both the target and related time series:

* `timestamp` (must be of timestamp type)

* `item_id` (must be of string type)

* Data type of all other columns must be of type string.

  • The target time series must also contain a `target_value` column, which is the column you are planning to forecast.
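
As an illustrative sketch (the CUSTOM domain is assumed here; field names vary by domain), the schema for a target time series and a matching CSV row might look like:

```python
# Hypothetical target time series schema (CUSTOM domain assumed).
# The three attributes mirror the mandatory columns listed above;
# in this domain the value to forecast is a float named target_value.
TARGET_SCHEMA = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "target_value", "AttributeType": "float"},
    ]
}

# A matching CSV row (item_id here is a country, as in the example later):
#   "2020-04-01 00:00:00","Italy",110574
```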

Predictors

creating predictor
  • Training produces a model, which is called a predictor.
  • Users select a forecast horizon, which indicates how far into the future you want to predict.
  • One can manually select an algorithm or use the AutoML feature for automatic algorithm selection.
  • There are some additional features available for fine-tuning the forecasts.

Forecasts

Creating Forecasts
  • We can create a forecast from the predictor created above.
  • You can select up to five quantile values, including the mean. Choose based on your business need, which may favor either under-forecasting or over-forecasting. For example, the 0.05 quantile means the true value is expected to be lower than the predicted value 5% of the time. If we take the example of predicting pandemic case numbers, we would want a 0.99 quantile when forecasting hospital beds, because we don’t want to under-forecast resources in this scenario.
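
The quantile intuition can be checked numerically. A small sketch with made-up samples standing in for a predictive distribution (not real Forecast output):

```python
import numpy as np

# Made-up samples standing in for a model's predictive distribution
# at a single time step (not real Forecast output).
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=100, scale=10, size=10_000)

# Planning at the 0.05 quantile under-forecasts: the true value exceeds
# it about 95% of the time. Planning at 0.99 over-forecasts instead.
p05, p99 = np.quantile(samples, [0.05, 0.99])

share_below_p05 = (samples < p05).mean()  # roughly 0.05 by construction
share_below_p99 = (samples < p99).mean()  # roughly 0.99 by construction
print(share_below_p05, share_below_p99)
```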

Forecast Lookup

  • We can look up forecasts in the console itself, as shown in the following screenshot.
Looking up the forecasts
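
The same lookup can be done programmatically through the `forecastquery` client. A sketch (the forecast ARN and item value are placeholders you would replace):

```python
def lookup_forecast(forecast_arn: str, item_id: str) -> dict:
    """SDK equivalent of the console lookup: query one item's forecast."""
    import boto3  # assumes AWS credentials are configured

    client = boto3.client("forecastquery")
    response = client.query_forecast(
        ForecastArn=forecast_arn,
        Filters={"item_id": item_id},  # filter on the key column
    )
    # Predictions are keyed by quantile, e.g. "p10", "p50", "p90".
    return response["Forecast"]["Predictions"]
```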

Features

  • Automatically identifies key attributes for forecasting.
  • Manual algorithm selection: DeepAR+, ARIMA, Prophet, Exponential Smoothing (ETS), Non-Parametric Time Series (NPTS). The detailed list can be found here.
  • Provides an AutoML option for model training which automates complex tasks (algorithm selection, hyperparameter tuning, etc.).
  • Automatically fills missing values in target and related time series.

Use Cases

Following are some use cases for Amazon Forecast:

  • Estimating product demand.
  • Forecasting weather.
  • Predicting web traffic.
  • Estimating pandemic cases.

Usage

Forecast can be used via the AWS CLI, the console, or the SDK. Above we saw how to run a forecast from the console; here we will see how to use the SDK to perform the forecast job.

Input

We will be using data on COVID-19 cases available on Kaggle.

Following is the screenshot of sample data.

sample-data

We will be forecasting Confirmed Cases for various countries, with the country column used as item_id.

Process

The following steps describe forecasting using the Python SDK.

  1. Initialize the AWS Forecast client.
  2. Create a dataset group; we will name it covid19_week5.
  3. Create the target time series dataset.
  4. Import data into the dataset from an S3 bucket; the data must match the dataset's schema, and the role must have GetObject permission on the bucket.
  5. Link the dataset to the dataset group.
  6. Create the predictor.
  7. Create the forecast.
  8. Export the forecast data to an S3 bucket.

The following gist describes the above steps.
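
As a rough boto3 sketch of steps 1–8 (the dataset group name covid19_week5 comes from the walkthrough; the S3 path and role ARN are placeholders you would replace, and in practice each resource must reach ACTIVE status before the next call):

```python
# Hypothetical schema for the target time series (CUSTOM domain assumed).
SCHEMA = {"Attributes": [
    {"AttributeName": "timestamp", "AttributeType": "timestamp"},
    {"AttributeName": "item_id", "AttributeType": "string"},
    {"AttributeName": "target_value", "AttributeType": "float"},
]}

def run_pipeline(s3_path: str, role_arn: str) -> None:
    import boto3  # assumes AWS credentials are configured
    forecast = boto3.client("forecast")          # 1. initialize the client

    group = forecast.create_dataset_group(       # 2. dataset group
        DatasetGroupName="covid19_week5", Domain="CUSTOM")
    dataset = forecast.create_dataset(           # 3. target time series dataset
        DatasetName="covid19_week5_target",
        Domain="CUSTOM",
        DatasetType="TARGET_TIME_SERIES",
        DataFrequency="D",                       # daily case counts
        Schema=SCHEMA)
    forecast.create_dataset_import_job(          # 4. import from S3
        DatasetImportJobName="covid19_import",   #    (role needs s3:GetObject)
        DatasetArn=dataset["DatasetArn"],
        DataSource={"S3Config": {"Path": s3_path, "RoleArn": role_arn}})
    forecast.update_dataset_group(               # 5. link dataset to the group
        DatasetGroupArn=group["DatasetGroupArn"],
        DatasetArns=[dataset["DatasetArn"]])
    predictor = forecast.create_predictor(       # 6. train with AutoML
        PredictorName="covid19_predictor",
        ForecastHorizon=14,                      # predict 14 days ahead
        PerformAutoML=True,
        InputDataConfig={"DatasetGroupArn": group["DatasetGroupArn"]},
        FeaturizationConfig={"ForecastFrequency": "D"})
    fc = forecast.create_forecast(               # 7. generate quantile forecasts
        ForecastName="covid19_forecast",
        PredictorArn=predictor["PredictorArn"],
        ForecastTypes=["0.10", "0.50", "0.90"])
    forecast.create_forecast_export_job(         # 8. export CSVs back to S3
        ForecastExportJobName="covid19_export",
        ForecastArn=fc["ForecastArn"],
        Destination={"S3Config": {"Path": s3_path + "/export",
                                  "RoleArn": role_arn}})
```

Each create call is asynchronous, so a production script would poll the corresponding describe call between steps rather than chaining them directly.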

After the forecasting job completes, we can export the forecast values to an S3 bucket and then use them as required, for example by querying them directly from Athena or loading them into a data mart.

Output

We can look at the forecast directly in the console, as shown below. As you can see, the forecast has been generated for our supplied quantiles.

Forecast Lookup

The output is a set of CSV files, which can be exported to an S3 bucket and then utilized.

Generated forecast files

For a clean script, one can follow the snippet over here.

Findings

  • An easy-to-use tool for generating forecasts.
  • Can automatically fill missing values.

Pricing

  • No upfront cost
  • Forecasts are billed in units of 1,000

Free tier:

  • Generate Forecasts — Up to 10K time series forecasts/month for the first 2 months
  • Data Storage — Up to 10 GB/month for the first 2 months
  • Training hours — Up to 10 hrs/month for the first 2 months

On demand:

  • Generate Forecasts — $0.60 per 1,000 forecasts
  • Data Storage — $0.088 per GB
  • Training hours — $0.24 per hour
  • A detailed example of pricing can be found on the Amazon Forecast pricing page.
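
As a back-of-the-envelope check using the on-demand rates above (the usage numbers in the example are made up):

```python
import math

def monthly_cost(n_forecasts: int, storage_gb: float, training_hours: float) -> float:
    """Estimate a monthly bill from the on-demand rates quoted above."""
    forecast_units = math.ceil(n_forecasts / 1000)  # billed in units of 1,000
    return round(forecast_units * 0.60      # $0.60 per 1,000 forecasts
                 + storage_gb * 0.088       # $0.088 per GB stored
                 + training_hours * 0.24,   # $0.24 per training hour
                 2)

# e.g. 50,000 forecasts, 2 GB stored, 10 training hours:
print(monthly_cost(50_000, 2, 10))  # 30.00 + 0.176 + 2.40 -> 32.58
```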
