Benchmarking Methods for Time Series Forecast
Four easy ways to set the baseline for your time series forecasts.
A simple, common-sense approach will establish a baseline that you’ll have to beat in order to demonstrate the usefulness of more-advanced machine-learning models.
To keep their sanity when building models, data scientists set a baseline — a score the model must outperform. Normally the state of the art serves as the baseline, but for problems with no existing solution yet, one should build one's own. As Francois Chollet put it, a baseline is used to demonstrate the usefulness of more advanced forecasting techniques.
In this article, we will review four elementary baselines applied in time series forecasting problems.
To demonstrate each of the baseline methods, let's use the historical monthly prices of wheat from the World Bank between 2001 and 2019. Prices for 2019 will be forecasted.
To quantify the performance of the baseline methods, I'll use mean squared error (MSE). As the name suggests, MSE is an error function that measures the average squared difference between forecasted and true values:

MSE = (1/N) Σ (Yi − Ŷi)²

where N is the number of observations, Yi is the true value, and Ŷi is the forecasted value. And since it is an error function, you'll want it to be as small as possible.
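A minimal sketch of the MSE computation, using made-up numbers in place of the wheat prices:

```python
import numpy as np

def mse(y_true, y_pred):
    """Average squared difference between true and forecasted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Errors are -1, 2, -2; squared: 1, 4, 4; mean: 3.0
print(mse([100, 102, 101], [101, 100, 103]))  # 3.0
```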
The average method simply takes the average (or "mean") of the entire historical data and uses that to forecast future values. It is very useful for data with small variance, or whose values lie close to the mean.
Using the wheat price data, the forecast for 2019 prices equals the average monthly price from 2001 to 2018. MSE for this method is 19.7. To keep the visualizations clear, only the last 60 observations including the forecasted values are shown. The blue flat line in the figure below represents the average value from 2001 to 2018, while the orange one represents the true values.
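The average method can be sketched as follows. The series here is hypothetical and stands in for the 2001-2018 monthly wheat prices:

```python
import numpy as np

# Hypothetical history standing in for the 2001-2018 monthly prices
history = np.array([150.0, 160.0, 170.0, 180.0])

horizon = 12  # forecast 12 months of 2019
average_forecast = np.full(horizon, history.mean())
print(average_forecast[:3])  # [165. 165. 165.]
```

Every future month receives the same value, which is why this forecast plots as a flat line.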
Drift is the amount of change observed in the data. The drift method sets the drift to the average change seen across the whole historical data and uses that to forecast future values. Basically, this amounts to drawing a straight line through the first and last values and extending that line into the future. This method works well on data that follows a general trend over time.
The drift method forecast is shown as the blue line in the figure below. Its slope follows the line drawn between the first price in the data and the last price of 2018. MSE for this method on the wheat price data is lower than the average method's, at 12.4.
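A sketch of the drift method, again on a hypothetical series: the average change per step is the total change divided by the number of steps, and each forecast extends the line one step further.

```python
import numpy as np

history = np.array([150.0, 160.0, 170.0, 180.0])  # hypothetical prices
T = len(history)
slope = (history[-1] - history[0]) / (T - 1)  # average change per step: 10.0

horizon = 12
steps = np.arange(1, horizon + 1)
drift_forecast = history[-1] + steps * slope
print(drift_forecast[:3])  # [190. 200. 210.]
```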
A variation of the average method is to use a series of averages over a fixed number of recent values; this is called the moving average method. It is often used in technical analysis of stock prices, and is useful if you are more concerned about long-term trends than short-term fluctuations.
If we set the size of the moving average window to 24, then we use the last 24 values to forecast the next time step. We then repeat this step by moving the window one step forward, so that the previous forecast is included in the average, and keep repeating until the desired forecast length is reached. Using a window of 24, the moving average forecast for 2019 is shown in the figure below as the blue line. MSE for the wheat price data is much better at 5.84.
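The steps above can be sketched as follows. For brevity this uses a window of 3 on a tiny hypothetical series rather than the article's 24-month window; note how each forecast is appended to the series so it enters the next window:

```python
import numpy as np

history = [10.0, 12.0, 14.0]  # hypothetical series
window = 3                    # the article uses 24 months
horizon = 6

series = list(history)
forecasts = []
for _ in range(horizon):
    step_forecast = float(np.mean(series[-window:]))
    forecasts.append(step_forecast)
    series.append(step_forecast)  # earlier forecasts feed back into the window

print(forecasts[0])  # 12.0
```

Because forecasts feed back into the window, the output gradually flattens toward a constant level.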
The naïve method uses the most recent value as the forecast for the next time step. The assumption behind it is that tomorrow's value equals today's. To apply this to the wheat price data, we simply shift the 2019 prices one time step forward and use that as the forecast. Notice that the blue line leads the orange line by a single time step in the figure below. For many economic and financial time series, this method works outstandingly well. MSE for the wheat data is even better, at 4.
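The one-step-ahead shift described above can be sketched like this, with hypothetical numbers in place of the real 2018-2019 prices:

```python
import numpy as np

last_2018 = 180.0                                  # hypothetical Dec 2018 price
actual_2019 = np.array([182.0, 185.0, 183.0, 188.0])  # hypothetical 2019 prices

# Each month's forecast is simply the previous month's observed price
naive_forecast = np.concatenate(([last_2018], actual_2019[:-1]))
print(naive_forecast)  # [180. 182. 185. 183.]
```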
In most cases, and as the demonstration above shows, the naïve method performs best among the four methods discussed and is commonly used as the baseline for forecasting tasks. These methods appear very simple, but such elementary baselines are sometimes hard to beat.
Though one has the option to select any of the baseline methods above directly, based on the characteristics of the data being analyzed, it is good practice to exhaust all known methods as resources permit. For example, if you use deep learning for forecasting tasks, an ARIMA model might be a good choice of baseline, as it can give even better results than any of the methods discussed in this article.
F. Chollet, Deep Learning with Python (2018), Manning Publications.
R.J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, 2nd edition (2018), OTexts: Melbourne, Australia. OTexts.com/fpp2