Time Series Forecasting — A Complete Guide

Puja P. Pathak · Analytics Vidhya · Sep 8, 2021

In this article, I will explain the basics of Time Series Forecasting and demonstrate how we can implement various forecasting models in Python.

Photo Credits — NeONBRAND on Unsplash

Forecasting is a word we usually associate with the weather. When we listen to or watch the news, there is always a separate segment called the ‘Weather Report’, where the presenter gives us the weather forecast. Why is forecasting so important? Simply because it lets us make informed decisions.

Now, there are two main types of forecasting methods, namely, Qualitative Forecasting and Quantitative Forecasting.

In Qualitative Forecasting, the forecasting decisions are dependent upon expert opinions. There is no data available to study the patterns in order to make forecasting decisions. Since human decision making is involved, there is a chance of bias.

In Quantitative Forecasting, data with patterns is available, and these patterns can be captured with the help of computers. Since no human judgement is involved, there is no chance of human bias.

When we associate a temporal or time component with the forecast, it becomes Time Series Forecasting and the data is called Time Series Data. In statistical terms, time series forecasting is the process of analyzing time series data using statistics and modeling to make predictions and informed strategic decisions. It falls under Quantitative Forecasting.

Examples of Time Series Forecasting include forecasting the weather for the coming week, forecasting the closing price of a stock each day, etc.

To make close-to-accurate forecasts, we need to collect the time series data over a period, analyze the data and then build a model which will help us make the forecast. There are certain rules to follow in this process that help us achieve close-to-accurate results.

Granularity Rule: The more aggregated your forecasts are, the more accurate your predictions will be. This is because aggregated data has lower variance and hence less noise.

Frequency Rule: We need to update the data frequently in order to capture any new information available, which makes our forecasts more accurate.

Horizon Rule: Avoid predicting too far into the future. Forecasts made over a short horizon are more accurate than those made well ahead.

Components of Time Series Data

Let’s understand the meaning of each component, one by one.

  1. Level : Any time series has a baseline, to which the different components are added to form the complete series. This baseline is known as the level.
  2. Trend : It defines whether, over a period, the time series increases or decreases, i.e. whether it has an upward (increasing) or downward (decreasing) trend.
  3. Seasonality : A pattern that repeats over a fixed period. This periodically repeating pattern is called seasonality.
  4. Cyclicity : Cyclicity is also a pattern in the time series data, but it repeats aperiodically, meaning it doesn't repeat after fixed intervals.
  5. Noise : After we extract the level, trend and seasonality/cyclicity, what is left is noise: a completely random fluctuation in the data.

We get the above components when we decompose the time series. There are mainly two types of time series decomposition, namely, additive seasonal decomposition and multiplicative seasonal decomposition.

A simple way to understand this: when the individual components of the time series add up to give the original series, it is called additive seasonal decomposition. If the individual components must instead be multiplied to get the series, it is called multiplicative seasonal decomposition. The main criterion for choosing one type of decomposition over the other is that the residual should not have any pattern left in it; it should be just random fluctuation.
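In symbols (standard notation, not taken from the original article), with trend $T_t$, seasonal $S_t$ and residual $R_t$ components at time $t$:

$$Y_t = T_t + S_t + R_t \quad \text{(additive)}, \qquad Y_t = T_t \times S_t \times R_t \quad \text{(multiplicative)}$$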

Time Series Forecasting Python Implementation

With the help of an example, we will now see how various forecasting techniques are implemented in Python and how effective they are.

Let’s first understand the meaning of evaluation metrics that we will use to evaluate these forecasting techniques.

RMSE : Root Mean Squared Error is the square root of the Mean Squared Error (MSE). MSE represents how much the forecasted values differ from the actual or true ones. The errors are squared so that positive and negative errors do not cancel out, and the square root brings the metric back to the units of the data. It is represented by the following formula:
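$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(Y_{actual,t} - Y_{predicted,t}\right)^{2}}$$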

MAPE : Mean Absolute Percentage Error measures how accurate a forecasting system is. It expresses the error as a percentage: the average of the absolute differences between actual and predicted values, divided by the actual values. It is represented by the following formula:
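$$MAPE = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{Y_{actual,t} - Y_{predicted,t}}{Y_{actual,t}}\right|$$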

where $Y_{actual}$ is the true value and $Y_{predicted}$ is the predicted value at that particular time; $n$ is the number of observations.

Both RMSE and MAPE should be as low as possible.
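As a quick sketch (the helper names `rmse` and `mape` are mine, not from the article), both metrics are easy to compute with NumPy:

```python
import numpy as np

def rmse(actual, predicted):
    """Root Mean Squared Error: penalises large errors more heavily."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual)) * 100
```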

So here is the problem statement: Global Mart is an online super-giant store with worldwide operations. It takes orders and delivers across the globe, catering to 7 geographical markets (Africa, APAC (Asia Pacific), Canada, EU (European Union), EMEA (Middle East), LATAM (Latin America) and US (United States)). It deals with all the major product segments: Consumer, Corporate and Home Office. We need to forecast the sales for the most consistently profitable market segment.

Note : The code and the graphs used in this article are in the Python file linked at the end of the article.

Flow of Analysis :

1. Import the required libraries
2. Read and understand the data
3. Exploratory Data Analysis
4. Data Preparation
5. Time Series Decomposition
6. Build and Evaluate Time Series Forecast

1. Import the required libraries
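The full notebook is linked at the end of the article; a typical import cell for this analysis might look like this (the exact set of libraries is my assumption, based on the models used later):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Decomposition, stationarity tests, smoothing and ARIMA-family models
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt, ExponentialSmoothing
from statsmodels.tsa.statespace.sarimax import SARIMAX

import warnings
warnings.filterwarnings("ignore")
```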

2. Read and understand the data

Our data has 51290 rows and 5 columns and there are no missing values.
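As an illustrative sketch of this step (the file name here is hypothetical; the actual dataset lives in the GitHub repository linked below):

```python
df = pd.read_csv("Global_Superstore.csv")  # hypothetical file name

print(df.shape)           # (51290, 5)
print(df.isnull().sum())  # confirms there are no missing values
df.head()
```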

3. Exploratory Data Analysis

We perform outlier analysis on the various attributes and find that there are indeed outliers present in the profit and sales columns.

In time series data there is an observation for every time stamp, so we cannot simply delete the outliers: doing so results in loss of data and breaks its continuity.

We performed univariate, bivariate and multivariate analyses and here are the graphs.

Univariate analysis
Bivariate analysis

From the above graphs, we can see that Canada-Consumer is the most profitable market segment and APAC-Home Office is the leading market-segment combination in terms of sales.

As per the problem statement, we need to create 21 market segments by combining the 7 geographical markets with each of the 3 product segments. We create a Market-Segment column by combining the two columns Market and Segment, as sketched below.
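A minimal sketch of this step, assuming the columns are named `Market` and `Segment` as described above:

```python
# 7 markets x 3 segments = 21 market-segment combinations
df["Market-Segment"] = df["Market"] + "-" + df["Segment"]
print(df["Market-Segment"].nunique())  # 21
```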

Train-Test Split : We divide the data such that the train set contains 42 months and the test set contains 6 months of data.
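Since time series data is ordered, the split must be chronological, never random. A sketch, assuming an `Order Date` column:

```python
df["Order Date"] = pd.to_datetime(df["Order Date"])
df["Month"] = df["Order Date"].dt.to_period("M")

# Chronological split: first 42 months for training, last 6 for testing
months = sorted(df["Month"].unique())
train = df[df["Month"].isin(months[:42])]
test = df[df["Month"].isin(months[42:])]
```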

Consistently profitable market segment: The coefficient of variation is the ratio of the standard deviation to the mean. We need to find the market segment with the least coefficient of variation of profit: a lower standard deviation means less variation in profit, which means more consistent profit figures for that segment over the given period. We calculate the coefficient of variation for each of the 21 market segments over the 42 months of train data to decide which market segment is consistently profitable.
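A sketch of this computation on the training window (column names assumed as before):

```python
# Monthly profit per market segment, then CoV = std / mean per segment
monthly_profit = train.groupby(["Market-Segment", "Month"])["Profit"].sum()
cov = (
    monthly_profit.groupby(level="Market-Segment")
                  .agg(lambda s: s.std() / s.mean())
                  .sort_values()
)
print(cov.head())  # smallest CoV = most consistently profitable segment
```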

We find that APAC-Consumer is the market segment with least Coefficient of Variation. It means profit figures for APAC-Consumer market segment have been consistent over the train set period. Hence we choose this market segment to further calculate and predict the Sales Values.

We filter the data for the APAC-Consumer market segment and group the resulting data frame by Order Date to get the time series containing Order Date and Sales. We call it data1.
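A sketch of the filtering and grouping step (resampling to monthly totals is my assumption, consistent with the 42/6-month split):

```python
data1 = (
    df[df["Market-Segment"] == "APAC-Consumer"]
      .groupby("Order Date")["Sales"].sum()
      .resample("M").sum()   # roll daily order totals up to monthly sales
)
data1.plot(title="APAC-Consumer monthly sales")
```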

Time Series Decomposition

Our time series data looks as follows:

We perform the additive and multiplicative seasonal decomposition as follows:
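Both decompositions are available in statsmodels; a minimal sketch on `data1`:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Additive: series = trend + seasonal + residual
seasonal_decompose(data1, model="additive").plot()

# Multiplicative: series = trend * seasonal * residual
seasonal_decompose(data1, model="multiplicative").plot()
plt.show()
```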

Left: additive seasonal decomposition; right: multiplicative seasonal decomposition

Clearly, the data contains a seasonal component. We build various time series forecast models and compare the RMSE (Root Mean Squared Error) and MAPE (Mean Absolute Percentage Error) values for all of them. Lower values of RMSE and MAPE indicate a better-performing model. Accuracy is calculated as (100 - MAPE): the lower the MAPE, the higher the accuracy.

We will now see various forecasting methods to forecast the sales values.

Simple Time Series forecasting methods

Three methods fall under this category: the Naive method, the Simple Average method and the Simple Moving Average method.

The Naive method simply carries the last observation forward. The Simple Average method uses the average of all past observations as the forecast, and the Simple Moving Average method uses the average of the most recent observations. A sketch of all three is shown below.
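A sketch of all three methods on the monthly series, reusing the `rmse` and `mape` helpers from earlier (the 6-month moving-average window is an arbitrary choice):

```python
train_y, test_y = data1[:42], data1[42:]

# Naive: repeat the last training observation for every test period
naive_fc = pd.Series(train_y.iloc[-1], index=test_y.index)

# Simple average: mean of all training observations
avg_fc = pd.Series(train_y.mean(), index=test_y.index)

# Simple moving average: mean of the last 6 observations
sma_fc = pd.Series(train_y.rolling(6).mean().iloc[-1], index=test_y.index)

for name, fc in [("Naive", naive_fc), ("Average", avg_fc), ("Moving average", sma_fc)]:
    print(f"{name}: RMSE={rmse(test_y, fc):.2f}, MAPE={mape(test_y, fc):.2f}%")
```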

The RMSE and MAPE values are as shown below:

As we can see from the above figures, among the simple forecasting methods the Simple Moving Average method performs best.

Exponential Smoothing Techniques

These are the Simple Exponential Smoothing technique, Holt's method (which adds trend) and the Holt-Winters method.

While the simple average method weights all past observations equally, simple exponential smoothing gives more weight to recent observations. It captures the level in the data but not trend or seasonality. Holt's method can capture level and trend but not seasonality. The Holt-Winters method can capture all three: level, trend and seasonality.
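All three techniques are available in statsmodels; a sketch (the additive settings and 12-month seasonal period are assumptions consistent with monthly data):

```python
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt, ExponentialSmoothing

ses_fc = SimpleExpSmoothing(train_y).fit().forecast(len(test_y))   # level only
holt_fc = Holt(train_y).fit().forecast(len(test_y))                # level + trend

# Holt-Winters: level + trend + seasonality
hw_fc = ExponentialSmoothing(
    train_y, trend="add", seasonal="add", seasonal_periods=12
).fit().forecast(len(test_y))
```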

We conclude that, among the smoothing techniques, the Holt-Winters additive method forecasts the sales closest to the actual values. Its RMSE and MAPE values are lower than those of the other smoothing methods, and it captures the trend and seasonality in the data very well.

Auto Regressive methods

In autoregressive methods, regression is used to forecast future observations as a linear combination of past observations. For this, the time series should satisfy two assumptions: stationarity and autocorrelation.

For a time series to be stationary, its mean, variance and covariance should be constant over time. Autocorrelation tells us how a variable is influenced by its own lagged values.

There are 2 tests to confirm stationarity, as follows:

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test:

  1. Null Hypothesis (H0): The series is stationary : p-value > 0.05
  2. Alternate Hypothesis (Ha): The series is not stationary : p-value ≤ 0.05

Augmented Dickey-Fuller (ADF) Test:

  1. Null Hypothesis (H0): The series is not stationary : p-value > 0.05
  2. Alternate Hypothesis (Ha): The series is stationary : p-value ≤ 0.05

We perform these tests on our time series data and conclude that the series is not stationary. To make it stationary, we apply differencing (to make the mean constant) and a transformation (to make the variance constant), as sketched below.
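Both tests live in statsmodels, and SciPy provides the Box-Cox transformation; a sketch:

```python
from statsmodels.tsa.stattools import adfuller, kpss
from scipy.stats import boxcox

print("ADF p-value:", adfuller(data1)[1])             # H0: not stationary
print("KPSS p-value:", kpss(data1, nlags="auto")[1])  # H0: stationary

# Box-Cox stabilises the variance; first-order differencing stabilises the mean
transformed, lam = boxcox(data1)
stationary = pd.Series(transformed, index=data1.index).diff().dropna()
```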

We perform train test split and proceed with the Auto Regressive techniques for forecasting.

Auto regression method (AR)

This method uses linear regression to predict future observations from one or more past observations.

Moving average method (MA)

Here future values are forecasted using past forecast errors in a regression-like model.

Auto regression moving average method (ARMA)

It’s a combination of AR and MA models.

Auto regressive integrated moving average (ARIMA)

It is the same as the ARMA model, with an additional integrated (differencing) component. Earlier, we applied both the Box-Cox transformation and differencing to make the time series stationary. Here, we apply only the Box-Cox transformation before building the model and let the model handle the differencing, i.e. the trend component, itself.

Seasonal auto regressive integrated moving average (SARIMA)

SARIMA is the same as ARIMA, with an additional seasonal component. A sketch of the whole family follows.
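In statsmodels, the whole family reduces to a choice of orders; a sketch (the specific (p, d, q) values here are illustrative, not the tuned ones from the notebook):

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

ar_fit    = ARIMA(train_y, order=(2, 0, 0)).fit()   # AR(2)
ma_fit    = ARIMA(train_y, order=(0, 0, 2)).fit()   # MA(2)
arma_fit  = ARIMA(train_y, order=(2, 0, 2)).fit()   # ARMA(2, 2)
arima_fit = ARIMA(train_y, order=(2, 1, 2)).fit()   # ARIMA: adds differencing (d=1)

# SARIMA adds a seasonal (P, D, Q, s) order; s=12 for monthly data
sarima_fit = SARIMAX(train_y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit()
sarima_fc = sarima_fit.forecast(len(test_y))
print(f"SARIMA: RMSE={rmse(test_y, sarima_fc):.2f}, MAPE={mape(test_y, sarima_fc):.2f}%")
```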

After implementing all the forecasting models, we calculate the RMSE and MAPE for all the methods.

We conclude that the Holt-Winters additive method and the Seasonal Autoregressive Integrated Moving Average (SARIMA) technique are the best for forecasting sales on this data. Both methods have lower RMSE and MAPE values and capture the trend and seasonality components in the data well.

This completes our analysis. I hope the article was informative and easy to understand, and that you enjoyed analyzing the colorful graphs included in the analysis.

Do feel free to comment and give your feedback.

You can connect with me on LinkedIn: https://www.linkedin.com/in/pathakpuja/

Please visit my GitHub profile for the python codes. The code mentioned in the article, as well as the graphs, can be found here: https://github.com/pujappathak/Retail-Giant-Sales-Forecasting

References:

https://www.statisticshowto.com/probability-and-statistics/regression-analysis/rmse-root-mean-square-error/

https://www.statisticshowto.com/mean-absolute-percentage-error-mape/

