Introduction to Forecasting in Python

Lokesh Saini
May 30, 2020 · 7 min read

Time series forecasting using AR, MA and ARIMA models.

Bojo Beach — Accra, Ghana

"If you can look into the seeds of time, and say which grain will grow and which will not, speak then unto me." — William Shakespeare


The time-series data Yt is a random variable, usually collected at regular time intervals and in chronological order. If the time-series data contains observations of just a single variable (such as demand for a product at time t), it is termed univariate time-series data. If the data consists of more than one variable, for example, demand for a product at time t, price at time t, amount of money spent by the company on promotion at time t, competitors' price at time t, etc., then it is called multivariate time-series data.

A time-series data can be broken into the four following components:

1. Trend Component (Tt): Trend is the consistent long-term upward or downward movement of the data.

2. Seasonal Component (St): Seasonal component (measured using seasonality index) is the repetitive upward or downward movement (or fluctuations) from the trend that occurs within a calendar year at fixed intervals (i.e., time between seasons is fixed) such as seasons, quarters, months, days of the week, etc.

3. Cyclical Component (Ct): Cyclical component is fluctuation around the trend line at random intervals (i.e., the time between cycles is random) that happens due to macro-economic changes such as recession, unemployment, etc. Cyclical fluctuations have repetitive patterns with more than a year between repetitions. The major difference from seasonal fluctuation is periodicity: seasonal fluctuations occur at fixed periods within a calendar year, whereas the time between cyclical fluctuations is not constant.

4. Irregular Component (It): Irregular component is the white noise, i.e., random uncorrelated changes that follow a normal distribution with a mean of 0 and constant variance.

Moving Average

import pandas as pd
import matplotlib.pyplot as plt

wsb_df = pd.read_csv('wsb.csv')

Pandas provides a rolling() function, which can be combined with an aggregate function like mean() to calculate a moving average over a time window.

wsb_df['mavg_12'] = wsb_df['Sale Quantity'].rolling(window = 12).mean().shift(1)

To display values up to 2 decimal points, we can use pd.set_option with the floating format %.2f. Now print the forecast values from month 37 onwards:

pd.set_option('display.float_format', lambda x: '%.2f' % x)
wsb_df[['Sale Quantity', 'mavg_12']][36:]

Predicted values from moving average forecasting

plt.plot(wsb_df[‘Sale Quantity’][12:]);
plt.plot(wsb_df[‘mavg_12’][12:], ‘.’);

Mean Absolute Percentage Error

Mean absolute percentage error (MAPE) is the average of the absolute percentage errors. Since MAPE is dimensionless, it can be used to compare models across varying scales. Let's define a method for MAPE:

import numpy as np

def get_mape(actual, predicted):
    y_true, y_pred = np.array(actual), np.array(predicted)
    return np.round(np.mean(np.abs((y_true - y_pred) / y_true)) * 100, 2)

Let's invoke the method on the DataFrame above. "Sale Quantity" is passed as the actual values and mavg_12 as the predicted values. Records from the 37th month onwards are used to calculate MAPE.

get_mape(wsb_df['Sale Quantity'][36:].values, wsb_df['mavg_12'][36:].values)

The MAPE in this case is 14.04, i.e., forecasting using the moving average gives an error of 14.04%.
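As a quick sanity check, here is the same get_mape helper (repeated so the snippet runs standalone) applied to made-up numbers rather than the wsb data:

```python
import numpy as np

def get_mape(actual, predicted):
    # Mean absolute percentage error, rounded to 2 decimals
    y_true, y_pred = np.array(actual), np.array(predicted)
    return np.round(np.mean(np.abs((y_true - y_pred) / y_true)) * 100, 2)

# Every forecast below is off by exactly 10% of the actual value,
# so the average absolute percentage error is 10%.
print(get_mape([100, 200, 400], [110, 180, 440]))  # prints 10.0
```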


The time-series data can be modelled as the sum or the product of the trend, seasonality, cyclical, and irregular components. The additive time-series model is given by

Yt = Tt + St + Ct + It

The additive models assume that the seasonal and cyclical components are independent of the trend component. Additive models are not very common, since in many cases the seasonal component may not be independent of the trend. The multiplicative time-series model is given by

Yt = Tt × St × Ct × It

Multiplicative models are more common and are a better fit for many datasets.
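To see why the distinction matters, here is a small sketch with synthetic values (made-up numbers, not the wsb series): a multiplicative series becomes additive after a log transform, a common trick when fitting multiplicative models.

```python
import numpy as np

# Hypothetical monthly series: a linear trend times a repeating seasonal factor
trend = np.linspace(100, 200, 24)            # Tt: upward trend over 24 months
seasonal = np.tile([0.8, 1.0, 1.2, 1.0], 6)  # St: 4-month seasonal pattern
y = trend * seasonal                         # multiplicative model: Yt = Tt * St

# Taking logs converts the product into a sum: log Yt = log Tt + log St,
# i.e., an additive model on the log scale.
assert np.allclose(np.log(y), np.log(trend) + np.log(seasonal))
```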

For decomposing a time-series data, we can leverage the following libraries:

  1. statsmodels.tsa provides various features for time-series analysis.
  2. seasonal_decompose() in statsmodels.tsa.seasonal decomposes a time series into trend, seasonal, and residual components. It takes a frequency parameter; for example, the frequency is 12 for monthly data.
from statsmodels.tsa.seasonal import seasonal_decompose

ts_decompose = seasonal_decompose(np.array(wsb_df['Sale Quantity']),
                                  model='multiplicative', freq = 12)
## Plotting the decomposed time-series components
ts_plot = ts_decompose.plot()

To capture the seasonal and trend components after time-series decomposition, we use the following code. The information can be read from the seasonal and trend attributes of ts_decompose.

wsb_df['seasonal'] = ts_decompose.seasonal
wsb_df['trend'] = ts_decompose.trend


ACF: a bar chart of the coefficients of correlation between a time series and lags of itself. It helps determine the value of q, the MA term: the ACF of an MA(q) process cuts off after the first q lags.

PACF: a plot of the partial correlation coefficients between the series and lags of itself. It helps determine the value of p, the AR term: the PACF of an AR(p) process cuts off after the first p lags.

Auto-regressive moving average (ARMA) is a combination of auto-regressive and moving average processes; an ARMA(p, q) process combines AR(p) and MA(q) processes. The values of p and q in an ARMA process can be identified using the following thumb rules:

  1. Auto-correlation values are significant for the first q lags and then cut off to zero.
  2. Partial auto-correlation values are significant for the first p lags and then cut off to zero.

ARMA models can be used only when the time-series data is stationary; ARIMA models are used when it is non-stationary. Time-series data is called stationary if its mean, variance, and covariance are constant over time. ARIMA has three components and is represented as ARIMA(p, d, q), where d is the integration (differencing) component. Let's take an example of store data:

store_df = pd.read_excel('store.xls')

The dataset contains 115 days of daily demand data. We can convert the Date column into a DateTime index, which is the default input to time-series models. Creating a DateTime index makes sure that the data is ordered by date or time. Finally, plot the demand against the date using the plot() method:

store_df.set_index(pd.to_datetime(store_df.Date), inplace=True)
store_df.drop('Date', axis = 1, inplace = True)
plt.figure(figsize=(10, 4))
plt.xlabel("Date")
plt.ylabel("Demand")
plt.plot(store_df.demand);
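Before running a formal test, the idea of stationarity can be illustrated on simulated data (not the store series): the rolling mean of white noise stays centred, while the rolling mean of a random walk drifts, because the walk's mean changes over time.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
noise = rng.standard_normal(500)  # white noise: stationary
walk = np.cumsum(noise)           # random walk: non-stationary

# Spread of the 50-day rolling mean: small for a stationary series,
# large for a drifting one.
noise_drift = pd.Series(noise).rolling(50).mean().dropna().std()
walk_drift = pd.Series(walk).rolling(50).mean().dropna().std()
print(noise_drift, walk_drift)  # the walk's rolling mean varies far more
```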

Dickey–Fuller Test

from statsmodels.tsa.stattools import adfuller

def adfuller_test(ts):
    adfuller_result = adfuller(ts, autolag=None)
    adfuller_out = pd.Series(adfuller_result[0:4],
                             index=['Test Statistic', 'p-value',
                                    'Lags Used',
                                    'Number of Observations Used'])
    print(adfuller_out)

adfuller_test(store_df.demand)

The p-value (>0.05) indicates that we cannot reject the null hypothesis and hence the series is not stationary. Differencing the original time series is a common approach for converting a non-stationary process into a stationary one (called difference stationarity).

store_df['demand_diff'] = store_df.demand - store_df.demand.shift(1)
store_df.head(5)

The difference for the first day is NaN as we do not have the previous day's demand. We can drop this value from the observations before verifying stationarity.

store_diff_df = store_df.dropna()
plt.ylabel("First Order Differences")
plt.plot(store_diff_df.demand_diff);

Again, it is not very apparent from the plot whether the series is stationary or not. We will need the ACF plot to verify:

from statsmodels.graphics.tsaplots import plot_acf

acf_plot = plot_acf(store_df.demand_diff.dropna(), lags=10)

The ACF immediately cuts off to zero, indicating the differenced series is stationary. We can build the model with the first 100 observations as the training set and the subsequent observations as the test set, choosing p, d, and q using the thumb rules above:

from statsmodels.tsa.arima_model import ARIMA

store_train = store_df[0:100]
store_test = store_df[100:]
arima = ARIMA(store_train.demand.astype(np.float64).values, order = (1,1,1))
arima_model = arima.fit()

The forecast() method takes the number of future periods (steps) for predicting values and returns the predicted values along with standard error and confidence interval of the predictions.

store_predict, stderr, ci = arima_model.forecast(steps = 15)
store_predict
get_mape(store_df.demand[100:], store_predict)

The ARIMA model with first-order differencing gives a MAPE of 24.17%.

Published in Analytics Vidhya, a community of Analytics and Data Science professionals.

Written by Lokesh Saini. Passionate about Deep Learning, Football and Life.
