An Introduction to Time Series Forecasting

KTH AI Society
8 min read, Jun 28, 2022


Time Series Analysis: In comparison to traditional machine learning

What is a time series? Much real-world data is time series data, in which the time axis carries important information. A time series can be univariate or multivariate, and the time t can be continuous, such as the time series of electric signals and voltages, or discrete, such as the daily closing price of various stocks or the total monthly sales of various products at the end of each month. In practice, we usually take observations only at discrete time points for analysis, even when the time series is continuous in nature. [1]

A major difference between time series and static data is the correlation between past, present and future values. Because of this, missing-data imputation, outlier detection and cross-validation need different methods from those used in traditional machine learning. For example, missing values in static data can be filled with the mean or with a kNN-based estimate, whereas in a time series it makes no sense to fill missing values by taking clues from the future. A common method for filling missing values and de-noising time series data is the moving average. Likewise, it is not reasonable to validate predictions of past values using future values, which constrains the design of accuracy assessment techniques. A cross-validation technique for 1-step forecasting with time series data is shown below:

blue observations: the training sets, orange observations: the test sets [2]
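The splitting scheme in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the referenced book's code: the function name `expanding_window_splits`, the `min_train` parameter, the toy series, and the naive "repeat the last value" forecast are all assumptions made for the example.

```python
import numpy as np

def expanding_window_splits(n_obs, min_train):
    """Yield (train_idx, test_idx) pairs for 1-step-ahead evaluation.

    Each training set holds every observation before the single test point,
    so no future value is ever used to predict an earlier one.
    """
    for test_pos in range(min_train, n_obs):
        yield np.arange(test_pos), np.array([test_pos])

# Hypothetical toy series; a naive "last value" forecast stands in for a real model.
series = np.array([3.0, 4.0, 6.0, 5.0, 7.0, 8.0])
errors = []
for train_idx, test_idx in expanding_window_splits(len(series), min_train=3):
    forecast = series[train_idx][-1]              # repeat the last training value
    errors.append(abs(series[test_idx[0]] - forecast))

mean_abs_error = float(np.mean(errors))
```

Averaging the 1-step errors over all splits gives a single accuracy estimate while respecting the time order of the data.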

In the next sections, we will illustrate the decomposition and features of time series, followed by an explanation and demonstration of the widely used ARIMA algorithm.

Time Series Decomposition

We can extract specific features from time series for further analysis, such as classification and interpolation. Features derived from the time axis can be combined with other general features in the data.

Let’s start with time series decomposition. A commonly used decomposition splits the series into trend, cycle, seasonality and residuals.

  • Trend (Tₜ): it reflects the long-term progression of the series. A trend exists when there is a persistent increasing or decreasing direction in the data, and does not have to be linear.
  • Cycle (Cₜ): it reflects repeated but non-periodic fluctuations.
  • Seasonality (Sₜ): it occurs over a fixed and known period (e.g., the quarter of the year, the month, or day of the week).
  • Residuals (Rₜ): it is the remainder of the time series after the other components have been removed, often regarded as random, irregular noise.

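Classical decomposition is either additive or multiplicative. Formally, with yₜ denoting the observed series:

$$y_t = T_t + C_t + S_t + R_t \quad \text{(additive)}$$

$$y_t = T_t \times C_t \times S_t \times R_t \quad \text{(multiplicative)}$$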
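As a rough illustration of the additive case, where the series is modeled as the sum of its components, the sketch below estimates the trend with a centered moving average and the seasonality as the average detrended value at each position in the period. It is a simplified version under stated assumptions: the function name `classical_additive_decompose` is hypothetical, only odd periods are handled, the cycle is absorbed into the trend, and the toy series is noise-free.

```python
import numpy as np

def classical_additive_decompose(y, period):
    """Split y into trend, seasonal and residual parts (odd periods only)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    n = len(y)
    # Centered moving average; edges where the window does not fit stay NaN.
    trend = np.full(n, np.nan)
    for t in range(half, n - half):
        trend[t] = y[t - half:t + half + 1].mean()
    detrended = y - trend
    # Average the detrended values at each position within the period.
    seasonal_means = np.array(
        [np.nanmean(detrended[pos::period]) for pos in range(period)]
    )
    seasonal_means -= seasonal_means.mean()   # seasonal effects sum to zero
    seasonal = np.tile(seasonal_means, n // period + 1)[:n]
    residual = y - trend - seasonal
    return trend, seasonal, residual

# Hypothetical series: linear trend plus a weekly pattern, no noise.
t = np.arange(70)
weekly = np.array([2.0, 1.0, 0.0, -1.0, -2.0, -1.0, 1.0])
y = 0.5 * t + np.tile(weekly, 10)
trend, seasonal, residual = classical_additive_decompose(y, period=7)
```

On this noise-free input the interior residuals are zero, which is a quick sanity check that the three parts add back up to the series.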

There are many ways to describe the decomposed components using features. For example, seasonality can be described by coefficients at recurring time points, e.g. ‘time of day’, ‘day of week’ or ‘month of year’, and trend can be described by a first- or second-order polynomial fitted against the time axis. More importantly, lag features, i.e. the past values of the target series, are very useful for describing the decomposed components, especially cycles. Next, we will demonstrate an algorithm called ARIMA (autoregressive integrated moving average), which uses lags for time series forecasting. ARIMA works best for additive decomposition; for multiplicative decomposition, one may consider applying a logarithmic transform to the time series first, which turns the product of components into a sum.
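To make the idea of lag features concrete, here is a minimal sketch that builds a feature matrix from past values of a series. The helper name `make_lag_matrix` and the toy values are assumptions for illustration only.

```python
import numpy as np

def make_lag_matrix(y, lags):
    """Return (X, target) where column j of X holds y shifted back by lags[j]."""
    max_lag = max(lags)
    X = np.column_stack([y[max_lag - lag:len(y) - lag] for lag in lags])
    target = y[max_lag:]
    return X, target

# Hypothetical toy series.
y = np.array([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])
X, target = make_lag_matrix(y, lags=[1, 2])
# Row i describes time step t = i + 2: X[i] = [y[t-1], y[t-2]], target[i] = y[t]
```

The first `max_lag` observations are dropped because their lagged values do not exist, which is why `X` has fewer rows than `y` has entries.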


ARIMA combines three components: autoregression (AR), integration (I) and moving average (MA).

  • Autoregression (AR): forecast the variable of interest using a linear combination of past values of the variable.
  • Integrated (I): apply differencing to the raw observations to make the time series stationary. Stationary means that the statistical properties do not depend on the time at which the series is observed, so a time series with trend or seasonality is not stationary.
  • Moving average (MA): uses past forecast errors in a regression-like model.

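The full model can be written as follows (where y’ₜ is the differenced series, εₜ is the forecast error at time t, and c is a constant):

$$y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$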


The parameters of ARIMA model are defined as follows:

  • The lag order (p): the number of lag observations included in the model.
  • The degree of differencing (d): the number of times that the raw observations are differenced.
  • The order of the moving average (q): the number of lagged forecast errors included in the model, sometimes loosely called the size of the moving average window.
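To see what the degree of differencing does, consider the small sketch below. It uses a hypothetical noise-free series with a quadratic trend: one round of differencing leaves a linear trend, and a second round leaves a constant.

```python
import numpy as np

t = np.arange(10, dtype=float)
quadratic_trend = 2.0 + 3.0 * t + 0.5 * t**2   # hypothetical noise-free series

first_diff = np.diff(quadratic_trend, n=1)     # still grows linearly with t
second_diff = np.diff(quadratic_trend, n=2)    # constant: the trend is gone
```

Each round of differencing shortens the series by one observation, so `second_diff` has two fewer entries than the original.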

Finding the degree of differencing

The degree of differencing (d) should be determined first, so that the time series becomes stationary. One way to determine this parameter is to apply the Augmented Dickey-Fuller (ADF) test to the differenced series. The null hypothesis of the ADF test is that the time series is non-stationary. So, if the p-value of the test is less than a chosen significance level (e.g. 0.05), the null hypothesis is rejected and the differenced time series can be treated as stationary.

Finding the lag order

Generally it is not useful to include every lag in the model. The number of lags to include should be decided based on serial dependence, which can be assessed by inspecting the Partial Autocorrelation Function (PACF) plot. What is PACF? Partial autocorrelation can be thought of as the correlation between the series and its lag after excluding the contributions of the intermediate lags. More precisely, the partial autocorrelation at lag k is the coefficient of that lag in an autoregression that includes lags 1 through k. The lag order is then chosen as the number of lags whose partial autocorrelation lies outside the significance band.
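The regression view of partial autocorrelation can be sketched directly with NumPy. This is an illustrative implementation, not the one used by statsmodels: the function name `pacf_by_regression`, the simulated AR(2) series, and the approximate 95% band of 1.96/√n are assumptions for the example.

```python
import numpy as np

def pacf_by_regression(y, max_lag):
    """Partial autocorrelation at lags 1..max_lag via successive AR(k) least-squares fits."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    values = []
    for k in range(1, max_lag + 1):
        # Design matrix: columns are y[t-1], ..., y[t-k] for t = k .. n-1.
        X = np.column_stack([y[k - j:len(y) - j] for j in range(1, k + 1)])
        coeffs, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        values.append(coeffs[-1])              # coefficient of the longest lag
    return np.array(values)

# Hypothetical AR(2) series: y[t] = 0.6*y[t-1] - 0.3*y[t-2] + noise
rng = np.random.default_rng(1)
n = 2000
y = np.zeros(n)
noise = rng.normal(size=n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + noise[t]

pacf_values = pacf_by_regression(y, max_lag=10)
band = 1.96 / np.sqrt(n)                       # approximate 95% significance band
significant_lags = [k + 1 for k, v in enumerate(pacf_values) if abs(v) > band]
```

For a series generated by an AR(2) process, lags 1 and 2 should stand out clearly, while higher lags mostly fall inside the band.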

Finding the order of the moving average

Similar to finding the lag order, the order of the moving average can be determined from the Autocorrelation Function (ACF) plot. Unlike PACF, ACF simply measures the correlation between the series and its lag, including the effects of the intermediate lags. The moving average order is chosen as the number of lags whose autocorrelation lies outside the significance band.
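A sample ACF is simple enough to compute by hand. The sketch below is a minimal version under stated assumptions: the function name `acf`, the simulated MA(1) series, and the approximate 95% band of 1.96/√n are chosen for illustration.

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.dot(y, y)
    return np.array(
        [np.dot(y[:len(y) - k], y[k:]) / denom for k in range(max_lag + 1)]
    )

# Hypothetical MA(1) series: y[t] = e[t] + 0.8*e[t-1]
rng = np.random.default_rng(2)
n = 2000
e = rng.normal(size=n + 1)
y = e[1:] + 0.8 * e[:-1]

acf_values = acf(y, max_lag=10)
band = 1.96 / np.sqrt(n)                       # approximate 95% significance band
significant_lags = [k for k in range(1, 11) if abs(acf_values[k]) > band]
```

For an MA(1) process, the autocorrelation at lag 1 is clearly significant and later lags are close to zero, which is the cut-off pattern one looks for when choosing q.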

There are various other ways to find the parameters, such as the auto_arima() function in pmdarima (modeled on R's auto.arima()), which uses a stepwise approach to search multiple combinations of the p, d, q parameters and chooses the model with the lowest AIC. [3]

Running ARIMA with statsmodels

In Python, the statsmodels package has a great, simple implementation of ARIMA that can be easily applied for time-series forecasting.


We select a simple, small dataset that tracks the monthly sales of CFE specialty writing papers. It covers 147 months, so it is a relatively small dataset. In the figure below, we can also see that it is most likely not stationary, as there is a consistent increase in sales over the months.

Sales per month

Fitting ARIMA

As the data is quite small (and based on previous experiments) we can keep the order of the moving average (MA) term as 0 in this analysis. We do, however, need to identify the lag order and the degree of differencing.

Identifying the degree of differencing

First, we run the ADF test to see if the data is stationary. As can be seen in the figure below, the test reports several metrics, including a p-value of around 0.34. As this is not less than 0.05, we cannot reject the null hypothesis, so we treat the data as non-stationary.

ADF Test — No differencing

Using numpy to compute the second-order difference of the time series, we find a p-value below 0.05, so we can reject the null hypothesis and treat the differenced series as stationary. Hence, we use d=2 as our differencing value:

ADF Test — 2nd Order Differencing

Identifying the lag order

We can use the plot_pacf function in the statsmodels library to plot the Partial Autocorrelation Function, the results of which can be seen in the figure below. From this plot, we can be confident that a lag order greater than 15 would not be necessary.

Partial Autocorrelation

Running ARIMA

Using statsmodels, we can run ARIMA easily by just providing our time-series and the p, d, and q parameters. In the figure below, we show the summary statsmodels produces for the ARIMA model. Nearly every parameter is deemed significant, which is a good sign. As the final few lag parameters are not significant, a way to improve and simplify the model would be to reduce the lag order from 15 to 12.

ARIMA Output with (15, 2, 0)

To identify how well this model performs, we can train a model on the first two thirds of the data and tell it to forecast the remaining data. We can do this by simply calling the forecast method on our trained ARIMA model, as outlined in the figure below.

Forecast Code

When running this, we find the result outlined in the figure below. Although the model is not perfect, it has effectively learned when drops and increases in sales are expected.


Brief Conclusions

Running ARIMA with the statsmodels package is easy! The model is also easily extendable to seasonal or more complex trends.


Yuqi Shao is a member of the KTH AI Society, MSc student in Applied and Computational Mathematics at the KTH Royal Institute of Technology. You can reach her on LinkedIn or by email at

Nathan Bosch is the Head of Education at the KTH AI Society, MSc student in Machine Learning at the KTH Royal Institute of Technology, and Master Thesis student at Ericsson. You can reach him on LinkedIn or by email at


[1] William W. S. Wei, Multivariate time series analysis and applications, 2019

[2] Rob J Hyndman, George Athanasopoulos, Forecasting: Principles and Practice

[3] Selva Prabhakaran, ARIMA Model – Complete Guide to Time Series Forecasting in Python, 2021



KTH AI Society

This is the official account of the KTH AI Society. We write blog posts and provide insights into all sorts of interesting topics in AI!