Published in Analytics Vidhya

AI for Trading Series №4: Time Series Modelling

Learn about advanced methods for time series analysis including ARMA, ARIMA.


In this series, we will cover the following ways to perform time-series analysis:

  1. Random Walk
  2. Moving Averages Model (MA Model)
  3. Autoregression Model (AR Model)
  4. Autoregressive Moving Averages Model (ARMA Model)
  5. Autoregressive Integrated Moving Averages (ARIMA Model)

Random Walk Model

The random walk hypothesis is a financial theory stating that stock market prices evolve according to a random walk and thus cannot be predicted. A random walk model assumes that [1]:

  1. Changes in stock prices have the same distribution and are independent of each other.
  2. The past movement or trend of a stock price or market cannot be used to predict its future movement.
  3. It is impossible to outperform the market without assuming additional risk.
  4. Technical analysis is undependable because it results in chartists only buying or selling a security after a move has occurred.
  5. Fundamental analysis is undependable because the information collected is often of poor quality and can be misinterpreted.

A random walk model can be expressed as:

$$X_t = X_{t-1} + Z_t$$

Random Walk Equation

This formula says that the location at the present time t, written X_t, is the sum of the previous location X_{t-1} and noise, expressed by Z_t.
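As a minimal sketch of the recursion described above (current location = previous location + noise Z), the loop below builds the walk one step at a time and compares it with NumPy's cumulative sum. The variable names and the seed are illustrative assumptions, not part of the original article.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed chosen only for reproducibility
z = rng.standard_normal(1000)         # noise terms Z_t
z[0] = 0.0                            # start the walk at 0

# Explicit recursion: X_t = X_{t-1} + Z_t
walk_loop = np.empty_like(z)
walk_loop[0] = z[0]
for t in range(1, len(z)):
    walk_loop[t] = walk_loop[t - 1] + z[t]

# Vectorised equivalent: a cumulative sum of the noise terms
walk_cumsum = np.cumsum(z)

print(np.allclose(walk_loop, walk_cumsum))
```

Both versions describe the same process, which is why the simulation in this article can rely on a cumulative sum.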

Simulating Returns with Random Walk

  1. Importing libraries

Here, we import the libraries needed for visualization and for simulating the random walk model.

from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import acf

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

plt.rcParams['figure.figsize'] = (14, 8)

Now we generate 1000 random points, starting at 0, where each point is the previous point plus a random step.

# Draw samples from a standard Normal distribution (mean=0, stdev=1).
points = np.random.standard_normal(1000)

# making starting point as 0
points[0] = 0

# Return the cumulative sum of the elements along a given axis.
random_walk = np.cumsum(points)
random_walk_series = pd.Series(random_walk)

2. Plotting the simulated random walk

Now, let's plot our dataset.

plt.figure(figsize=[10, 7.5]); # Set dimensions for figure
plt.plot(random_walk)
plt.title("Simulated Random Walk")
plt.show()

Simulated Random Walk

3. Autocorrelational Plots

An autocorrelation plot is designed to show whether the elements of a time series are positively correlated, negatively correlated, or independent of each other. An autocorrelation plot shows the value of the autocorrelation function (acf) on the vertical axis. It can range from -1 to 1.

We can calculate the correlation between time series observations and observations at previous time steps, called lags. Because the correlation of the time series observations is calculated with values of the same series at previous times, this is called a serial correlation, or an autocorrelation.

A plot of the autocorrelation of a time series by lag is called the AutoCorrelation Function, or the acronym ACF. This plot is sometimes called a correlogram or an autocorrelation plot.
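To make the definition concrete, here is a small sketch of a sample autocorrelation function written with NumPy only. The helper name sample_acf is hypothetical, and the estimator (lagged products of the mean-centred series divided by the lag-0 sum of squares) is one common convention, assumed here for illustration.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[k:] * centered[:len(x) - k]) / denom
        for k in range(max_lag + 1)
    ])

rng = np.random.default_rng(seed=1)   # seed chosen only for reproducibility
noise = rng.standard_normal(2000)     # independent observations
walk = np.cumsum(noise)               # random walk built from the same noise

acf_noise = sample_acf(noise, 5)      # close to 0 for every lag above 0
acf_walk = sample_acf(walk, 5)        # stays close to 1 for small lags
print(acf_noise.round(2))
print(acf_walk.round(2))
```

Independent noise shows almost no correlation beyond lag 0, while the random walk stays strongly correlated with its own recent past.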

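# numeric autocorrelation coefficients
random_walk_acf = acf(random_walk)
# plot_acf expects the series itself, not the precomputed coefficients
acf_plot = plot_acf(random_walk, lags=20)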
Autocorrelational Plot

Looking at the correlation plot, we can say that the process is not stationary. But there is a way to remove this trend. I am going to try two different ways to make this process a stationary one:

  1. Knowing that a random walk adds a random noise to the previous point, if we take the difference between each point with its previous one, we should obtain a purely random stochastic process.
  2. Taking the log return of the prices.

4. Difference between the 2 points

random_walk_difference = np.diff(random_walk, n=1)

plt.figure(figsize=[10, 7.5]); # Set dimensions for figure
cof_plot_difference = plot_acf(random_walk_difference, lags=20);

We see that this is the correlogram of a purely random process, where the autocorrelation coefficients drop to around zero from lag 1 onwards.
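A quick way to convince yourself that differencing a random walk gives back the underlying noise is the sketch below. It assumes a freshly simulated walk with illustrative variable names, not the exact series used earlier.

```python
import numpy as np

rng = np.random.default_rng(seed=2)   # seed chosen only for reproducibility
noise = rng.standard_normal(500)
walk = np.cumsum(noise)               # random walk: running sum of the noise

# First difference: walk[t] - walk[t-1]
recovered = np.diff(walk, n=1)

# The differences match the noise terms from index 1 onwards
print(np.allclose(recovered, noise[1:]))
```

Note that the differenced series is one element shorter than the walk, because the first point has no predecessor.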

Moving Average Model (MA Models)

In MA models, we start with an average mu. To get the value at time t, we add a linear combination of residuals from previous time stamps. In finance, a residual refers to new, unpredictable information that can't be captured by past data points. The residuals are the differences between the model's past predictions and the actual values.

Moving average models are defined as MA(q), where q is the lag.

$$y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q}$$

Representation of Moving Average Model with lag ‘q’; (Source: AI for Trading nano degree course on Udacity)

Taking an example of MA model of order 3, denoted as MA(3):

$$y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \theta_3 \epsilon_{t-3}$$

Representation of Moving Average Model with lag=3; MA(3)

The equation above says that the position y at time t is the mean mu plus the noise at time t, plus the noise at time t-1 (with weight theta_1), plus the noise at time t-2 (with weight theta_2), plus the noise at time t-3 (with weight theta_3).

from statsmodels.tsa.arima_process import ArmaProcess

# no autoregressive part: the AR lag polynomial is just the zero-lag coefficient 1
ar3 = np.array([1])

# specify the weights : [1, 0.9, 0.3, -0.2]
ma3 = np.array([1, 0.9, 0.3, -0.2])

# simulate the process and generate 1000 data points
MA_3_process = ArmaProcess(ar3, ma3).generate_sample(nsample=1000)
plt.figure(figsize=[10, 7.5]); # Set dimensions for figure
plt.plot(MA_3_process)
plt.title('Simulation of MA(3) Model')
plot_acf(MA_3_process, lags=20);

As you can see, there is a significant correlation up to lag 3. Afterwards, the correlation is no longer significant. This makes sense since we specified a formula with a lag of 3.
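The cut-off after lag 3 can also be checked without simulation. For a moving average process with independent noise, the theoretical autocorrelation at lag k is the sum of products of weights that are k steps apart, divided by the sum of squared weights. The sketch below applies this to the weights used above; the helper name ma_theoretical_acf is hypothetical.

```python
import numpy as np

def ma_theoretical_acf(weights, max_lag):
    """Theoretical ACF of an MA process with weights [1, theta_1, ..., theta_q]."""
    w = np.asarray(weights, dtype=float)
    q = len(w) - 1
    gamma0 = np.sum(w ** 2)              # variance, up to the noise variance
    acf_values = []
    for k in range(max_lag + 1):
        if k <= q:
            # products of weights that are k lags apart
            acf_values.append(np.sum(w[:len(w) - k] * w[k:]) / gamma0)
        else:
            acf_values.append(0.0)       # no shared noise terms beyond lag q
    return np.array(acf_values)

rho = ma_theoretical_acf([1, 0.9, 0.3, -0.2], max_lag=6)
print(rho.round(3))
```

Lags 1 to 3 are non-zero and every lag after that is exactly zero, which matches the shape of the simulated correlogram.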

Autoregression Models (AR Models)

An autoregressive model (AR model) tries to fit a line that is a linear combination of previous values. It includes an intercept that is independent of previous values. It also contains an error term to represent movements that cannot be predicted by previous terms.

$$y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \epsilon_t$$

AR Models (Source: AI for Trading nano degree course on Udacity)

An AR model is defined by its lag. If an AR model uses only yesterday’s value and ignores the rest, it is called AR Lag 1; if the model uses the two previous days' values and ignores the rest, it is called AR Lag 2, and so on.

AR Lag (Source: AI for Trading nano degree course on Udacity)

Usually, autoregressive models are applied to stationary time series only. This constrains the range of the parameters phi. For example, an AR(1) model will constrain phi between -1 and 1. Those constraints become more complex as the order of the model increases, but they are automatically considered when modelling in Python.
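One way to check these constraints by hand is to look at the roots of the AR lag polynomial: the process is stationary when every root lies outside the unit circle. The sketch below assumes the lag-polynomial convention [1, a1, ..., ap] for 1 + a1*z + ... + ap*z^p, so an AR(1) model y_t = phi * y_{t-1} + noise is written as [1, -phi]. The helper name is_stationary is hypothetical.

```python
import numpy as np

def is_stationary(ar_poly):
    """Check stationarity from lag-polynomial coefficients [1, a1, ..., ap]."""
    # np.roots expects the highest-degree coefficient first, so reverse the order
    coeffs_high_to_low = np.asarray(ar_poly, dtype=float)[::-1]
    roots = np.roots(coeffs_high_to_low)
    # stationary when all roots lie strictly outside the unit circle
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([1, -0.5]))              # AR(1) with phi = 0.5
print(is_stationary([1, -1.1]))              # AR(1) with phi = 1.1
print(is_stationary([1, 0.9, 0.3, -0.2]))    # a third-order lag polynomial
```

For AR(1) this reduces to the familiar rule that phi must lie between -1 and 1; for higher orders the root check does the bookkeeping for you.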

Simulating return series with autoregressive properties

For simulating an AR(3) process, we will be using ArmaProcess.

For this, let us reuse the coefficients from the MA(3) simulation:

Representation of MA(3) Model

Since we are dealing with an autoregressive model of order 3, we need to define the coefficient at lag 0, 1, 2 and 3. Also, we will cancel the effect of a moving average process. Finally, we will generate 10000 data points.

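ar3 = np.array([1, 0.9, 0.3, -0.2])
# no moving average part: the MA lag polynomial is just the zero-lag coefficient 1
ma = np.array([1])
simulated_ar3_points = ArmaProcess(ar3, ma).generate_sample(nsample=10000)
plt.figure(figsize=[10, 7.5]); # Set dimensions for figure
plt.plot(simulated_ar3_points)
plt.title("Simulation of AR(3) Process")
plot_acf(simulated_ar3_points, lags=20);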

Looking at the autocorrelation plot, we can see that the coefficients decay slowly. Now let's plot the corresponding partial autocorrelation plot.

Partial Autocorrelation Plot

The autocorrelation for an observation and an observation at a prior time step is comprised of both the direct correlation and indirect correlations. These indirect correlations are a linear function of the correlation of the observation, with observations at intervening time steps.

It is these indirect correlations that the partial autocorrelation function seeks to remove.
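One way to see what "removing the indirect correlations" means is to estimate the partial autocorrelation by regression: the value at lag k is the last coefficient when the series is regressed on its previous k values. The sketch below is an illustration under that assumption, using a simulated AR(1) series with a coefficient of 0.6; the helper name pacf_ols is hypothetical and is not the statsmodels implementation.

```python
import numpy as np

def pacf_ols(x, max_lag):
    """Partial autocorrelations for lags 0..max_lag via least squares."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    values = [1.0]                       # lag 0 is 1 by definition
    for k in range(1, max_lag + 1):
        # columns hold x_{t-1}, ..., x_{t-k}; the target is x_t
        lagged = np.column_stack([x[k - j:len(x) - j] for j in range(1, k + 1)])
        target = x[k:]
        coef, *_ = np.linalg.lstsq(lagged, target, rcond=None)
        values.append(coef[-1])          # coefficient on the longest lag
    return np.array(values)

rng = np.random.default_rng(seed=3)      # seed chosen only for reproducibility
n = 20000
eps = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + eps[t]       # AR(1) with coefficient 0.6

pacf_vals = pacf_ols(y, max_lag=3)
print(pacf_vals.round(2))
```

Only lag 1 stands out for this AR(1) series, even though its ordinary autocorrelation decays gradually over many lags.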

from statsmodels.graphics.tsaplots import plot_pacf

plot_pacf(simulated_ar3_points, lags=20);


As you can see, the coefficients are not significant after lag 3. Therefore, the partial autocorrelation plot is useful for determining the order of an AR(p) process. You can also get these values as numbers with the pacf function from statsmodels.tsa.stattools:

from statsmodels.tsa.stattools import pacf

pacf_coef_AR3 = pacf(simulated_ar3_points)

Auto Regressive Moving Average Model (ARMA)

The ARMA model is defined by p and q, where p is the lag for autoregression and q is the lag for the moving average. Regression-based models require the data to be stationary. For a non-stationary dataset, the mean, variance and covariance may change over time, which makes it difficult to predict the future from the past.

Looking back at the equation of Autoregressive Model (AR Model) :

AR Model. (Source: AI for Trading nano degree course on Udacity)

Looking at the equation of Moving Average Model (MA Model) :

MA Model. (Source: AI for Trading nano degree course on Udacity)

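The equation of the ARMA model is simply the combination of the two, where the phi terms are the autoregressive weights, the theta terms are the moving average weights, and epsilon is the noise:

$$y_t = \alpha + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q}$$

ARMA Model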

Hence, this model can explain the relationship of a time series with both random noise (moving average part) and itself at a previous step (autoregressive part).
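As a rough sketch of how the two parts interact, the recursion below simulates an ARMA(1, 1) process directly in NumPy and compares its lag-1 autocorrelation with the textbook formula for ARMA(1, 1). The values phi = 0.6 and theta = -0.2 are illustrative assumptions. Keep in mind that statsmodels' ArmaProcess takes lag-polynomial coefficients, so this recursion would correspond to ar=[1, -phi] and ma=[1, theta] there.

```python
import numpy as np

phi, theta = 0.6, -0.2                    # illustrative AR and MA weights
rng = np.random.default_rng(seed=4)       # seed chosen only for reproducibility
n = 50000
eps = rng.standard_normal(n)

# y_t = phi * y_{t-1} + eps_t + theta * eps_{t-1}
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t] + theta * eps[t - 1]

# Sample lag-1 autocorrelation of the simulated series
yc = y - y.mean()
sample_rho1 = np.sum(yc[1:] * yc[:-1]) / np.sum(yc ** 2)

# Theoretical lag-1 autocorrelation of an ARMA(1, 1) process
theory_rho1 = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)

print(round(sample_rho1, 3), round(theory_rho1, 3))
```

The simulated value lands close to the theoretical one, and both depend on phi and theta together, so neither part alone explains the correlation.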

Simulating ARMA(1, 1) Process

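Here, we will simulate an ARMA(1, 1) process with the following lag-polynomial coefficients, and then plot its autocorrelation and partial autocorrelation:

ar1 = np.array([1, 0.6])
ma1 = np.array([1, -0.2])
simulated_ARMA_1_1_points = ArmaProcess(ar1, ma1).generate_sample(nsample=10000)
plt.figure(figsize=[15, 7.5]); # Set dimensions for figure
plt.plot(simulated_ARMA_1_1_points)
plt.title("Simulated ARMA(1,1) Process")
plt.xlim([0, 200])
plot_acf(simulated_ARMA_1_1_points, lags=20);
plot_pacf(simulated_ARMA_1_1_points, lags=20);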

As you can see, the autocorrelation and partial autocorrelation plots exhibit the same sinusoidal trend, which further supports the fact that both an AR(p) process and an MA(q) process are in play.

Autoregressive Integrated Moving Average (ARIMA)

This model is the combination of autoregression, a moving average model and differencing. In this context, integration is the opposite of differencing.

Differencing is useful to remove the trend in a time series and make it stationary.
It simply involves subtracting the point at time t-1 from the point at time t.

Mathematically, the ARIMA(p,d,q) now requires three parameters:

  1. p: the order of the autoregressive process
  2. d: the degree of differencing (number of times the series was differenced)
  3. q: the order of the moving average process
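To get a feel for the parameter d, the sketch below differences a simple quadratic trend once and twice, then rebuilds the original series by cumulative summation. The toy series is an assumption chosen because the result is easy to verify by hand.

```python
import numpy as np

t = np.arange(10)
series = t ** 2                          # quadratic trend: 0, 1, 4, 9, ...

first_diff = np.diff(series, n=1)        # still trending: 1, 3, 5, ...
second_diff = np.diff(series, n=2)       # constant: the trend is gone

# Integration undoes differencing: cumulative sums plus the starting values
rebuilt_first = np.concatenate(
    ([first_diff[0]], first_diff[0] + np.cumsum(second_diff))
)
rebuilt_series = np.concatenate(
    ([series[0]], series[0] + np.cumsum(rebuilt_first))
)

print(second_diff)
print(np.array_equal(rebuilt_series, series))
```

Here d = 2 is needed to remove the trend, and integrating twice recovers the original series, which is what the "I" in ARIMA refers to.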

The equation can be expressed as follows:

Representation of ARIMA model

ar_params = np.array([1, -0.4])
ma_params = np.array([1, -0.8])

returns = ArmaProcess(ar_params, ma_params).generate_sample(nsample=1000)

returns = pd.Series(returns)
drift = 100

price = pd.Series(np.cumsum(returns)) + drift
returns.plot(figsize=(15,6), color=sns.xkcd_rgb["orange"], title="simulated return series")
price.plot(figsize=(15,6), color=sns.xkcd_rgb["baby blue"], title="simulated price series")

Extracting Stationary Data

One way to get a stationary time series is to compare consecutive points in the series. The ratio of consecutive prices is called the rate of change.

rate_of_change = current_price / previous_price

Taking the logarithm of this ratio gives the corresponding log return:

log_returns = log(current_price) - log(previous_price)
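A tiny worked example, using made-up prices, shows how the two quantities relate: the log return is the logarithm of the rate of change, and log returns add up over time.

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0])   # hypothetical prices

# Ratio of each price to the previous one
rate_of_change = prices[1:] / prices[:-1]

# Difference of log prices, which equals the log of the ratio
log_returns = np.log(prices[1:]) - np.log(prices[:-1])

# Log returns are additive: their sum is the log return over the whole period
total_log_return = log_returns.sum()

print(np.allclose(log_returns, np.log(rate_of_change)))
print(np.isclose(total_log_return, np.log(prices[-1] / prices[0])))
```

This additivity is one reason log returns are convenient to model with the linear time series methods covered in this article.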

log_return = np.log(price) - np.log(price.shift(1))
log_return = log_return[1:]
_ = plot_acf(log_return,lags=10, title='log return autocorrelation')
_ = plot_pacf(log_return, lags=10, title='log return Partial Autocorrelation', color=sns.xkcd_rgb["crimson"])
# the older statsmodels.tsa.arima_model module has been removed from recent statsmodels releases
from statsmodels.tsa.arima.model import ARIMA

def fit_arima(log_returns):
    ar_lag_p = 1
    ma_lag_q = 1
    degree_of_differencing_d = 0

    # create tuple : (p, d, q)
    order = (ar_lag_p, degree_of_differencing_d, ma_lag_q)

    # create an ARIMA model object, passing in the values of the log return pandas series,
    # and the tuple containing the (p,d,q) order arguments
    arima_model = ARIMA(log_returns.values, order=order)
    arima_result = arima_model.fit()

    # from the fitted result, save and return the fitted values,
    # autoregression parameters, and moving average parameters
    fittedvalues = arima_result.fittedvalues
    arparams = arima_result.arparams
    maparams = arima_result.maparams

    return fittedvalues, arparams, maparams

fittedvalues, arparams, maparams = fit_arima(log_return)
# reuse the index of log_return so both series line up on the plot
arima_pred = pd.Series(fittedvalues, index=log_return.index)
plt.plot(log_return, color=sns.xkcd_rgb["pale purple"])
plt.plot(arima_pred, color=sns.xkcd_rgb["jade green"])
plt.title('Log Returns and predictions using an ARIMA(p=1,d=0,q=1) model');
print(f"fitted AR parameter {arparams[0]:.2f}, MA parameter {maparams[0]:.2f}")




Purva Singh
