Financial Time Series and Its Random Walk in R — ARIMA Model

Cristiane Silva
Published in Analytics Vidhya
7 min read · Sep 16, 2020

See which statistical characteristics of a financial time series matter most for forecast analysis.

Stock market prices are highly unpredictable and volatile, so their empirical time series also carry an element of uncertainty. We cannot know the exact future values of a stock, but we can model its price movements.

In this post, I’ll show how to model the time series of a stock price using the ARIMA model in R.

The main topics covered in this post are:

  • ACF and ADF test
  • Test of stationarity
  • ARIMA model and Ljung–Box test
  • Forecast of model

The most important statistical characteristics of a financial time series for forecast analysis are:

  • Time trends: are the prices on an upward or a downward trend?
  • No reversion to mean prices
  • Apparent constancy of the rate of change
  • Variable volatility over time

Left: S&P 500 prices (time series with trend). Right: S&P 500 returns (stationary time series).

The left figure above is an example of a financial time series with a trend (a random walk) and shows the behavior of stock prices over time. Differencing this series produces the right figure, which shows the price returns of the left series and is an example of a stationary time series (not a random walk).
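To make the differencing step concrete, here is a minimal sketch in base R. The prices are made-up numbers for illustration (not real S&P 500 data), and the variable names are my own.

```r
# Hypothetical closing prices, invented for illustration
prices <- c(100, 102, 101, 105, 107)

# First difference: absolute change from one period to the next
price_changes <- diff(prices)

# Simple returns: change relative to the previous price
simple_returns <- diff(prices) / head(prices, -1)

# Log returns: first difference of the log prices
log_returns <- diff(log(prices))

print(price_changes)
print(round(simple_returns, 4))
print(round(log_returns, 4))
```

Each differenced series is one observation shorter than the original, and the log returns equal log(1 + simple return).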

Many financial time series can be represented by a model called a random walk with drift.

Time Series Modeling

First, load the following packages if you already have them installed in R. Otherwise, install each one first with the install.packages() function.


#load packages
library(IRdisplay)
library(magrittr)
library(tidyverse)
library(scales)
library(gridExtra)
library(forecast)
library(tseries)
library(ggthemes)

Next, load the NFLX.csv file, which contains Netflix’s stock price from 01/01/2013 to 01/01/2018. The data used in all models come from the Adj_Close column, whose values are adjusted to account for any corporate actions.

nflx <- read_csv("NFLX.csv")
head(nflx)

The summary() function reports some descriptive statistics of the values.

summary(nflx$Adj_Close)

The dim() function returns the dimensions of an object. The data frame has 1259 rows and 7 columns.

dim(nflx)

Is a random walk stationary?

Most stock prices can be modeled as a random walk with drift: the series moves through successive upward or downward trends and changes direction unpredictably. The price of a stock can be written as:

yt = α + yt-1 + εt

where:

yt = price at the present time

yt-1 = price at the previous time

α = E(yt − yt-1). It represents the time trend of the price yt and is the drift of the model.

εt = white noise (zero-mean random number)

Today’s price is yesterday’s price, plus the drift and some random noise.
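A quick way to build intuition for this model is to simulate it. The sketch below uses base R only. The drift, noise level, and series length are arbitrary values I picked for illustration, not estimates from the Netflix data.

```r
set.seed(42)                         # reproducible illustration

n     <- 1000                        # number of time steps (assumed)
alpha <- 0.5                         # drift, chosen arbitrarily
eps   <- rnorm(n, mean = 0, sd = 1)  # white noise

# With y_0 = 0, the random walk with drift is the cumulative sum of (alpha + eps_t)
y <- cumsum(alpha + eps)

# Differencing recovers alpha + eps_t, so the mean of the differences estimates the drift
drift_hat <- mean(diff(y))
print(drift_hat)
```

The level series y wanders without returning to a fixed mean, while diff(y) fluctuates around the drift.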

The next block shows the code to plot the prices and their trend line, which lets us check the stationarity of the series. In time series analysis, this assumption is known as stationarity, which requires that the internal structures of the series do not change over time.

# check time series plot
ggplot(nflx, aes(Date, Adj_Close)) +
  geom_line(colour = 'blue') +
  ggtitle("Netflix Closing Stock Prices") +
  geom_smooth(method = "lm", formula = y ~ x, colour = 'black', linetype = "dashed")
Closing Stock Price — Series Trend

As noted earlier, stock prices are difficult to predict. The chart above shows the time series of Netflix prices (blue line) and its trend (dashed black line). The trend increases linearly, and the series moves above and below the trend line without reverting to a mean. Therefore, the series is non-stationary with drift, with a positive slope α (the drift).

Stationarity

A strictly stationary time series is one for which the probabilistic behavior of every collection of values is identical to that of the time-shifted set.

A weakly stationary time series has a constant mean over time, and its variance and autocorrelation do not change over time either. We will use the term stationary to mean weakly stationary.

Let’s check the autocorrelation of the prices (a random walk), that is, whether the present stock price is correlated with its price at different times. For this, we’ll use the ggAcf() function.

# check ACF plot
ggAcf(nflx$Adj_Close, type='correlation')

The autocorrelation function (ACF) values decrease as the lag grows, but they decay slowly and remain significant across all lags (the autocorrelation stays almost the same from one lag to the next). This is typical of a time series containing a trend.
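The contrast between a trending series and its differences can be reproduced on simulated data. This is a sketch using the base R acf() function instead of ggAcf(). The series is a simulated random walk with arbitrarily chosen parameters, not the Netflix prices.

```r
set.seed(1)
y <- cumsum(0.5 + rnorm(500))   # simulated random walk with drift (assumed values)

# Sample autocorrelations at lags 1..20 (the first element, lag 0, is dropped)
acf_level <- acf(y, lag.max = 20, plot = FALSE)$acf[-1]
acf_diff  <- acf(diff(y), lag.max = 20, plot = FALSE)$acf[-1]

# Approximate 95% significance bound, as drawn by the dashed lines in ACF plots
bound <- qnorm(0.975) / sqrt(length(y))

print(round(acf_level[1:5], 3))
print(round(acf_diff[1:5], 3))
print(round(bound, 3))
```

The autocorrelations of the level series stay close to 1 and decay slowly, while those of the differenced series should mostly fall near the significance bound or inside it.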

ADF Test

The Augmented Dickey-Fuller (ADF) test checks the null hypothesis that the series has a unit root, that is, that it is non-stationary. The alternative hypothesis is stationarity.

#run ADF test
adf.test(nflx$Adj_Close)

With the ADF test, the null hypothesis is that the series follows a random walk. Therefore, a high p-value of 0.432 (greater than 0.05) means that we cannot reject the null hypothesis that the series is non-stationary (a random walk).
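To see what the test is doing, here is a simplified sketch of the plain Dickey-Fuller regression in base R. It is not the same computation as adf.test(), which, as far as I know, also includes lagged differences and a trend term. The df_stat() helper, the simulated series, and the approximate critical value of -2.86 (5% level, regression with a constant) are assumptions for illustration.

```r
set.seed(7)
n <- 500
random_walk <- cumsum(rnorm(n))                              # unit root: non-stationary
stationary  <- as.numeric(arima.sim(list(ar = 0.5), n = n))  # AR(1): stationary

# Hypothetical helper: t-statistic on gamma in  diff(y)_t = a + gamma * y_{t-1} + e_t
df_stat <- function(y) {
  dy   <- diff(y)       # one-step changes
  ylag <- head(y, -1)   # lagged level
  fit  <- lm(dy ~ ylag)
  summary(fit)$coefficients["ylag", "t value"]
}

crit_5pct <- -2.86  # approximate Dickey-Fuller 5% critical value (constant, no trend)

print(df_stat(random_walk))
print(df_stat(stationary))
```

A statistic below the critical value rejects the unit root. The stationary series produces a strongly negative statistic, while a random walk usually does not.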

Why choose ARIMA?

ARIMA stands for Auto-Regressive Integrated Moving Average of order (p, d, q), where:

  • p — number of autoregressive lags
  • d — order of differencing
  • q — number of moving average lags

Our time series shows signs of non-stationarity (a trend), so it is necessary to difference it to produce a new series that is more compatible with the assumption of stationarity.

The autoregressive terms (p) correspond to lags of the transformed series (that is, the stationary series obtained by differencing), and the moving average terms (q) correspond to lags of the random errors. The term integrated (d) refers to the differencing applied to the original series to make it stationary.

In R, the auto.arima() function returns the best ARIMA model according to either AIC (Akaike Information Criterion), AICc (AIC corrected for small sample sizes), or BIC (Bayesian Information Criterion).

# fit ARIMA model
arima.model <- auto.arima(nflx$Adj_Close, allowdrift = TRUE)
arima.model

The auto.arima() function selected p = 0 and q = 0, that is, ARIMA(0, 1, 0). Keep in mind that by default auto.arima() limits its search to max.p = 5 and max.q = 5 and uses a stepwise search, so it does not evaluate every candidate model, and other values of p and q could still be more appropriate.
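An ARIMA(0, 1, 0) model with drift is exactly the random walk with drift described earlier. The sketch below checks this on simulated data with the base R arima() function rather than auto.arima(). Passing a time index through xreg is my assumption for how to include the drift in base R, and the simulated parameters are arbitrary.

```r
set.seed(123)
y <- cumsum(0.5 + rnorm(500))   # simulated random walk with drift 0.5 (assumed values)

# ARIMA(0,1,0) with drift: difference once, with a linear time regressor for the drift
fit <- arima(y, order = c(0, 1, 0), xreg = seq_along(y))

drift_arima <- unname(coef(fit)[1])  # coefficient of the time regressor
drift_mean  <- mean(diff(y))         # average one-step change

print(c(arima = drift_arima, mean_diff = drift_mean))
```

The two estimates should agree closely, since with no autoregressive or moving average terms the model reduces to a constant mean for the differenced series.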

Residuals of the Model Fit

The resid() function extracts model residuals from objects returned by modeling functions.

# calculate residuals of the model
arima.residual <- resid(arima.model)

# plot PACF of the model residuals
ggAcf(arima.residual, type = 'partial')

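The dashed blue lines mark the significance bounds: spikes that stay inside them are not significantly different from zero, that is, there is no significant autocorrelation at those lags. The partial autocorrelation (PACF) at a given lag measures the correlation between a value and the value that many periods earlier, after removing the effect of the intermediate lags.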

Ljung–Box test

The null hypothesis of the Ljung–Box test is that the data are independently distributed (i.e., there is no autocorrelation).

#run the Ljung Box test on the residuals

Box.test(arima.residual, type='Ljung-Box', lag=1)

The Ljung–Box test does not reject the null hypothesis, which indicates that the residuals are uncorrelated.
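For readers curious about what Box.test() computes, here is a small sketch that rebuilds the Ljung–Box statistic by hand and compares it with the function’s output. The input is simulated white noise standing in for model residuals, and the choice of 5 lags is arbitrary.

```r
set.seed(99)
x   <- rnorm(300)   # stand-in for model residuals (simulated white noise)
lag <- 5            # number of lags to test, chosen for illustration
n   <- length(x)

# Sample autocorrelations at lags 1..lag (the lag-0 value is dropped)
r <- acf(x, lag.max = lag, plot = FALSE)$acf[-1]

# Ljung-Box statistic: Q = n * (n + 2) * sum over k of r_k^2 / (n - k)
q_manual <- n * (n + 2) * sum(r^2 / (n - seq_len(lag)))
p_manual <- pchisq(q_manual, df = lag, lower.tail = FALSE)

bt <- Box.test(x, type = "Ljung-Box", lag = lag)
print(c(manual = q_manual, box_test = unname(bt$statistic)))
print(c(manual = p_manual, box_test = unname(bt$p.value)))
```

Under the null hypothesis, Q approximately follows a chi-squared distribution, so a large p-value means there is no evidence of autocorrelation up to the chosen lag.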

Forecast

The forecast() function predicts the path of the prices over the next 60 days.

Here, the number of periods to forecast ahead is h = 60 and the level of the prediction interval is 90%.

# make forecast for model

arima.forecast <- forecast(arima.model, h=60, level= 90)
# plot forecast for model

g1 <- autoplot(arima.forecast)
g1

The blue band shows the 90% prediction interval around the point forecast. Series with stochastic trends have much wider prediction intervals because the forecast errors are non-stationary and accumulate over the horizon.

forecast(arima.model, h=60, level=90)

The output above lists the first 10 of the 60 forecast values. The forecast starts at row 1260, continuing from the 1259 rows of the original data frame.
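The widening of the interval can be reproduced by hand for a random walk with drift. This is a simplified sketch on simulated data: it ignores the uncertainty in the estimated drift, so the numbers from forecast() may differ slightly. All variable names and parameter values are my own assumptions.

```r
set.seed(2020)
y <- cumsum(0.5 + rnorm(500))   # simulated random walk with drift (assumed values)

d         <- diff(y)
alpha_hat <- mean(d)            # estimated drift
sigma_hat <- sd(d)              # estimated standard deviation of the noise
h         <- 1:60               # forecast horizons
z         <- qnorm(0.95)        # quantile for a two-sided 90% interval

point <- tail(y, 1) + h * alpha_hat        # last value plus h times the drift
lower <- point - z * sigma_hat * sqrt(h)   # forecast error variance grows with h
upper <- point + z * sigma_hat * sqrt(h)

print(head(data.frame(h, point, lower, upper), 3))
```

The point forecast is a straight line with slope equal to the drift, and the interval width grows with the square root of the horizon.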

This concludes the example for this post. You can find the code here.

Cristiane Silva
Analytics Vidhya

Engineer, MBA in Finance & Investment, and Data Scientist who contributes code to the community. linkedin.com/in/ssilvacris/