Published in Analytics Vidhya

Univariate Forecasting for the Volatility of the Stock Data using Deep Learning

Yesterday we saw how the volatility of the Tesla Inc. stock can be predicted using a basic regression approach (link). In this article, we will go one step further and see how volatility can be forecast over a period of time as a time series, using a deep learning (LSTM) model.

Volatility over a period of time

Time Series Forecasting

Time series forecasting is a method of predicting future values from previous values, whereas in yesterday's regression approach the model did not use any previous days' volatility values to predict the target variable. Is that confusing? Let us understand it with a simple example. Say you have Tesla's stock volatility for a period of time. If the volatility of a particular day is predicted using that day's opening and closing prices alone, it is a regression approach, whereas in time series analysis the volatility of a particular day is predicted using the previous n days of volatility values.
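
To make the distinction concrete, here is a tiny toy sketch (the numbers are made up for illustration, not taken from the TSLA data):

import numpy as np
# Regression framing: predict a day's volatility from that same day's prices.
open_price, close_price = 420.0, 425.5   # hypothetical features for one day
# model(open_price, close_price) -> that day's volatility

# Time series framing: predict a day's volatility from the previous n days of volatility.
past_volatility = np.array([0.41, 0.39, 0.44, 0.47, 0.45])  # hypothetical last n = 5 days
# model(past_volatility) -> the next day's volatility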

Data Preparation

The first and most important step in time series forecasting is the preparation of the data. Before starting, we have to ask a few questions about the data.

  • A deep learning model needs a fixed input size, so how many past days of volatility will you use to predict the future? It can be the last 5 days, 10 days or any other window; this choice decides your input feature size.
  • Is your data stationary? Check this with the Augmented Dickey-Fuller (ADF) test.
  • Is the data skewed? If so, you can apply a log transformation to it.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
window_size = 20
# first, download the price data from Yahoo Finance
import yfinance as yf
df = yf.download('TSLA', start='2000-01-01', end='2019-12-31', progress=False)
# compute daily returns and the 20-day moving historical volatility,
# annualized by multiplying the rolling std by sqrt(252) trading days
df['returns'] = df['Close'].pct_change()
df['volatility'] = df['returns'].rolling(window_size).std() * (252 ** 0.5)
# drop the leading NaNs introduced by pct_change() and the rolling window
X = df['volatility'].dropna().values
plt.figure(figsize=(20, 5))
plt.subplot(121)
plt.hist(X)
plt.title('Distribution of Volatility')
plt.subplot(122)
plt.hist(np.log(X))
plt.title('Log Transformation of Volatility')
plt.show()
Distribution of volatility (left) and its log transformation (right)

Stationarity Check

from statsmodels.tsa.stattools import adfuller
result = adfuller(X)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
ADF Statistic: -4.915287
p-value: 0.000033
Critical Values:
1%: -3.433
5%: -2.863
10%: -2.567

Since the p-value is less than 5%, we reject the null hypothesis of a unit root: the data is stationary, meaning the statistical properties of the volatility series do not drift with time or trend.
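
For a quick programmatic read of the same output, here is a small sketch that reuses the `result` tuple returned by `adfuller` above:

adf_stat, p_value = result[0], result[1]
if p_value < 0.05:
    print('Reject the unit-root null hypothesis: the series looks stationary.')
else:
    print('Cannot reject the unit-root null hypothesis: consider differencing the series first.')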

Input Feature

Here we are going to predict the volatility using the last 10 days of data, so our input feature array will have shape (N, 10).

def convert2matrix(data_arr, look_back):
    X, Y = [], []
    for i in range(len(data_arr) - look_back):
        d = i + look_back
        X.append(data_arr[i:d])
        Y.append(data_arr[d])
    return np.array(X), np.array(Y)

x, y = convert2matrix(X, 10)
print(x.shape, y.shape)
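
To see what the sliding window does, here is a quick sanity check on a tiny toy series (made-up values, purely for illustration):

demo = np.array([1, 2, 3, 4, 5, 6])
demo_x, demo_y = convert2matrix(demo, 3)
print(demo_x)  # [[1 2 3], [2 3 4], [3 4 5]] -> each row is a 3-day window
print(demo_y)  # [4 5 6] -> the value that follows each window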

Training Phase

I am using an LSTM model for training, because an LSTM has the capability to remember relationships across the input sequence over longer time periods.

from sklearn.model_selection import train_test_split
# hold out the last 10% as a validation set, preserving the time order (shuffle=False)
X, x_valid, y, y_valid = train_test_split(x, y, test_size=0.1, shuffle=False)
# add a feature dimension so the inputs have shape (samples, timesteps, features)
X = np.expand_dims(X, 2)
x_valid = np.expand_dims(x_valid, 2)
print(X.shape, y.shape, x_valid.shape, y_valid.shape)

(2126, 10, 1) (2126,) (237, 10, 1) (237,)

from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import Adam
# a single LSTM layer with 10 units followed by a dense layer for the one-step forecast
model = Sequential()
model.add(LSTM(10, input_shape=(X.shape[1], 1)))
model.add(Dense(1))
model.summary()
model.compile(loss='mae', optimizer=Adam(0.001))
model.fit(X, y, epochs=10, batch_size=32, validation_data=(x_valid, y_valid), verbose=1, shuffle=False)
from sklearn.metrics import mean_absolute_error
pred = model.predict(x_valid)
# plot the predicted volatility against the true validation values
plt.figure(figsize=(20, 5), dpi=300)
plt.plot(y_valid, label='True Value')
plt.plot(pred, linestyle='--', label='Predicted')
plt.legend()
plt.show()
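
The `mean_absolute_error` import above is not used in the snippet, so here is a small follow-up sketch, using the same variable names, to put a number on the validation error:

mae = mean_absolute_error(y_valid, pred.flatten())
print('Validation MAE: %.4f' % mae)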

Oh yeah!! The LSTM performed better than the regression model on this data. The model is able to capture the trend, and fine-tuning it should lead to even better results.

Maybe adding more features will also improve the prediction accuracy. I will convert the same data to a multivariate setup tomorrow and see what happens there. Can't wait to see the difference.

Clap and Share if you find this useful :-)
