Univariate Forecasting of Stock Volatility Using Deep Learning
Yesterday we saw how the volatility of the Tesla Inc stock can be predicted using a basic regression approach (Link). In this article, we will go one step further and see how volatility can be forecast as a time series over a period of time using a Deep Learning (LSTM) approach.
Time Series Forecasting
Time series forecasting is a method of predicting future values from previous values, whereas in yesterday's regression approach the model did not use any previous days' volatility values to predict the target variable. Is it confusing? Let's try to understand with a simple example. Say you have data of Tesla's stock volatility for a period of time. If the volatility of a particular day is predicted using that day's start and end price alone, it is called a regression approach. In time series analysis, the volatility of a particular day is predicted using the previous n days of volatility values.
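To make the distinction concrete, here is a minimal sketch (with toy numbers, not real TSLA data) contrasting the two input layouts:

```python
import numpy as np

# toy volatility series for 6 days (made-up numbers for illustration)
vol = np.array([0.10, 0.12, 0.11, 0.15, 0.14, 0.13])

# regression approach: each day's volatility is predicted from that day's
# own features (e.g. open and close price); no past volatility is used
open_px = np.array([20.0, 21.0, 20.5, 22.0, 21.5, 21.0])
close_px = np.array([21.0, 20.5, 22.0, 21.5, 21.0, 21.8])
regression_features = np.column_stack([open_px, close_px])  # shape (6, 2)

# time series approach: each day's volatility is predicted from the
# previous n days of volatility (here n = 3)
n = 3
windows = np.array([vol[i:i + n] for i in range(len(vol) - n)])  # shape (3, 3)
targets = vol[n:]                                                # shape (3,)

print(regression_features.shape)   # (6, 2)
print(windows.shape, targets.shape)  # (3, 3) (3,)
```

The first window `[0.10, 0.12, 0.11]` predicts the fourth day's value `0.15`, the next window slides one day forward, and so on.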
The first and most important step in time series forecasting is the preparation of data. Before starting, we have to ask a few questions about the data.
- A Deep Learning model needs a standard input size, so how many days of volatility are you going to use to predict the future? It can be the last 5 days, 10 days or any number. This decides your input feature size.
- Is your data stationary? — Check using the Augmented Dickey-Fuller test
- Is the data skewed? — If it is skewed, you can apply a log transformation to your data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# first, download the data from Yahoo Finance
import yfinance as yf

df = yf.download('TSLA', start='2000-01-01', end='2019-12-31', progress=False)

# compute daily returns and 20-day moving historical volatility
# (rolling standard deviation of returns; the original computation is not shown)
df['returns'] = df['Close'].pct_change()
df['volatility'] = df['returns'].rolling(20).std()
df = df.dropna()
X = df['volatility'].values
# visualise the distribution before and after the log transformation
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.histplot(X, ax=axes[0])
axes[0].set_title('Distribution of Volatility')
sns.histplot(np.log(X), ax=axes[1])
axes[1].set_title('Log Transformation of Volatility')
plt.show()
from statsmodels.tsa.stattools import adfuller

# adfuller returns a tuple: (statistic, p-value, ..., critical values dict)
result = adfuller(X)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))

ADF Statistic: -4.915287
Since the p-value is less than 5%, the data is stationary, which means the volatility does not depend on trend/time.
Here we are going to predict the volatility using the last 10 days of data, so our input feature shape will be (N, 10).
def convert2matrix(data_arr, look_back):
    X, Y = [], []
    for i in range(len(data_arr) - look_back):
        # each sample: look_back consecutive values; target: the next value
        X.append(data_arr[i:i + look_back])
        Y.append(data_arr[i + look_back])
    return np.array(X), np.array(Y)

x, y = convert2matrix(X, 10)
I am using an LSTM model because an LSTM has the capability to remember relationships between input features over a long time period.
from sklearn.model_selection import train_test_split

# hold out the last 10% as validation data; shuffle=False preserves time order
X, x_valid, y, y_valid = train_test_split(x, y, test_size=0.1, shuffle=False)
X = np.expand_dims(X, 2)              # reshape to (samples, 10, 1) for the LSTM
x_valid = np.expand_dims(x_valid, 2)
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import Adam

model = Sequential()
# the original layer sizes are not shown; 64 units is an assumption
model.add(LSTM(64, input_shape=(10, 1)))
model.add(Dense(1))
model.compile(loss='mse', optimizer=Adam())

model.fit(X, y, epochs=10, batch_size=32, validation_data=(x_valid, y_valid), verbose=1, shuffle=False)
from sklearn.metrics import mean_absolute_error

pred = model.predict(x_valid)
print('MAE: %f' % mean_absolute_error(y_valid, pred))

plt.figure(figsize=(20, 5), dpi=300)
plt.plot(y_valid, label='Actual')
plt.plot(pred, linestyle='--', label='Predicted')
plt.legend()
plt.show()
Oh Yeah!! The LSTM was better than the regression model for this data. The model is able to capture the trend, and fine-tuning the model will lead to better results.
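When fine-tuning, a useful yardstick (my own addition, not part of the original comparison) is the naive persistence baseline, which predicts each day's volatility as the previous day's value. A tuned LSTM should beat its MAE:

```python
import numpy as np

def persistence_mae(series):
    """MAE of the naive forecast vol[t] = vol[t-1]."""
    series = np.asarray(series, dtype=float)
    preds = series[:-1]    # yesterday's value as the forecast
    actual = series[1:]    # today's actual value
    return np.mean(np.abs(actual - preds))

# toy example: errors are 0.02, 0.01, 0.04
print(persistence_mae([0.10, 0.12, 0.11, 0.15]))  # ≈ 0.0233
```

Comparing the LSTM's validation MAE against this number on the same split tells you whether the model has actually learned anything beyond "tomorrow looks like today".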
Maybe adding more features will also improve the prediction accuracy. I will try to convert the same data to a multivariate setup tomorrow and see what happens there. Can't wait to see the difference.
Clap and Share if you find this useful :-)