Published in Analytics Vidhya

# Univariate Forecasting for the Volatility of the Stock Data using Deep Learning

Yesterday we saw how the volatility of the Tesla Inc stock can be predicted using a basic regression approach (Link). In this article, we will go one step further and see how volatility can be forecast over a period of time as a time series, using a deep learning (LSTM) model.

# Time Series Forecasting

Time series forecasting predicts the future from previous values, whereas the regression approach from yesterday's article does not use any previous days' volatility values to predict the target variable. Confusing? Let's look at a simple example. Say you have Tesla's stock volatility for a period of time. If a particular day's volatility is predicted from that day's opening and closing prices alone, that is a regression approach. In time series analysis, by contrast, a particular day's volatility is predicted from the previous n days of volatility values.
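As a toy illustration (with made-up volatility numbers, not from the article's data), this is how a time series sample is built from the previous n days:

```python
import numpy as np

# Hypothetical volatility values for 6 trading days
vol = np.array([0.30, 0.32, 0.31, 0.35, 0.40, 0.38])

# Regression view: day t's volatility is predicted from day t's own
# features (e.g. that day's start and end price), never from past volatility.
# Time series view: day t's volatility is predicted from the previous
# n days of volatility. With n = 3, the sample for day 3 looks like:
n = 3
features = vol[0:n]   # volatility on days 0, 1, 2
target = vol[n]       # volatility on day 3

print(features, target)
```

Sliding this window forward one day at a time turns the whole series into (features, target) pairs.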

# Data Preparation

The first and most important step in time series forecasting is data preparation. Before starting, we have to ask a few questions about the data.

• A deep learning model needs a fixed input size, so how many past days of volatility will you use to predict the future? It can be the last 5 days, 10 days, or any other window. This decides your input feature size.
• Is your data stationary? — check using the Augmented Dickey-Fuller test.
• Is the data skewed? — if it is, you can apply a log transformation.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

window_size = 20

# first download data from Yahoo Finance
import yfinance as yf

df = yf.download('TSLA', start='2000-01-01', end='2019-12-31', progress=False)

# compute daily returns and 20-day moving historical volatility (annualized)
df['returns'] = df['Close'].pct_change()
df['volatility'] = df['returns'].rolling(window_size).std() * (252 ** 0.5)

# drop the NaNs produced by pct_change and the rolling window
X = df['volatility'].dropna().values

plt.figure(figsize=(20, 5))
plt.subplot(121)
plt.hist(X)
plt.title('Distribution of Volatility')
plt.subplot(122)
plt.hist(np.log(X))
plt.title('Log Transformation of Volatility')
plt.show()
```

Stationarity Check

```python
from statsmodels.tsa.stattools import adfuller

result = adfuller(X)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
```

```
ADF Statistic: -4.915287
p-value: 0.000033
Critical Values:
	1%: -3.433
	5%: -2.863
	10%: -2.567
```

Since the p-value is below 5%, we reject the unit-root null hypothesis: the data is stationary, meaning the volatility does not depend on trend/time.
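The decision rule amounts to comparing the ADF p-value against a significance level; a minimal sketch (the helper name is my own, not part of statsmodels):

```python
# Hypothetical helper: reject the unit-root (non-stationary) null
# hypothesis when the ADF p-value falls below the chosen alpha.
def is_stationary(adf_pvalue, alpha=0.05):
    return adf_pvalue < alpha

# With the p-value reported above (0.000033), the series is stationary:
print(is_stationary(0.000033))
```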

Input feature

Here we are going to predict the volatility using the last 10 days of data, so our input feature will have shape (N, 10).

```python
def convert2matrix(data_arr, look_back):
    X, Y = [], []
    for i in range(len(data_arr) - look_back):
        d = i + look_back
        X.append(data_arr[i:d])
        Y.append(data_arr[d])
    return np.array(X), np.array(Y)

x, y = convert2matrix(X, 10)
print(x.shape, y.shape)
```
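To see what `convert2matrix` produces, here is the same windowing applied to a tiny made-up series, with look_back = 3 instead of 10 for readability:

```python
import numpy as np

def convert2matrix(data_arr, look_back):
    X, Y = [], []
    for i in range(len(data_arr) - look_back):
        d = i + look_back
        X.append(data_arr[i:d])   # previous look_back values
        Y.append(data_arr[d])     # the value to predict
    return np.array(X), np.array(Y)

series = np.arange(1, 7)          # [1 2 3 4 5 6]
x, y = convert2matrix(series, 3)
print(x.shape, y.shape)           # 3 samples, each a window of 3
print(x[0], y[0])                 # first window predicts the 4th value
```

Each row of `x` is one sliding window, and the matching entry of `y` is the value that immediately follows it.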

# Training Phase

I am using an LSTM model because an LSTM can remember relationships within the input sequence over long time periods.

```python
from sklearn.model_selection import train_test_split

X, x_valid, y, y_valid = train_test_split(x, y, test_size=0.1, shuffle=False)

# LSTM expects input of shape (samples, timesteps, features)
X = np.expand_dims(X, 2)
x_valid = np.expand_dims(x_valid, 2)
print(X.shape, y.shape, x_valid.shape, y_valid.shape)
```

```
(2126, 10, 1) (2126,) (237, 10, 1) (237,)
```

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import Adam

model = Sequential()
model.add(LSTM(10, input_shape=(X.shape[1], 1)))
model.add(Dense(1))
model.summary()
```
```python
model.compile(loss='mae', optimizer=Adam(0.001))
model.fit(X, y, epochs=10, batch_size=32,
          validation_data=(x_valid, y_valid), verbose=1, shuffle=False)
```
```python
from sklearn.metrics import mean_absolute_error

pred = model.predict(x_valid)
print('MAE:', mean_absolute_error(y_valid, pred))

plt.figure(figsize=(20, 5), dpi=300)
plt.plot(y_valid, label='True Value')
plt.plot(pred, linestyle='--', label='Predicted')
plt.legend()
plt.show()
```

Oh yeah!! The LSTM performed better than the regression model on this data. The model is able to capture the trend, and fine-tuning it should lead to even better results.

Maybe adding more features will also improve the prediction accuracy. Tomorrow we will convert the same data to a multivariate setup and see what happens there. Can't wait to see the difference.

Clap and Share if you find this useful :-)

