Predicting Stock Prices Using Deep Learning Models

Josh Bernhard
Published in The Startup · Jul 12, 2020 · 7 min read

Introduction

When you get started with machine learning, you learn to use linear regression to predict numeric values and logistic regression to predict binary variables. You then learn several other popular methods: nearest neighbors, support vector machines, decision trees, random forests, and boosting.

Sklearn is commonly used for regression and classification problems.

Once you work your way through the common methods of sklearn (those mentioned above), a common next step is to start learning deep learning methods. One of the most popular libraries for building deep learning models is keras. In this post, I assume you have exposure to the methods found in the sklearn library, as well as comfort with Python and a working knowledge of deep learning. Exposure to keras is not expected.

Deep learning models have proven useful for prediction in a wide range of cases. Deep learning models with no “fancy” layers work well for most traditional classification and regression problems where many samples are available.

Deep Learning with no “fancy” layers

Convolutional Neural Networks (CNNs) work extremely well for prediction tasks involving a number of different targets, but they gained their fame from their ability to identify what’s being shown in images. When you are working with time-series data, articles will rave about the performance of Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) models. These last two methods, RNNs and LSTMs, will be the focus of this article. Specifically, these methods will be used for forecasting with the keras library.

The Data

The data used for this post is time-series data from 2006–2018 for several stocks. This data can be found on Kaggle here. The goal here is to train a model on stock data from 2006 to 2016, then use that model to predict the prices for 2017.

IBM data — “High” column is used in this example

Below you can see two examples: Apple and IBM stock prices. For each of these plots, the High for the day was chosen. The training data are shown in blue and the test data we would like to predict are shown in red.

Training and test data for Apple and IBM stock prices.

Traditional Supervised Models vs. Recurrent Models

When considering the machine learning methods listed in the introduction using sklearn, these methods generally consider an X and y, where the columns of X are intended to be used to predict the values of y.

One example of this case is in predicting Boston home prices. The X in this example is a matrix of housing data. The y is the corresponding home price. Below you can see the X data:

X data
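As a rough sketch of how this X and y pairing might be loaded in code (using the Boston dataset bundled with sklearn at the time of writing; the variable names here are illustrative):

from sklearn.datasets import load_boston
import pandas as pd

# X holds one row of housing features per home;
# y holds the corresponding home price for each row
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = boston.target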

For each row, there is a corresponding home price, which represents y. The model is then trained by relating each row of X to the corresponding value of y. With time-series data, your observations aren't independent of one another in this way. Instead, you will want to use trends from previous y values to assist in predicting future y values.

In the Boston home price data, notice that each row represents a home, and each home is independent of the others. In time-series data, however, the price of the stock in each earlier row is useful in predicting the prices found in the following rows. To take advantage of this relationship between stock prices over time, RNN and LSTM models can be used.

How these modeling techniques work is explained really well by one of my previous colleagues, Luis Serrano, below. You can also check out more of his amazing content at the link here.

RNN Explanation by Luis Serrano — check out more of his content here.

In keras, there are a few different methods that exist to implement a recurrent process: SimpleRNN, LSTM, and GRU.

An RNN is a for loop that reuses quantities computed during the previous iteration of the loop — Francois Chollet

Let’s put these recurrent methods to practice.

Data Preprocessing

In order to begin using these recurrent networks, we must first do some data preprocessing. Specifically, we will look at predicting the High column. Splitting this column into training and test sets is done as:

# rows through 2016 form the training set, 2017 onward the test set;
# column index 1 holds the "High" price
training_set = all_data[:'2016'].iloc[:, 1:2].values
test_set = all_data['2017':].iloc[:, 1:2].values

Notice that the training and test sets are created by slicing the data either before 2017 (training) or from 2017 onward (test).

We then scale this data:

from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range=(0, 1))  # scale prices to the [0, 1] range
training_set_scaled = sc.fit_transform(training_set)

We then use the previous 60 days' worth of data to assist in predicting the next day's price. Putting this all together into a function can be done as shown in the code snippet here:
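One way that function might look (the function name and exact signature are illustrative assumptions, not the original snippet):

import numpy as np

def create_windows(series, window=60):
    # each X sample is the previous `window` scaled prices;
    # each y value is the price on the following day
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i, 0])
        y.append(series[i, 0])
    X, y = np.array(X), np.array(y)
    # keras recurrent layers expect input shaped (samples, timesteps, features)
    return X.reshape(X.shape[0], X.shape[1], 1), y

X_train, y_train = create_windows(training_set_scaled)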

RNNs, LSTMs, and GRUs

Now to the fun part: we will put together a few different functions that each provide a different recurrent model for predicting future stock prices. To start, we might build a really simple single-layer recurrent network with only a small number of nodes. Below is a 6-node, single-layer model.
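A sketch of that model (the optimizer, loss, and training settings here are illustrative choices, not the original gist):

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(6, input_shape=(60, 1)))  # 6 recurrent units over 60-day windows
model.add(Dense(1))                           # predict a single next-day price
model.compile(optimizer='rmsprop', loss='mse')
model.fit(X_train, y_train, epochs=100, batch_size=32)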

Often a model can be improved by adding more nodes; this is usually a natural second step in improving a deep learning model. In the code below, the layer is widened to 32 nodes, but the model is still single-layer.
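A sketch of the wider model (settings again illustrative):

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(32, input_shape=(60, 1)))  # widened to 32 units, still one recurrent layer
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')
model.fit(X_train, y_train, epochs=100, batch_size=32)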

It is common in building these recurrent models to stack multiple layers together in order to improve their ability to predict. Models with fewer layers tend to underfit the data, while each additional layer moves the model closer to overfitting.

“Recurrent layer stacking is a classic way to build more powerful recurrent networks: for instance, what currently powers the Google Translate algorithm is a stack of seven large LSTM layers” — Francois Chollet (Google Brain)

Below you can see an example of how these recurrent layers can be stacked to create a model.
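A sketch of a stacked version (layer sizes are illustrative):

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
# every recurrent layer except the last returns its full sequence,
# so the next recurrent layer receives the 3D input it expects
model.add(SimpleRNN(32, return_sequences=True, input_shape=(60, 1)))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')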

Notice that when you stack these network layers, you need to set return_sequences=True in each of the layers leading up to the last recurrent layer. The SimpleRNN model is generally only considered useful when the most recent data point contains the necessary information for predicting the next data point.

In most cases, using LSTM or GRU layers will outperform SimpleRNN layers in these multilayer models, as these layers can make use of longer-term dependencies by tackling what is known as the vanishing gradient problem. These more advanced layers do a better job of retaining older information to use in future predictions.

In practice, the GRU and LSTM layers frequently achieve similar results, though LSTM layers may slightly outperform. However, GRU layers are generally preferred because they are less computationally expensive. Below you can see the code for this example:
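A sketch of the stacked model with GRU layers swapped in (sizes again illustrative):

from keras.models import Sequential
from keras.layers import GRU, Dense

model = Sequential()
model.add(GRU(32, return_sequences=True, input_shape=(60, 1)))
model.add(GRU(32, return_sequences=True))
model.add(GRU(32))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')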

You may have heard that complex, multi-layered neural networks have a tendency to overfit, which is true. One common method to combat overfitting is to add dropout. Dropout randomly drops units (and their connections) in different parts of a network during training, reducing the network's opportunity to overfit.

A recommended strategy for using dropout with LSTMs is to apply it only in the dense layers that follow the LSTM layers. Though it is possible to use dropout within LSTM layers, this isn't recommended, because when fitting to time-series data, each node may carry information that you don't want dropped. In many cases, dropout of any capacity does not lead to improvements in time-series models.

For illustration purposes, you can see below how dropout may be added to a model. However, keep the above points in mind anytime you decide to use dropout in one of these types of recurrent models.
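One way that might look (the 0.2 rate is an arbitrary illustrative choice):

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(60, 1)))
model.add(LSTM(32))
model.add(Dropout(0.2))  # dropout applied only after the recurrent layers, per the note above
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')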

As a final model, we might try Facebook's Prophet model, which uses a number of traditional time-series components. The documentation lists the following characteristics of time series that Prophet handles best:

  • hourly, daily, or weekly observations with at least a few months (preferably a year) of history
  • strong multiple “human-scale” seasonalities: day of week and time of year
  • important holidays that occur at irregular intervals that are known in advance (e.g. the Super Bowl)
  • a reasonable number of missing observations or large outliers
  • historical trend changes, for instance, due to product launches or logging changes
  • trends that are non-linear growth curves, where a trend hits a natural limit or saturates

I did a few runs that made the Prophet model directly comparable to the recurrent models used above, updating with each additional new data point, but those runs take a significant amount of time. Instead, a faster method predicts all future values at once. Here you can see an example of how to use this library:
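A sketch of that faster approach (the ds/y column names are Prophet's required convention; the data handling mirrors the earlier snippets):

from fbprophet import Prophet
import pandas as pd

# Prophet expects a dataframe with a `ds` date column and a `y` value column
train = all_data[:'2016']
df = pd.DataFrame({'ds': train.index, 'y': train['High'].values})

m = Prophet()
m.fit(df)

# forecast the entire test year at once instead of one day at a time
future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)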

Now we can take a look at how all of these methods perform in comparison to one another. Below you can see how well each of the models performs on a few different stocks.

AAPL Predictions
AMZN Predictions
IBM Predictions

The Prophet model is not performing well (the blue line), while the large, single-layer model is the best performing (the green line). In all but the Prophet model, the training and testing data were set up so that the next day's price was predicted from the previous 60 days' worth of data.

You can find the full script for making predictions and implementing each of these strategies on Github here.
