Using machine learning / AI to recommend which stocks to buy

Ching Jui Young
Published in Young and AI
6 min read · May 25, 2018

(Note: This is a work-in-progress draft of a blog post on my work so far on this model. Feel free to comment on sections that you think I can explain in more detail.)

I’m an avid investor, but I always have trouble deciding which stocks to buy. I decided to build a simple Machine Learning model to recommend stocks. To start simply, I wanted to see if ML models can predict which stock will most likely increase by 1% within 5 days.

In general, ML models are good at identifying patterns in data. I wondered whether an ML model could identify patterns within stock price movements, much like a trader who does technical analysis.

For this task, I built a Long Short-Term Memory (LSTM) model, which is a type of Recurrent Neural Network. The LSTM model will estimate the probability that a given stock will increase by 1% within 5 days. After making these predictions for multiple stocks, we can choose to invest in the stock with the highest likelihood of increasing.

A big cautionary note: I built this model for myself, so don’t blame me if this model doesn’t work out for your own trading!

If you’d like to first understand the basics of how neural networks work, please refer to my easy intro post on the subject.

The Hypothesis

A Machine Learning model can predict if a stock will rise 1% within 5 days better than random guessing can.
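To make the target concrete, here’s a minimal sketch of how the binary label (our Y variable) could be constructed, assuming a pandas Series of daily closing prices. The make_labels helper below is illustrative, not my exact preprocessing code:

```python
import pandas as pd

def make_labels(close: pd.Series, horizon: int = 5, target: float = 0.01) -> pd.Series:
    """Label = 1 if the close rises at least `target` (1%) above today's
    close at some point within the next `horizon` trading days."""
    # Max close over the next `horizon` days, aligned back to today
    future_max = close.rolling(horizon).max().shift(-horizon)
    labels = (future_max / close - 1 >= target).astype(int)
    # The final `horizon` rows have no complete future window, so drop them
    return labels.iloc[:-horizon]
```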

What our neural network looks like

I’m going to jump right in and show you a diagram of the neural network I will use today:

If you haven’t seen this type of diagram before, let me explain how to read it. It shows how our inputs (the X variable) flow through the neural network to arrive at our output (the Y variable). Along the way, the data is transformed in a number of different ways, as indicated by the types of transformations (LSTM, ReLU, Dropout, etc.).

What makes neural networks different from a run-of-the-mill “complicated” formula is the weights that are stored within each of the green dots. These weights are updated on each pass of training to fit the data more and more closely, which allows the machine to “learn” from the data that is fed to it.
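As a toy illustration (not the actual training code), each pass of training nudges every weight a small step against the gradient of the loss:

```python
import numpy as np

weights = np.random.randn(10)    # hypothetical weights inside one layer
gradients = np.random.randn(10)  # hypothetical gradients of the loss
learning_rate = 0.01

# One gradient-descent step: move each weight slightly "downhill" on the loss
weights -= learning_rate * gradients
```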

I’ll explain each of the three sections — The Inputs, The Model, and The Output — in more detail.

The Inputs

We will feed the model a 90-day history of price movements, with 5 different price movements for each day. My belief is that the more information we provide the model, the better its predictions will be.

The 5 different price movements (for each day) include:

  1. Open to High price % change (H/O)
  2. Open to Low price % change (L/O)
  3. Open to Close price % change (C/O)
  4. (Previous day) Close to (present day) Open price % change (O/C)
  5. (Previous day) Close to (present day) Close price % change (C/C)

In other words, we’re feeding 450 data points (90 days × 5 movements each) to create a single prediction.

Let me provide a concrete example for how this prediction model will work. Imagine today is 2 Mar 2018. I want to predict if the stock AAPL will increase by 1% within the next 5 days. I’ll feed the model with price movements from last 90 trading days, which happens to be 27 Oct 2017 to 1 Mar 2018.
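As a rough sketch (not my exact preprocessing code), here’s how those five daily price movements for a 90-day window could be computed with pandas, assuming a DataFrame df with Open/High/Low/Close columns:

```python
import numpy as np
import pandas as pd

def build_window(df: pd.DataFrame) -> np.ndarray:
    """Turn the last 90 trading days into a (90, 5) array of % changes."""
    feats = pd.DataFrame({
        "H/O": df["High"] / df["Open"] - 1,            # open -> high
        "L/O": df["Low"] / df["Open"] - 1,             # open -> low
        "C/O": df["Close"] / df["Open"] - 1,           # open -> close
        "O/C": df["Open"] / df["Close"].shift(1) - 1,  # prev close -> open
        "C/C": df["Close"] / df["Close"].shift(1) - 1, # prev close -> close
    })
    return feats.tail(90).to_numpy()  # shape (90, 5): our 450 data points
```

Each training sample is one such window, so the full X fed to the model has shape (number of samples, 90, 5).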

In practice, preparing the inputs took the most time to build this model. I talk about preparing The Inputs in more detail in another post.

The Model

Here comes the fun part. I’ll walk through what each step does in this model:

  1. First LSTM layer: The data is first fed through a sequential Long Short-Term Memory layer. This layer has our usual neural network weights in it. But it also has additional “gates” that allow information to be “remembered” more easily.
    In general, recurrent neural networks have a tendency to “forget” older information. These memory gates in the LSTM model help solve this issue by putting in additional weights that carry forward older information.
    This will be important in case there are some price movement patterns early in the history that help inform the final prediction.
    There’s another great thing about these gates — which is that they also have “learnable” weights! Meaning, we don’t have to manually dictate which memory node should equal 1 (to pass the information forward) or 0 (data is inconsequential to future prediction). The model will automatically update these gates based on the data fed to train it. However, it does mean that there are more weights to learn, which in turn means we require more data to train it.
  2. Rectified Linear Unit (“ReLU”): This is a function that returns the input if it is positive. Otherwise, it returns 0. This function has been shown to speed up the model training process. I’ll paste the source here as soon as I find it.
  3. Dropout (20%): This model will “drop out” 20% of our nodes every time we feed a data sample through the model. This helps reduce overfitting by simulating the “missing” information that often comes with our noisy data.
  4. Second LSTM layer: Similar to the first LSTM layer. However, notice that this layer passes only a single output, at the end of day 90, to our next ReLU transformation. This is because we only need a single prediction after 90 days, not 90 different predictions, one after every day.
  5. ReLU: Another ReLU transformation like above.
  6. Dropout (20%): Another dropout transformation like above.
  7. Fully-connected network: This is our typical (non-sequential) fully-connected network. We use this network to reduce our multiple nodes to a single node, because our output only has a single number.
  8. Sigmoid: With that single number, we want to transform it into a probability between 0 and 1, matching our true labels. The sigmoid function does that for us.

This is considered a “2-layer” LSTM model because of the 2 rounds of LSTM transformations we put in. A 2-layer model should be able to identify more complicated patterns than a 1-layer model can, but it has a higher tendency to “overfit” the data and may require more data to train. Our results on the test data might suffer from this.
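For readers who want to see this in code, here’s a minimal Keras sketch of the architecture described above. The unit counts (64) are illustrative assumptions; the exact sizes in my model may differ:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Activation, Dropout, Dense

model = Sequential([
    # 1. First LSTM layer: returns the full 90-step sequence
    LSTM(64, return_sequences=True, input_shape=(90, 5)),
    Activation("relu"),    # 2. ReLU
    Dropout(0.2),          # 3. Dropout (20%)
    # 4. Second LSTM layer: returns only the final (day-90) output
    LSTM(64, return_sequences=False),
    Activation("relu"),    # 5. ReLU
    Dropout(0.2),          # 6. Dropout (20%)
    Dense(1),              # 7. Fully-connected layer down to a single node
    Activation("sigmoid")  # 8. Sigmoid: squash to a 0-1 probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```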

So, did it work?

Here are the training results from our model.

Our model’s accuracy increases as it’s being trained on the training data. That’s good! Simultaneously, our model loss is decreasing (also good!)

As we fed our model with our training data over and over again, the model’s accuracy increased and the “loss” function decreased. Basically, this means our predictions are getting better on the training data. Great!

But not so fast…

Our Y_test accuracy (the model’s accuracy on the test data) is lower than the baseline rate at which the stock actually hits the 1% sell target.

When we feed our test data through the model, the accuracy is worse than random guessing!

This is most likely due to overfitting. Our model learned the patterns in our training data too well, but it could not generalize well to data outside of the training set.

Can we improve this model?

Definitely, yes. This is a very simplistic model, though it still took a long time to pre-process the data, as always! There are more complex features we can build into the model that will hopefully improve its predictions.

My immediate next step is to feed non-sequential data into the model, such as the stock name, so that it can distinguish the unique patterns of each stock. Right now, we’ve erased the stock’s identity from each X variable, so the machine assumes the trading pattern of GE is similar to the trading pattern of FB. This is most likely not the case. We’ll need to incorporate the stock details into our X variables for the machine to better differentiate.
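One way to do this (a sketch, not a finished design) is to embed a ticker ID and concatenate it with the LSTM output before the final prediction. NUM_TICKERS and the layer sizes below are assumptions for illustration:

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, LSTM, Embedding, Flatten,
                                     Concatenate, Dense, Dropout)

NUM_TICKERS = 500  # hypothetical size of our stock universe

price_in = Input(shape=(90, 5), name="price_history")
ticker_in = Input(shape=(1,), name="ticker_id")  # integer ID per stock

x = LSTM(64, return_sequences=True)(price_in)
x = Dropout(0.2)(x)
x = LSTM(64)(x)

t = Embedding(input_dim=NUM_TICKERS, output_dim=8)(ticker_in)
t = Flatten()(t)

merged = Concatenate()([x, t])  # price patterns + stock identity
out = Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[price_in, ticker_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```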

Other possible improvements include:

  1. Adding in other fundamental analysis data, such as Earnings per Share, revenue or net income growth, etc.
  2. Adding in other non-standard signals, such as aggregate analyst ratings
  3. Adding in stock sentiment analysis (e.g. from Twitter)
  4. Anything else?

Hope you enjoyed this article and learned a thing or two about LSTM models, or machine learning models in general. Please feel free to give me feedback about this post — what would you like to see more of? Did some parts not make sense? Any parts that I should delve deeper into? Let me know!

P.S. If you’re interested, here’s the code to the model above.
