Predicting Apple Stock Prices with LSTM

Suraj Bansal
May 15 · 8 min read

I’ve always wanted to emulate the lives of serial entrepreneurs like Elon Musk, Warren Buffett and Charlie Munger for 3 primary reasons:

  1. To drive innovation and disrupt industries
  2. Immerse myself in an environment of smart people
  3. Make enough bank to wear a new pair of Yeezys every time I leave home

Yeah, especially that last reason. Unfortunately, money doesn’t grow on trees, so when I was 13 years old, I started investing (virtually) in stocks. My portfolio went from a $100,000 valuation to a $437,303 valuation in 3 years, and when I turned 16, I started investing with real money.

I failed. Horribly. RIP 16 years of birthday money 😥

And although I’m improving, I wanted to expedite the process, so I decided to program an LSTM that predicts stock prices to save myself from another atrocious bankruptcy!

Traditional neural networks use feed-forward neurons, where inputs are propagated through functions to construct the desired output. This architecture is limiting in that it cannot capture the sequential structure of the input data, and the network doesn’t consider its own previous predictions.

Recurrent neural networks better model our cognitive frameworks since they share parameters across time steps, which means there are fewer parameters to train and the computational cost decreases. Their internal memory allows the architecture to memorize previous inputs and feed previous outputs back into the input to make better future predictions.

However, vanilla recurrent neural networks are rarely implemented today. All RNNs have feedback loops within the recurrent layer, which enables them to maintain information in ‘memory’, but training them is difficult when the task requires learning long-term temporal dependencies. This is because the gradient of the declared loss function decays exponentially as it is propagated back through time, and the vanishing gradient problem ensues.


LSTM networks use special units in addition to standard units: memory cells that maintain information in memory for extended periods of time. Multiple gates control when information enters the memory, when it’s output, and when it’s forgotten, enhancing the network’s capacity to learn longer-term dependencies.

In essence, vanilla RNNs have only hidden states for memory, whereas LSTM networks have both hidden states and cell states, which can add and remove information through gate regulation.

To put this in the perspective of stock prediction: a vanilla RNN would forget early-stage stock price data as the years progress, whereas an LSTM can use those historical trends for more accurate predictions. The memory (input) gate collects possible outputs and stores the relevant ones; the selection (output) gate picks the final output from the candidates the memory gate produces; the forget gate decides which memories are no longer relevant and disposes of them.


Now that we understand recurrent neural networks and the LSTM architecture, let’s program!

STEP 1 → IMPORT LIBRARIES

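The snippets in each step below are a rough sketch of how this could look in code; they build on one another as a single script, and file names, column names, and hyperparameters are illustrative assumptions rather than exact values. First, the imports:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
```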

STEP 2 → IMPORT THE DATASET


pd.read_csv reads comma-separated values (CSV) files into DataFrames. The head and tail of our dataset are printed below, along with the number of rows and columns.
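A sketch, assuming the Apple price history sits in a local CSV (the file name here is a placeholder):

```python
# Read the CSV into a DataFrame
dataset = pd.read_csv('AAPL.csv')

print(dataset.head())   # first five rows
print(dataset.tail())   # last five rows
print(dataset.shape)    # (rows, columns)
```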

STEP 3 → CHECK NULL VALUES


Next, check for null values in every column and print the total number found. Null values would distort our model’s predictions; luckily, none were found.
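Pandas makes this a one-liner per check:

```python
# Null values per column, then the grand total
print(dataset.isnull().sum())
print(dataset.isnull().sum().sum())  # 0 here, so nothing to clean up
```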

STEP 4 → VISUALIZE THE DATA


To visualize the stock prices, first drop the unnecessary columns. I plotted each one individually and then all together for better visualization, but this step isn’t required for creating our LSTM.
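A sketch of the plotting, assuming the usual Open/High/Low column names:

```python
# Plot each price column on its own, then all together
for column in ['Open', 'High', 'Low']:
    dataset[column].plot(title=column)
    plt.show()

dataset[['Open', 'High', 'Low']].plot(title='AAPL stock prices')
plt.show()
```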

STEP 5 → DROP EXTRA COLUMNS


Since we’re only working with the Open stock prices, we can drop the High and Low columns. By default, drop works along the index (rows), but setting axis=1 makes it drop columns instead, and inplace=True ensures the actual dataset is altered. We also dropped the bottom two rows for cleaner numbers later.
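Something like:

```python
# axis=1 drops columns rather than rows; inplace=True mutates the DataFrame
dataset.drop(['High', 'Low'], axis=1, inplace=True)

# Drop the bottom two rows, leaving 1760 samples
dataset.drop(dataset.index[-2:], inplace=True)
```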

STEP 6 → SEPARATE INTO TRAINING AND TESTING


Before proceeding, convert the dataset into a NumPy array. Then separate the 1760 samples into training and testing sets with an 80/20 split, giving 1408 training samples and 352 testing samples.
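In code:

```python
# Convert to a NumPy array and split 80/20 without shuffling
# (order matters for time series)
data = dataset['Open'].values       # 1760 samples

train_size = int(len(data) * 0.8)   # 1408
training_set = data[:train_size]
testing_set = data[train_size:]     # 352
```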

STEP 7 → SCALE THE DATA


Machine learning workflows are better optimized when each individual feature is scaled to a smaller range. I used a utility function to scale the feature vectors, normalizing the data within a specific range to make computations faster and reduce error.

I scaled the data to the default range of [0, 1] using MinMaxScaler, which subtracts the minimum value of each individual feature and divides by the range, calculated as the difference between the original maximum and minimum values.
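A sketch with scikit-learn:

```python
# x_scaled = (x - min) / (max - min), computed from the training data only
scaler = MinMaxScaler(feature_range=(0, 1))
training_set_scaled = scaler.fit_transform(training_set.reshape(-1, 1))
```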

If you’re interested in fully understanding data preprocessing and how scaling with scikit-learn works, check this out!

STEP 8 → SEPARATE INTO X_TRAIN AND Y_TRAIN


Separate the data into x_train and y_train, then reshape x_train into the 3D input the LSTM model expects.
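A sketch using sliding windows; the window length of 60 time steps is an assumed value:

```python
# Each sample is `window` consecutive scaled prices;
# the label is the price that follows
window = 60
x_train, y_train = [], []
for i in range(window, len(training_set_scaled)):
    x_train.append(training_set_scaled[i - window:i, 0])
    y_train.append(training_set_scaled[i, 0])

x_train, y_train = np.array(x_train), np.array(y_train)

# LSTMs expect 3D input: (samples, time steps, features)
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)
```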

STEP 9 → BUILD THE LSTM MODEL


First, instantiate a Sequential model, which makes life easier by allowing us to simply stack layers.

The LSTM layer’s units parameter declares the dimensionality of the output space. return_sequences=True makes the layer return the full output sequence (one output per time step) instead of only the last output, which is needed when stacking LSTM layers. input_shape indicates the shape of our training samples: the number of time steps and the number of indicators.

The dropout layer randomly selects neurons and disregards them during training, making the network less sensitive to any specific neuron’s values and improving generalization. This helps avoid overfitting, the phenomenon where a model performs much better on training data than on testing data.
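A sketch of the architecture; 50 units and a 0.2 dropout rate are assumed values:

```python
model = Sequential()

# First LSTM layer returns the full sequence so the next LSTM can consume it
model.add(LSTM(units=50, return_sequences=True,
               input_shape=(x_train.shape[1], 1)))  # (time steps, indicators)
model.add(Dropout(0.2))

# Second LSTM layer returns only its final output
model.add(LSTM(units=50))
model.add(Dropout(0.2))

# One output neuron: the predicted (scaled) price
model.add(Dense(units=1))
```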

STEP 10 → COMPILE THE MODEL


Compile the model with the Adam optimizer, which uses an adaptive learning rate method to update the neural network’s weights iteratively based on the training data. It combines RMSprop and stochastic gradient descent with momentum: like RMSprop, it squares the gradients to scale the learning rate, but it also leverages momentum by using a moving average of the gradient instead of the gradient itself, like SGD with momentum would.

Mean squared error measures the average of the errors’ squares (the average squared difference between the actual and estimated values). Geometrically, each sample’s squared error is the area of the square whose side is the distance between the measured point and the predicted value.
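Which is a single line:

```python
model.compile(optimizer='adam', loss='mean_squared_error')
```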

STEP 11 → FIT THE MODEL


Train the model for 100 epochs with a batch size of 32: the model makes 100 passes over the training data, updating its weights after every batch of 32 samples.
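Also a single line:

```python
model.fit(x_train, y_train, epochs=100, batch_size=32)
```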

STEP 12 → PREDICT THE TESTING DATA


Reshape the testing data into an acceptable 3D format and apply the scaler fitted earlier to the dataset. Run the model’s predictions on the data, then inverse-transform the predictions back to their original values.
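A sketch, reusing the scaler and the assumed 60-step window from earlier (the last 60 training prices are prepended so the first test sample has history to look back on):

```python
# Scale the test inputs with the scaler fitted on the training data
inputs = data[train_size - window:].reshape(-1, 1)
inputs = scaler.transform(inputs)

# Build the same sliding windows as for training
x_test = []
for i in range(window, len(inputs)):
    x_test.append(inputs[i - window:i, 0])
x_test = np.array(x_test).reshape(-1, window, 1)

# Predict, then map back to dollar values
predicted = model.predict(x_test)
predicted = scaler.inverse_transform(predicted)
```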

STEP 13 → PLOT THE DATA


Plot the predicted stock prices against the real stock prices.
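A sketch of the plotting code:

```python
plt.plot(testing_set, color='red', label='Real AAPL price')
plt.plot(predicted, color='blue', label='Predicted AAPL price')
plt.title('AAPL stock price prediction')
plt.xlabel('Time')
plt.ylabel('Open price')
plt.legend()
plt.show()
```

The graph should look something like this: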

[Figure: predicted vs. real AAPL stock prices]

STEP 14 → SOME QUICK MATHS


The RMSE value represents the square root of the residuals’ variance; it indicates the model’s absolute fit to the data, i.e., how close the observed data points are to the model’s predicted values. The RMSE came out very low, which is awesome!
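One way to compute it:

```python
# Root mean squared error between actual and predicted prices
rmse = np.sqrt(np.mean((testing_set - predicted.flatten()) ** 2))
print(rmse)
```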


Just for fun, I calculated the minimum and maximum stock price for the actual dataset and predicted values.
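For example:

```python
print(testing_set.min(), testing_set.max())  # actual price range
print(predicted.min(), predicted.max())      # predicted price range
```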


Finally, I determined the model accuracy. I did this by calculating the MAPE value, which is given by the following formula.

MAPE = (100 / n) × Σ |actual - predicted| / |actual|

This basically takes the absolute value of (actual - predicted) divided by the absolute value of the actual, for each of the n samples. The summation of these residual ratios was computed with a for loop, and then the MAPE was calculated by dividing that sum by the number of samples, converting to a percentage, and rounding.
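A sketch matching that description:

```python
actual = testing_set
pred = predicted.flatten()

# Sum the relative residuals |actual - predicted| / |actual|
total = 0.0
for a, p in zip(actual, pred):
    total += abs(a - p) / abs(a)

# Mean of the relative residuals, as a rounded percentage
mape = round(total / len(actual) * 100, 1)
print(f'MAPE: {mape}%  ->  accuracy: {100 - mape}%')
```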

Since MAPE represents the error, the accuracy was found by simply subtracting the error from 100, giving an accuracy of 98.3%! This means our model was extremely accurate. Hopefully this means that every time I make predictions based upon my LSTM, I’ll become 98.3% richer 🤑

ONE LAST THING

Hopefully you were able to better understand LSTM networks and how I leveraged them to make stock predictions! It would mean everything to me if you could support me by doing the following:

  1. See that 👏 icon? Send my article some claps
  2. Connect with me via Twitter, LinkedIn and Github
  3. Check out my personal website for my latest work
