**Predicting Apple Stock Prices with LSTM**

I’ve always wanted to emulate the lives of serial entrepreneurs like **Elon Musk**, **Warren Buffett**, and **Charlie Munger** for three primary reasons:

- To drive innovation and disrupt industries
- Immerse myself in an environment of smart people
- Make enough bank to wear a new pair of Yeezys every time I leave home

Yeah- *especially* that last reason. Unfortunately, money doesn’t grow on trees, so when I was 13 years old, I started investing (virtually) in stocks. My portfolio went from a $100,000 valuation to a $437,303 valuation in 3 years, and when I turned 16, I started investing with real money.

**I failed**. Horribly. RIP 16 years of birthday money 😥

And although I’m improving, I wanted to expedite the process and decided to program an **LSTM** that can make **stock predictions** to save myself from another atrocious bankruptcy!

Traditional neural networks use **feed-forward** architectures in which the input is propagated through layers of functions to construct the desired output. This design is limited in that it cannot capture the *sequential information* in the input data, and such networks don’t consider previous predictions.

**Recurrent neural networks (RNNs)** better model our cognitive frameworks since they share parameters across different *time steps*, which means there are fewer parameters to train and the computational cost decreases. Their internal memory allows the architecture to **memorize previous inputs** and feed previous outputs back into the input to make better future predictions.

However, vanilla recurrent neural networks are rarely implemented today. All recurrent neural networks have **feedback loops** within the recurrent layer, which enables them to maintain information in *‘memory’*. Training them is difficult, however, for tasks that require learning **long-term temporal dependencies**: the gradient of the declared loss function decays exponentially over time, and the **vanishing gradient problem** ensues.

**LSTM** networks use special units in addition to standard units to include *memory cells* that maintain information in memory for extended periods of time. This model involves multiple **gates** that control when information enters the memory, when it’s outputted, and when it’s forgotten, thus enhancing the network’s capacity to learn longer-term dependencies.

In essence, vanilla RNN networks have only *hidden states* for memory, whereas LSTM networks have *hidden states and cell states*, which can remove and add information with gate regulation.

To put this into the perspective of stock predictions, a vanilla RNN would forget early-stage stock price data as the years progress, whereas the LSTM can use historical trends and data for more accurate predictions. The **memory gate** collects possible outputs and stores the relevant ones; the **selection gate** selects the final output from the possibilities the memory gate produces; the **forget and ignore gates** decide which stored memories aren’t relevant and dispose of them.
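For readers who want the math behind those gates: they correspond to the standard LSTM update equations (the textbook formulation, not taken from this post), where σ is the sigmoid function and ⊙ is element-wise multiplication:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input (ignore) gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output (selection) gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate memory} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state update} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state / output}
\end{aligned}
```

The forget gate f_t scales down old memories, the input gate i_t admits new candidate memories, and the output gate o_t decides how much of the cell state becomes the next hidden state.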

## Now that we understand recurrent neural networks and the LSTM architecture, let’s program!

# STEP 1 → Import Libraries
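The original import cell isn’t shown; a minimal set for the pipeline described below might look like this (library choices are an assumption based on the steps that follow):

```python
import numpy as np                                  # array math
import pandas as pd                                 # CSV loading and DataFrames
import matplotlib.pyplot as plt                     # plotting
from sklearn.preprocessing import MinMaxScaler      # feature scaling (Step 7)
from tensorflow.keras.models import Sequential      # model container (Step 9)
from tensorflow.keras.layers import LSTM, Dropout, Dense  # network layers
```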

# STEP 2 → IMPORT THE DATASET

*pd.read_csv* reads **comma-separated values (CSV)** files into DataFrames- the *head* and *tail* of our dataset are printed below with the number of rows and columns displayed.
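A sketch of this step, using an inline stand-in for the Apple CSV so it runs anywhere (the real call would be `pd.read_csv("AAPL.csv")`; the filename and column names here are assumptions):

```python
import io
import pandas as pd

# Tiny stand-in for the Apple stock CSV (column names are an assumption)
csv_text = """Date,Open,High,Low,Close
2020-01-02,74.06,75.15,73.80,75.09
2020-01-03,74.29,75.14,74.13,74.36
2020-01-06,73.45,74.99,73.19,74.95
"""

# pd.read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.tail())
print(df.shape)  # (rows, columns)
```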

# STEP 3 → CHECK NULL VALUES

Next, check for **null values** in every column and print the total number of null values found. Null values would skew our model’s predictions- luckily, none were found.
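The check itself is a couple of pandas calls (shown on a small sample frame):

```python
import pandas as pd

df = pd.DataFrame({"Open": [74.06, 74.29, 73.45],
                   "Close": [75.09, 74.36, 74.95]})

# Count null values per column, then in total
null_per_column = df.isnull().sum()
total_nulls = df.isnull().sum().sum()
print(null_per_column)
print("Total nulls:", total_nulls)
```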

# STEP 4 → VISUALIZE THE DATA

To visualize the stock prices, first **drop** the unnecessary columns- I plotted each one individually and then collectively just for better visualization, but this step isn’t required to create our LSTM.
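A minimal version of the collective plot (sample values; the headless `Agg` backend just lets the script run without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Open": [74.0, 74.3, 73.5, 74.9],
                   "Close": [75.1, 74.4, 75.0, 75.3]})

# Plot every price column on one figure
ax = df.plot(title="AAPL stock prices (sample)")
ax.set_xlabel("Trading day")
ax.set_ylabel("Price (USD)")
plt.savefig("prices.png")
```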

# STEP 5 → DROP EXTRA COLUMNS

Since we’re only working with the **Open** stock prices, we can drop the high and low columns. By default, *drop* operates on rows, but setting **axis=1** tells it to drop columns instead, and **inplace=True** makes sure the actual dataset is altered. We also dropped the bottom two rows for cleaner numbers later.
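In code (on a small sample frame):

```python
import pandas as pd

df = pd.DataFrame({"Open": [74.0, 74.3, 73.5],
                   "High": [75.2, 75.1, 75.0],
                   "Low":  [73.8, 74.1, 73.2]})

# axis=1 targets columns; inplace=True mutates df itself
df.drop(["High", "Low"], axis=1, inplace=True)
print(df.columns.tolist())  # ['Open']
```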

# STEP 6 → SEPARATE INTO TRAINING AND TESTING

Before proceeding, convert the dataset into a **numpy array**. Then separate the dataset of **1760** samples into training and testing sets with an *80/20* split, making the training dataset **1408** samples and the testing dataset **352** samples.
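The split is a simple slice once the data is an array (synthetic stand-in values, same 1760-sample count as the article):

```python
import numpy as np

data = np.arange(1760, dtype=float).reshape(-1, 1)  # stand-in for 1760 Open prices

train_size = int(len(data) * 0.8)   # 80% of 1760 = 1408
train, test = data[:train_size], data[train_size:]
print(len(train), len(test))  # 1408 352
```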

# STEP 7 → SCALE THE DATA

Machine learning workflows are better optimized when each individual feature is **scaled** to a smaller range. I used a utility function to **scale feature vectors** into a fixed range, which *normalizes* the data, makes computations faster, and reduces error.

I scaled the data to the default range of **[0, 1]** using **MinMaxScaler**, which subtracts the minimum value of each individual feature and divides by the range, calculated as the difference between the feature’s original maximum and minimum values.
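That description maps directly onto the scikit-learn call (sample prices for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

prices = np.array([[70.0], [75.0], [80.0]])

# (x - min) / (max - min) maps each feature into [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(prices)
print(scaled.ravel())  # [0.  0.5 1. ]
```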

If you’re interested in fully understanding data preprocessing and how scaling with scikit-learn works, check this out!

# STEP 8 → SEPARATE INTO X_TRAIN AND Y_TRAIN

Separate the data into x_train and y_train, and reshape x_train into an acceptable 3D input for the LSTM model.
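A common way to do this is a sliding window over the scaled prices: each sample is the previous N prices, and the target is the next one. The window length of 60 here is an assumption (the post doesn’t state the value used):

```python
import numpy as np

scaled_train = np.linspace(0, 1, 1408).reshape(-1, 1)  # stand-in for scaled training prices
window = 60  # past time steps per sample (an assumption; the post doesn't state it)

x_train, y_train = [], []
for i in range(window, len(scaled_train)):
    x_train.append(scaled_train[i - window:i, 0])  # previous 60 prices
    y_train.append(scaled_train[i, 0])             # the next price

x_train = np.array(x_train)
y_train = np.array(y_train)

# LSTM expects 3D input: (samples, time steps, features)
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)
print(x_train.shape)  # (1348, 60, 1)
```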

# STEP 9 → BUILD THE LSTM MODEL

First, instantiate the **Sequential** model, which makes life easier by allowing us to simply stack layers.

The **LSTM layer** sets the number of *units*, which declares the dimensionality of the output space. **return_sequences=True** makes the layer return the full output sequence rather than only the last output, and **input_shape** indicates the shape of our training dataset- it reflects the number of *time steps*, while the last dimension is the number of indicators.

The **dropout layer** randomly selects neurons and disregards them during training, making our network *less sensitive* to specific neuron values and giving it *better generalization*. This avoids **overfitting**, the phenomenon where a model performs well on training data but poorly on testing data.
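A sketch of such a stack; the unit count (50), dropout rate (0.2), layer count, and 60-step input are assumptions, since the post doesn’t give exact values:

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(Input(shape=(60, 1)))  # 60 time steps, 1 indicator (assumed values)
model.add(LSTM(units=50, return_sequences=True))  # pass full sequence onward
model.add(Dropout(0.2))                           # drop 20% of units in training
model.add(LSTM(units=50))                         # last LSTM returns final output only
model.add(Dropout(0.2))
model.add(Dense(units=1))                         # single predicted price
model.summary()
```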

# STEP 10 → COMPILE THE MODEL

Compile the model with the **Adam optimizer**, which uses an *adaptive learning rate* method to update the neural network’s weights iteratively based on training data. It combines **RMSprop** and **stochastic gradient descent** with **momentum**- essentially squaring the gradients to scale the learning rate, akin to RMSprop, but leveraging momentum by using a moving average of the gradient instead of the gradient itself, as SGD with momentum would.

**Mean squared error** measures the average of the errors’ squares *(the average squared difference between the estimated values and the actual values)*. Geometrically, each squared error is the area of the square whose side is the distance between a measured point and its prediction.
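The compile call itself is one line, `model.compile(optimizer="adam", loss="mean_squared_error")`; the loss it minimizes can be checked by hand:

```python
import numpy as np

actual = np.array([3.0, 5.0, 2.5])
predicted = np.array([2.5, 5.0, 4.0])

# MSE: mean of the squared residuals
mse = np.mean((actual - predicted) ** 2)
print(mse)  # (0.25 + 0.0 + 2.25) / 3
```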

# STEP 11 → FIT THE MODEL

Train the model for **100 epochs** with a **batch size** of **32**. This means the weights are updated after every 32 training samples, and the full training set is passed through the network 100 times.
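The fit call looks like this. For a quick, self-contained demo the sketch below trains a tiny model on random data for 2 epochs rather than the article’s 100 (the model shape and data are placeholders):

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Tiny synthetic stand-in so the sketch trains in seconds
x = np.random.rand(64, 10, 1)
y = np.random.rand(64, 1)

model = Sequential([Input(shape=(10, 1)), LSTM(8), Dense(1)])
model.compile(optimizer="adam", loss="mean_squared_error")

# In the article: epochs=100, batch_size=32
history = model.fit(x, y, epochs=2, batch_size=32, verbose=0)
print(len(history.history["loss"]))  # one loss value per epoch
```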

# STEP 12 → PREDICT THE TESTING DATA

Reshape the testing data into an acceptable **3D** format and apply the *scaler* previously fitted on the dataset. Run the **model predictions** on the data and then **inverse-transform** the predictions back to their original values.
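The inverse-transform step is the part that trips people up, so here it is in isolation (the "predictions" below are placeholder values standing in for `model.predict(x_test)` output):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Scaler fitted on training prices (sample values)
scaler = MinMaxScaler()
scaler.fit(np.array([[70.0], [80.0], [90.0]]))

# Stand-in for model.predict(x_test): values live in the scaled [0, 1] space
scaled_predictions = np.array([[0.25], [0.5]])

# Map back to dollar values with the SAME scaler fitted earlier
predictions = scaler.inverse_transform(scaled_predictions)
print(predictions.ravel())  # [75. 80.]
```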

# STEP 13 → PLOT THE DATA

Plot the predicted stock prices against the real stock prices. Creating the graph is self-explanatory- it should look something like this.
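A minimal version of that comparison plot (sample values; colors and labels are my choices, not necessarily the article’s):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

real = np.array([75.0, 76.2, 74.8, 77.1])
predicted = np.array([74.6, 76.0, 75.3, 76.8])

plt.figure(figsize=(8, 4))
plt.plot(real, color="red", label="Real AAPL stock price")
plt.plot(predicted, color="blue", label="Predicted AAPL stock price")
plt.title("AAPL stock price prediction")
plt.xlabel("Time")
plt.ylabel("AAPL stock price")
plt.legend()
plt.savefig("prediction.png")
```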

# STEP 14 → SOME QUICK MATHS

The **RMSE value **represents the *square root of the residuals’ variance *and indicates the absolute fit of the model towards the data and the distance between the data points to the predicted values from our model. An RMSE value this low is awesome!
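RMSE is a one-liner on top of the inverse-transformed predictions (sample values here, so the number below is illustrative, not the article’s result):

```python
import numpy as np

actual = np.array([75.0, 76.2, 74.8, 77.1])
predicted = np.array([74.6, 76.0, 75.3, 76.8])

# RMSE: square root of the mean squared residual
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
print(round(rmse, 4))
```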

Just for fun, I calculated the **minimum **and **maximum **stock price for the actual dataset and predicted values.

Finally, I determined the **model accuracy**. I conducted this by calculating the **MAPE** value, which is represented by the formula *MAPE = (100/n) · Σ |actual − predicted| / |actual|*.

This takes the absolute value of *(actual − predicted)*, divides it by the absolute value of *actual*, sums over all samples, and scales by *100/n*. The summation of each residual over its actual value was calculated with a for loop, and then the MAPE was found by dividing that summation by the number of samples, rounding, and converting to a percentage.
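Sketched in code (vectorized with numpy instead of the article’s for loop; same arithmetic, and the sample values here are illustrative):

```python
import numpy as np

actual = np.array([75.0, 76.2, 74.8, 77.1])
predicted = np.array([74.6, 76.0, 75.3, 76.8])

# MAPE = (100 / n) * sum(|actual - predicted| / |actual|)
errors = np.abs(actual - predicted) / np.abs(actual)
mape = round(float(np.mean(errors)) * 100, 1)
accuracy = 100 - mape  # accuracy as the complement of the percentage error
print(mape, accuracy)
```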

Since MAPE represents the **error**, the **accuracy **was found by just subtracting the error from 100, to showcase an accuracy of **98.3%**! This means our model was extremely accurate. Hopefully this means that every time I make predictions based upon my LSTM, I’ll become 98.3% richer 🤑

# ONE LAST THING

Hopefully you were able to better understand LSTM networks and how I leveraged them to make stock predictions! It would mean everything to me if you could support me by doing the following

- See that 👏 icon? Send my article some **claps**
- **Connect** with me via Twitter, LinkedIn and Github
- **Check out** my personal website for my latest work