I’ve always wanted to emulate the lives of serial entrepreneurs like Elon Musk, Warren Buffett, and Charlie Munger for three primary reasons:
- To drive innovation and disrupt industries
- Immerse myself in an environment of smart people
- Make enough bank to wear a new pair of Yeezys every time I leave home
Yeah, especially that last reason. Unfortunately, money doesn’t grow on trees, so when I was 13 years old, I started investing (virtually) in stocks. My portfolio went from a $100,000 valuation to a $437,303 valuation in 3 years, and when I turned 16, I started investing with real money.
I failed. Horribly. RIP 16 years of birthday money 😥
And although I’m improving, I wanted to expedite the process, so I decided to program an LSTM that can predict stock prices and save myself from another atrocious bankruptcy!
Traditional neural networks use feed-forward neurons: the input is propagated through functions to construct the desired output. This architecture is limiting in that it cannot capture the sequential structure of input data, and feed-forward networks don’t consider previous predictions.
Recurrent neural networks better model our cognitive frameworks since they share the parameters across different time steps which means there are fewer parameters to train and the computational cost decreases. Their internal memory allows the architecture to memorize previous inputs and feed previous outputs back into the input to make better future predictions.
However, vanilla recurrent neural networks are rarely implemented today. All recurrent neural networks have feedback loops within the recurrent layer, which enables them to maintain information in ‘memory’; however, training is difficult for RNNs that need to learn long-term temporal dependencies. The gradient of the declared loss function decays exponentially as it is propagated back through time, and the vanishing gradient problem ensues.
LSTM networks use special units in addition to standard units to include memory cells that maintain information in memory for extended periods of time. This model involves multiple gates that control the time at which information enters the memory, when it’s outputted, and when that information is forgotten, thus enhancing the network’s capacity to learn longer-term dependencies.
In essence, vanilla RNN networks have only hidden states for memory whereas LSTM networks have hidden states and cell states which can remove and add information with gate regulation.
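For the mathematically inclined, the standard LSTM gate updates can be written compactly (here σ is the sigmoid function, ⊙ is element-wise multiplication, and x_t, h_t, C_t are the input, hidden state, and cell state at time step t):

```latex
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{forget gate: what to discard from the cell state} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{input gate: what new information to store} \\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) && \text{candidate cell update} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{new cell state} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{new hidden state}
\end{aligned}
```

The cell state C_t is the long-term memory: the forget gate scales down stale information while the input gate writes in new information, which is how the network holds on to early trends that a vanilla RNN would lose.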
To put this in the perspective of stock prediction, a vanilla RNN would forget early-stage stock price data as the years progress, whereas an LSTM can use historical trends and data for more accurate predictions. The memory gate collects possible outputs and stores the relevant ones; the selection gate selects the final output from the possible outputs the memory gate produces; the forget and ignore gates decide which memories aren’t relevant and dispose of them.
Now that we understand recurrent neural networks and the LSTM architecture, let’s program!
STEP 1 → Import Libraries
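An import cell for this walkthrough might look like the following (a sketch, assuming a TensorFlow-backed Keras install; the exact set of imports in the original notebook may differ):

```python
import numpy as np                                # numerical arrays and math
import pandas as pd                               # CSV loading and dataframes
import matplotlib.pyplot as plt                   # plotting the prices
from sklearn.preprocessing import MinMaxScaler    # feature scaling
from tensorflow.keras.models import Sequential    # linear stack of layers
from tensorflow.keras.layers import LSTM, Dropout, Dense  # network building blocks
```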
STEP 2 → IMPORT THE DATASET
pd.read_csv reads comma-separated values (csv) files into a dataframe; the head and tail of our dataset are printed below with the number of rows and columns displayed.
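A sketch of this step (the column names mirror a typical stock CSV; a tiny inline CSV stands in here for the real Apple data file, whose name is not shown in the article):

```python
import io
import pandas as pd

# Stand-in for pd.read_csv("<your file>.csv"): a tiny inline CSV with the same shape
csv_text = """Date,Open,High,Low,Close
2014-01-02,79.38,79.58,78.86,79.02
2014-01-03,78.98,79.10,77.20,77.28
2014-01-06,76.78,78.11,76.23,77.70
"""
dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset.head())   # first rows of the dataframe
print(dataset.tail())   # last rows
print(dataset.shape)    # (rows, columns)
```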
STEP 3 → CHECK NULL VALUES
Next, check for null values in every column and print the total number of null values found. Null values would distort our model’s predictions; luckily, none were found.
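The null check can be sketched like this (the stand-in dataframe is an assumption; in the article it is the loaded stock dataset):

```python
import io
import pandas as pd

csv_text = "Date,Open,High,Low\n2014-01-02,79.38,79.58,78.86\n2014-01-03,78.98,79.10,77.20\n"
dataset = pd.read_csv(io.StringIO(csv_text))

null_counts = dataset.isnull().sum()   # nulls per column
total_nulls = null_counts.sum()        # total across the whole dataframe
print(null_counts)
print("total null values:", total_nulls)
```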
STEP 4 → VISUALIZE THE DATA
To visualize the stock prices, first drop the unnecessary columns. I plotted each one individually and then collectively just for better visualization, but this step isn’t required to create our LSTM.
STEP 5 → DROP EXTRA COLUMNS
Since we’re only working with the Open stock prices, we can drop the High and Low columns. By default, drop searches the row index, but setting axis=1 makes it search columns instead of rows, and inplace=True makes sure the actual dataset is altered rather than a copy being returned. We also dropped the bottom two rows for cleaner numbers later.
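The column and row drops can be sketched as follows (the stand-in dataframe is an assumption):

```python
import pandas as pd

dataset = pd.DataFrame({
    "Open": [79.4, 79.0, 76.8, 78.1],
    "High": [79.6, 79.1, 78.1, 78.5],
    "Low":  [78.9, 77.2, 76.2, 77.8],
})

# axis=1 targets columns; inplace=True modifies the dataframe itself
dataset.drop(["High", "Low"], axis=1, inplace=True)

# drop the bottom two rows for cleaner sample counts later
dataset.drop(dataset.tail(2).index, inplace=True)
print(dataset.columns.tolist())
```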
STEP 6 → SEPARATE INTO TRAINING AND TESTING
Before proceeding, convert the dataset into a numpy array. Then separate the 1760 samples into training and testing data with an 80/20 split, making the training dataset 1408 samples and the testing dataset 352 samples.
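The split can be sketched like this (the stand-in array is an assumption; in the article it is the 1760 Open prices):

```python
import numpy as np

prices = np.arange(1760, dtype=float).reshape(-1, 1)  # stand-in for 1760 Open prices

split = int(len(prices) * 0.8)       # 80/20 split point -> index 1408
training_set = prices[:split]        # first 80% of the samples
testing_set = prices[split:]         # remaining 20%
print(len(training_set), len(testing_set))  # 1408 352
```

Note this split keeps the chronological order intact, which matters for time-series data: shuffling before splitting would leak future prices into the training set.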
STEP 7 → SCALE THE DATA
Machine learning workflows are better optimized when each individual feature is scaled to a smaller range while remaining normally distributed. I used a utility function to scale the feature vectors, normalizing the data within a specific range to make computations faster and reduce error.
I scaled the data to the default range of [0, 1] and used MinMaxScaler which subtracts the minimum value in each individual feature and divides it by the range, which is calculated as the difference between the original minimum and maximum values.
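The scaling step can be sketched as follows (the stand-in prices are an assumption):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

training_set = np.array([[76.8], [78.1], [79.0], [79.4]])  # stand-in Open prices

sc = MinMaxScaler(feature_range=(0, 1))
# fit_transform computes (x - min) / (max - min) per feature
training_set_scaled = sc.fit_transform(training_set)
print(training_set_scaled.min(), training_set_scaled.max())  # 0.0 1.0
```

Keep the fitted scaler object around: the same `sc` is reused later to transform the test data and to inverse-transform predictions back into dollars.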
If you’re interested in fully understanding data preprocessing and how scaling with scikit-learn works, check out the article “Data Pre-Processing with Scikit Learn”!
STEP 8 → SEPARATE INTO X_TRAIN AND Y_TRAIN
Separate the data into x_train and y_train and reshape our x_train into an acceptable 3D input for the LSTM model.
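This step can be sketched like so; the 60-step lookback window is an assumption (a common choice for daily stock data), and the random array stands in for the scaled training set:

```python
import numpy as np

training_set_scaled = np.random.rand(1408, 1)  # stand-in for the scaled training data
lookback = 60  # assumed window length: predict each day from the previous 60

x_train, y_train = [], []
for i in range(lookback, len(training_set_scaled)):
    x_train.append(training_set_scaled[i - lookback:i, 0])  # previous 60 prices
    y_train.append(training_set_scaled[i, 0])               # the price to predict
x_train, y_train = np.array(x_train), np.array(y_train)

# LSTM layers expect 3D input: (samples, time steps, features)
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)
print(x_train.shape)  # (1348, 60, 1)
```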
STEP 9 → BUILD THE LSTM MODEL
First, instantiate the Keras Sequential model, which makes life easier by allowing us to simply stack layers one after another.
The LSTM layer sets the number of units, which declares the dimensionality of the output space. return_sequences=True makes the layer return the full output sequence rather than only the last output, and input_shape indicates the shape of our training data: the number of time steps and the number of indicators.
The dropout layer randomly selects neurons and disregards them during training, making our network less sensitive to the specific weights of individual neurons and improving generalization. This helps avoid overfitting, the phenomenon where models perform better on training data than on testing data.
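A sketch of the model (the unit count of 50, the 20% dropout rate, and the two-LSTM-layer depth are assumptions, not the article’s exact values):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential()
# 50 units and 20% dropout are illustrative choices
model.add(LSTM(units=50, return_sequences=True, input_shape=(60, 1)))
model.add(Dropout(0.2))
model.add(LSTM(units=50))   # final LSTM layer returns only its last output
model.add(Dropout(0.2))
model.add(Dense(units=1))   # single-value prediction: the next Open price
model.summary()
```

Note that return_sequences=True is only needed on LSTM layers that feed another LSTM layer; the last one hands a single vector to the Dense output.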
STEP 10 → COMPILE THE MODEL
Compile the model with the Adam optimizer, which uses an adaptive learning rate method to update the neural network’s weights iteratively based on the training data. It combines RMSprop and stochastic gradient descent with momentum: it squares the gradients to scale the learning rate, akin to RMSprop, but leverages momentum by using a moving average of the gradient instead of the gradient itself, like SGD with momentum would.
Mean squared error measures the average of the squared errors (the average squared difference between the estimated values and the actual values). Geometrically, each squared error is the area of the square whose side is the distance between a measured point and its prediction.
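The compile call, plus a by-hand MSE to make the loss concrete (the small model and the example numbers are stand-ins):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([LSTM(50, input_shape=(60, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mean_squared_error")

# mean squared error by hand: average of the squared residuals
actual = np.array([3.0, 5.0])
predicted = np.array([2.0, 7.0])
mse = np.mean((actual - predicted) ** 2)
print(mse)  # (1 + 4) / 2 = 2.5
```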
STEP 11 → FIT THE MODEL
Train the model for 100 epochs with a batch size of 32. This means the weights update after every 32 training samples, and each epoch is one full pass through the training set.
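The fit call can be sketched like this (tiny random stand-in data and only 2 epochs so the sketch runs quickly; the article trains for 100 epochs on the real data):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

x_train = np.random.rand(64, 60, 1)  # stand-in: 64 windows of 60 time steps
y_train = np.random.rand(64)

model = Sequential([LSTM(10, input_shape=(60, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mean_squared_error")

# batch_size=32: weights update after every 32 samples; one epoch = one full pass
history = model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)
print(len(history.history["loss"]))  # one loss value per epoch
```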
STEP 12 → PREDICT THE TESTING DATA
Reshape the testing data into an acceptable 3D format and apply the scaler previously fitted on the training data. Run the model’s predictions on the data and then inverse-transform the predictions back to their original values.
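The prediction step can be sketched end to end (the linspace prices, the untrained model, and the 60-step lookback are all stand-ins for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

sc = MinMaxScaler()
prices = np.linspace(70, 80, 200).reshape(-1, 1)       # stand-in Open prices
scaled = sc.fit_transform(prices)

lookback = 60
x_test = np.array([scaled[i - lookback:i, 0] for i in range(lookback, len(scaled))])
x_test = x_test.reshape(x_test.shape[0], lookback, 1)  # 3D shape the LSTM expects

model = Sequential([LSTM(10, input_shape=(lookback, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mean_squared_error")

predicted = model.predict(x_test, verbose=0)
predicted = sc.inverse_transform(predicted)  # scaled values back to dollars
print(predicted.shape)  # (140, 1)
```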
STEP 13 → PLOT THE DATA
Plot the predicted stock prices against the real stock prices. Creating the graph is self-explanatory; it should look something like this.
STEP 14 → SOME QUICK MATHS
The RMSE value represents the square root of the residuals’ variance and indicates the absolute fit of the model to the data: how close the observed data points are to the model’s predicted values. An RMSE value this low is awesome!
Just for fun, I calculated the minimum and maximum stock price for the actual dataset and predicted values.
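Both calculations can be sketched with numpy (the four stand-in prices are an assumption):

```python
import numpy as np

real = np.array([100.0, 102.0, 101.0, 105.0])       # stand-in actual prices
predicted = np.array([99.0, 103.0, 100.0, 106.0])   # stand-in model predictions

# RMSE: square root of the mean squared residual
rmse = np.sqrt(np.mean((real - predicted) ** 2))
print(rmse)  # 1.0

print(real.min(), real.max())            # min/max of the actual prices
print(predicted.min(), predicted.max())  # min/max of the predictions
```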
Finally, I determined the model accuracy. I conducted this by calculating the MAPE value which is represented with the following formula.
This takes the sum over all samples of |actual − predicted| / |actual|, divides by the number of samples, and multiplies by 100. The summation of each residual over its actual value was calculated with the for loop, and then the MAPE was calculated by rounding that summation divided by the number of samples and converting it to a percentage.
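The MAPE and accuracy calculations can be sketched like this (the three stand-in price pairs are an assumption):

```python
import numpy as np

real = np.array([100.0, 200.0, 400.0])       # stand-in actual prices
predicted = np.array([99.0, 202.0, 396.0])   # stand-in predictions

# sum of |actual - predicted| / |actual|, accumulated with a loop as in the article
total = 0.0
for a, p in zip(real, predicted):
    total += abs(a - p) / abs(a)

mape = round(total / len(real) * 100, 1)  # mean of the ratios, as a percentage
accuracy = 100 - mape                     # MAPE is the error, so subtract from 100
print(mape, accuracy)  # 1.0 99.0
```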
Since MAPE represents the error, the accuracy was found by subtracting the error from 100, showcasing an accuracy of 98.3%! This means our model was extremely accurate. Hopefully every time I make predictions based on my LSTM, I’ll become 98.3% richer 🤑
ONE LAST THING
Hopefully you were able to better understand LSTM networks and how I leveraged them to make stock predictions! It would mean everything to me if you could support me by doing the following: