A beginner's guide to stock prediction using LSTM.

Tanvi Pradhan
Published in Analytics Vidhya
9 min read, Jul 2, 2021

In recent years, the significance of Artificial Intelligence in finance has become quite apparent due to the availability of vast amounts of clean, structured historical data. As a result, multiple leading banks and financial companies are deploying ML and DL technology to streamline their processes and to build trading strategies, pricing models, and risk management systems.

“AI in finance is transforming the way we interact with money”

Here’s a brief look at some of the current and future AI applications in the finance sector:

Algorithmic Trading: Algorithmic trading makes use of complex quant strategies as well as human insights to conduct trades. Machine learning can help push algorithmic trading to new levels by offering even more avenues for gaining special insight into market movements.

Risk Detection and Management: Financial firms face constant data-security risk due to their high computing power, frequent internet use, and the increasing amount of company data stored online. ML models are able to learn patterns of normal behavior, adapt quickly when that behavior changes, and rapidly identify fraudulent transactions. Efficient fraud detection in turn helps firms develop risk management techniques.

Portfolio Management: This refers to managing an investor’s capital in the form of shares, bonds, cash, mutual funds, etc. so as to earn maximum profit within a given time frame. Introducing AI helps asset management firms improve their investment decisions by making use of their wealth of historical data.

Derivative Pricing: A derivative is a financial contract whose price is derived from one or more underlying assets. The assets could be anything that holds value, such as bonds, commodities, or real estate. While asset prices keep changing with market conditions, a derivative contract fixes a speculated future price for the asset.

Sentiment Analysis: The most common use of machine learning for sentiment analysis in the financial world is the analysis of financial news: predicting customer sentiment toward market developments, rather than limiting itself to predicting stock prices and trades.

Firms in the finance sector are usually divided into the buy side and the sell side. Buy side firms are those whose primary business is advising and investing; this includes asset managers such as private equity firms, mutual funds, life insurance companies, hedge funds, etc. The sell side refers to banks and broker-dealers that sell investments and services to those asset managers and hedge funds. Buy side firms make either long-lasting decisions about the strategic allocation of assets (for example, how much to invest in equities, bonds, or real estate over a period of time) or short-term decisions about which assets to buy or short. These decisions are based on the overall movement of the market. So the traders, portfolio managers, developers, and researchers in these asset management firms are all working towards the same goal: generating positive returns irrespective of the overall moves of the market.

Delving deeper into the short-term decisions of stock market trading, we realize that the market is volatile, and there are many aspects to consider when trying to generate returns from market assets. Traders generally look for market behavior and inefficiencies that will generate high risk-adjusted returns on their trading capital. Fitting a model to analyze the overall trend of the market requires some quant methods: regression, prediction models, and execution strategies that can analyze and uncover hidden patterns and repetitive behavior in the market for a particular asset. Basically, we need a forecasting model that can predict the future value of the stock.

The stock forecasting models can be used when the past numerical data is available and it is reasonable to assume that some of the patterns in the data are expected to continue into the future. It could be a simple regression problem based on single or multiple explanatory variables. Or it can be a time series where we use a function of the actual stock prices for a particular time range to predict the future prices.

I’ll be predicting the future stock value of one of the prominent Indian healthcare companies. The time series forecasting will be done using the LSTM model, a type of recurrent neural network used in deep learning that can learn the order of dependence between items in a sequence. LSTMs have the ability to learn the context required to make predictions in time series problems, rather than having that context specified beforehand.

The data chosen is from May 2016 to May 2021 as the healthcare companies in India saw a tremendous boom in the past 5 years. Strengthening coverage, services, increasing expenditure by public and private players and the recent pandemic made these stocks show quite a momentum. The program implemented is pretty basic and in no way can it be used to make any significant profits, however it would give you a fair idea regarding how market prediction works.

I’ll be starting off by importing the libraries necessary for this model.
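The original code images are not reproduced here, but the import list for this kind of walkthrough is a fairly conventional stack (pandas for the data, scikit-learn for scaling, Keras for the LSTM); the sketch below is an assumption along those lines rather than the author's exact code.

```python
# A typical import block for this walkthrough (assumed, but conventional):
# pandas/numpy for data handling, matplotlib for plots,
# scikit-learn for scaling, and Keras for the LSTM model itself.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
```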

Then we extract the data from Yahoo Finance with the help of pandas-datareader. It requires us to provide the stock ticker symbol, which in this case is ‘APOLLOHOSP.NS’. We also need to specify the source (Yahoo Finance) and the start and end dates of the data we want to base our predictions on.
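That extraction step might look like the sketch below. The ticker and date range are from the article; the synthetic fallback frame is my own addition so the snippet can still run offline (Yahoo's endpoint has changed since pandas-datareader's yahoo source was common, so the live call may fail in practice).

```python
import numpy as np
import pandas as pd

ticker = "APOLLOHOSP.NS"  # the NSE ticker used in this walkthrough
start, end = "2016-05-01", "2021-05-31"

try:
    # Daily OHLCV data from Yahoo Finance; needs network access
    import pandas_datareader.data as web
    df = web.DataReader(ticker, "yahoo", start=start, end=end)
except Exception:
    # Offline stand-in with the same columns and Date index, so the
    # rest of the walkthrough can be followed without a connection
    idx = pd.bdate_range(start, end, name="Date")
    rng = np.random.default_rng(42)
    close = 1200 + np.cumsum(rng.normal(0, 10, len(idx)))
    df = pd.DataFrame({"High": close * 1.01, "Low": close * 0.99,
                       "Open": close, "Close": close,
                       "Volume": 100_000, "Adj Close": close}, index=idx)

print(df.head(10))
```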

First 10 rows of the healthcare company’s stock data

The dataset has the following variables: High, Low, Open, Close, Volume and Adj Close. We also have the Date set as our index, which is highly significant in any time series problem.

· Variables Open and Close show the starting and final price at which the stock is traded on a particular day.

· Variables High and Low represent the maximum and minimum price of the stock for the day.

· Volume is the number of shares bought or sold in the day.

· Adj Close is the adjusted closing price of the stock: the close price altered to account for any corporate actions. It gives a better idea of the overall value of the stock and helps us make better decisions.

We then proceed with the basic checks. We’ll see whether the dataset has any null values; if present, we fill them using the column average. Furthermore, we need to look out for the datatypes of the variables. If they vary, we homogenize them so they have consistent datatypes.

Checking for null values
Variable datatypes and memory usage
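Those checks can be sketched as below; a small synthetic frame (with a deliberate gap) stands in for the downloaded data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Stand-in for the downloaded stock data, with one deliberate null value
idx = pd.bdate_range("2016-05-02", periods=5, name="Date")
df = pd.DataFrame({"Adj Close": [1200.0, np.nan, 1210.0, 1225.0, 1218.0]},
                  index=idx)

print(df.isnull().sum())    # count of nulls per column
df = df.fillna(df.mean())   # fill any gaps with the column average
print(df.dtypes)            # confirm every column is numeric
df.info()                   # datatypes and memory usage
```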

Finally, we plot the growth of the stock’s adjusted closing price from May 2016 to May 2021, as that is what will be predicted. As is evident, the stock price oscillated in the range of Rs 1000 to 1500 in the years 2016 to 2019, followed by a sudden ascent in 2020 due to the government’s strong focus on healthcare during the pandemic. The budget estimates for the Department of Health and Family Welfare for 2020–21 showed a satisfactory increase of 3.75%, and there was a considerable 10% hike in the allocation for the Department of Health Research. All these factors resulted in growing stock prices for healthcare companies.

Closing price for the past 5 years.

Since we’ll be predicting the stock prices, we’ll need to split our data in two. When dealing with time series problems we cannot randomly split the data into train and test sets, as that would destroy the time component. We need to decide how much data to train on, and this decision has to be made with respect to the date, as the prediction depends on the previous data points. Let’s train on 80% of the data, since the high spike in stock prices is only seen from 2020 onwards; it is imperative to train on some data from 2020 to make successful predictions. The remaining 20% is used for testing.
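The chronological 80/20 split amounts to simple slicing (no shuffling); a dummy price array stands in for the real series here.

```python
import numpy as np

prices = np.arange(100, dtype=float)  # stand-in for the Adj Close values

# Chronological split: the first 80% trains, the last 20% tests.
# Crucially, there is no shuffling, so the time order is preserved.
train_size = int(len(prices) * 0.8)
train_data = prices[:train_size]
test_data = prices[train_size:]
print(len(train_data), len(test_data))  # 80 20
```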

The data for our time series problem needs to be scaled when training a recurrent neural network like Long Short-Term Memory, as LSTMs are extremely sensitive to the scale of the data. When a network is fit on unscaled data with a wide range of values, such as our stock prices, large inputs can slow down the learning and convergence of the network and in some cases prevent it from effectively learning the problem. I’ll be scaling this data to the range 0 to 1, as specified in the ‘feature_range’.
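The scaling step looks like this with scikit-learn's MinMaxScaler; fitting on the training slice only (to avoid leaking test information) is my assumption of standard practice here.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train_data = np.arange(1000, 1100, dtype=float).reshape(-1, 1)  # stand-in prices

# Squash prices into [0, 1]; fit on the training data only,
# so no information from the test period leaks into the scaler
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train = scaler.fit_transform(train_data)
print(scaled_train.min(), scaled_train.max())  # 0.0 1.0
```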

The basic logic behind our LSTM setup is that data from previous days is used to predict the next day’s value. The time window then slides forward to predict the following day, and so on, iterating over the whole dataset in batches. Generally, the bigger the time window, the better: the more data points you consider for the prediction, the more accurate it tends to be. Shown below is the code that creates a dataset in which X_train and X_test are the sets of independent variables at a particular time (t) and y_train and y_test are the target variables at the next time (t+1). As shown, I have taken timestep=80. X_test and y_test can be created in a similar way from the test dataset.
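A sliding-window helper in that spirit might look as follows (the function name and stand-in data are illustrative, not the author's exact code).

```python
import numpy as np

def create_dataset(series, timestep=80):
    """Slide a window of `timestep` points over the series: each window
    becomes one sample X, and the point right after it is the target y."""
    X, y = [], []
    for i in range(timestep, len(series)):
        X.append(series[i - timestep:i, 0])  # the previous `timestep` values
        y.append(series[i, 0])               # the next value to predict
    return np.array(X), np.array(y)

scaled_train = np.linspace(0, 1, 200).reshape(-1, 1)  # stand-in scaled prices
X_train, y_train = create_dataset(scaled_train, timestep=80)
print(X_train.shape, y_train.shape)  # (120, 80) (120,)
```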

Now that we are done with the preprocessing, it’s time to apply the LSTM model. However, before applying it, we need to reshape our data. Why reshape? Because the LSTM network expects a 3-dimensional input of the form (number of samples, number of timesteps, number of features), while right now our X_train array is 2D. As depicted below, the number of samples is the number of rows in X_train, the number of timesteps is 80, and the number of features is 1.
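The reshape itself is a one-liner; the dummy array below stands in for the windowed training set.

```python
import numpy as np

X_train = np.zeros((120, 80))  # stand-in: 120 samples of 80 timesteps each

# Keras LSTMs expect (samples, timesteps, features);
# we have a single feature per timestep, the price
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
print(X_train.shape)  # (120, 80, 1)
```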

The LSTM architecture is pretty easy to understand. We start by reading in our sequential data and feeding it to the model, which is trained for prediction starting from random weights and biases. In the first layer, X_train goes into 50 hidden units and is then transformed into a single output: the stock value. A dropout regularization layer is added to reduce overfitting in the neural network. Finally, we have the output Dense layer; since we only need one output, its units parameter is set to 1. I have created a function to build and fit the LSTM network to make this hassle-free.

Fitting the LSTM model
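One way such a build-and-fit helper could look is sketched below. The 50 units and single Dense output follow the description above; the dropout rate, optimizer, epochs, and batch size are assumed hyperparameters, and tiny random data is used just to exercise the calls.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout

def build_lstm(timestep=80):
    """Build an LSTM matching the description: 50 hidden units,
    dropout for regularization, and a single-unit Dense output."""
    model = Sequential([
        Input(shape=(timestep, 1)),
        LSTM(50),
        Dropout(0.2),   # assumed dropout rate
        Dense(1),       # one output: the predicted (scaled) price
    ])
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

# Tiny random batch just to demonstrate the fit call;
# real training would use X_train / y_train and more epochs
X = np.random.rand(16, 80, 1)
y = np.random.rand(16)
model = build_lstm()
model.fit(X, y, epochs=1, batch_size=8, verbose=0)
```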

If you want to delve deeper into the workings of LSTM, do refer to the blog below.

We then use the model to predict the stock prices based on X_test. Using the scaler’s inverse transform brings the predicted values back to the original scale.
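That inverse-transform step works like this; the scaler fit and the "model outputs" below are stand-ins so the snippet is self-contained.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Stand-in for the scaler fitted on the training prices
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(np.array([[1000.0], [1500.0]]))

# Stand-in for scaled model outputs (in practice: model.predict(X_test))
scaled_preds = np.array([[0.0], [0.5], [1.0]])
preds = scaler.inverse_transform(scaled_preds)  # back to rupee prices
print(preds.ravel())  # [1000. 1250. 1500.]
```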

I’ll be using RMSE (root mean squared error) to calculate the error value. A score of 113.45 is pretty good. Better than I expected for our model!

RMSE score
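The RMSE computation itself is a one-liner over the actual and predicted prices; the numbers below are stand-ins, not the article's real test data.

```python
import numpy as np

y_test = np.array([1500.0, 1520.0, 1510.0])       # stand-in actual prices
predictions = np.array([1490.0, 1535.0, 1500.0])  # stand-in model output

# Root mean squared error: penalizes large misses more than small ones
rmse = np.sqrt(np.mean((predictions - y_test) ** 2))
print(round(rmse, 2))
```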

The LSTM model can be tuned further: increasing the number of epochs, changing the number of LSTM layers, or adjusting the dropout value. However, LSTM predictions alone are not enough to identify whether a stock price will rise or fall; the market is also largely affected by news about the company and other sentiment. I am very interested in exploring time series problems in detail and plan to try blending in relevant news data, which would relate to our target and help with the prediction.

Though only a minimal set of functionality has been shown in this walkthrough, I sincerely hope it gives you enough of an insight. Below are some resources that I referred to and that you may find useful!
