Stock market prediction using ML

6 min read · Nov 11, 2021

Data science combines a variety of tools, machine learning principles and algorithms with the aim of uncovering hidden patterns in a data set. These patterns are used to predict future events from past and present data, and the same idea is applied to the stock market. The stock market lets buyers and sellers negotiate prices, which leads to trading. A stock, also called a share, represents part ownership of a particular company and a claim on its assets and earnings. The stock market does not involve companies alone; ordinary people take part as well, and each participant either earns or loses money. In such a situation, careful study and prediction of changes in stock prices is very useful when deciding which stocks to buy and sell. Several established data science approaches are used for this purpose.

Among the various methods used for stock market prediction, neural networks stand out: their ability to discover nonlinear relationships in input data makes them well suited to modelling nonlinear dynamic systems such as the stock market. Another technique used to predict stock prices is ARIMA (Autoregressive Integrated Moving Average). In this model, the future value of a variable is taken to be a linear combination of past values and past errors.

Another model for stock market forecasting is LSTM (Long Short-Term Memory), an artificial recurrent neural network (RNN) architecture used in deep learning. An LSTM contains feedback connections, so it processes not only single data points but also entire sequences of data. LSTM models can retain information over long periods of time.

1. LSTM MODEL

Long Short-Term Memory is a kind of recurrent neural network and a very popular model for sequence prediction. An LSTM is able to learn from data with long-term dependencies. Its repeating module contains four layers that interact with one another. The model is one of the more complicated parts of deep learning, and its memory cell gives it the capacity to retain information for a long period of time. The LSTM prediction process follows a fixed sequence of steps.
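To make the repeating module concrete, here is a minimal sketch of one LSTM step for a scalar input and a single hidden unit, following the standard LSTM equations. The weights are made-up illustrative numbers rather than trained values, and the names `lstm_step` and `params` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step for a scalar input and a single hidden unit.

    params maps each of the four layers (forget, input, candidate, output)
    to a tuple (w_x, w_h, b): input weight, recurrent weight, bias.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much new information to write
    g = math.tanh(pre("candidate"))  # candidate values to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose

    c = f * c_prev + i * g           # updated cell state (long-term memory)
    h = o * math.tanh(c)             # updated hidden state (step output)
    return h, c

# Illustrative, untrained weights (assumption).
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, 0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.4, 0.1, 0.0),
}

h, c = 0.0, 0.0
for price in [0.10, 0.12, 0.15, 0.11]:  # normalized prices (made up)
    h, c = lstm_step(price, h, c, params)
    print(f"h={h:.4f} c={c:.4f}")
```

The cell state `c` is what lets the network carry information across many steps: when the forget gate is close to 1 and the input gate is close to 0, the previous value passes through almost unchanged.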

First, the historical stock prices are gathered. This can be done using Yahoo Finance, where the history can be obtained for free. The goal is to predict the final values of the data sequence with a multi-layered LSTM recurrent neural network. The data is loaded and analysed, then split into training and test sets so that the generalization ability of the model can be evaluated. Before model fitting, the data is normalized, which improves performance. After the data is reshaped, the model is built: an LSTM with 4 hidden layers and 50 neurons, followed by 1 output neuron that predicts the normalized stock price.
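The preparation steps described above (split, normalize, reshape) can be sketched in plain Python. This is only an illustration under assumptions: the prices are made-up numbers rather than a Yahoo Finance download, min-max scaling stands in for the unspecified normalization, and the 80/20 split, the window length of 3, and all helper names are hypothetical choices.

```python
def train_test_split_ordered(series, train_fraction=0.8):
    """Split chronologically: no shuffling, the test set is the most recent data."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def fit_min_max(train):
    """Learn scaling bounds from the training data only, to avoid look-ahead."""
    return min(train), max(train)

def scale(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]

def unscale(values, lo, hi):
    """Map normalized predictions back to price units."""
    return [v * (hi - lo) + lo for v in values]

def make_windows(values, window):
    """Build (samples, timesteps, features=1) inputs and next-step targets."""
    X, y = [], []
    for start in range(len(values) - window):
        X.append([[v] for v in values[start:start + window]])
        y.append(values[start + window])
    return X, y

# Made-up closing prices (assumption).
closes = [100.0, 101.5, 103.0, 102.0, 104.5, 106.0, 105.0, 107.5, 109.0, 108.0]

train, test = train_test_split_ordered(closes)
lo, hi = fit_min_max(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)  # may fall outside [0, 1]; that is expected

X_train, y_train = make_windows(train_scaled, window=3)
print(len(X_train), len(X_train[0]), len(X_train[0][0]))  # samples, timesteps, features
```

The three-dimensional layout (samples, timesteps, features) is the input shape that recurrent layers in deep learning libraries such as Keras expect, which is why the reshaping step comes before the model is built.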

Figure 1 : Internal Architecture of LSTM

2. ANN MODEL

In the field of artificial intelligence, bio-inspired algorithms have been hugely successful, and applying them has produced strong results across many domains. The Artificial Neural Network (ANN) is one such model. An ANN is modelled on the central nervous system and the brain, and an artificial neuron is modelled on the biological neuron. Artificial neurons receive signals from the environment or from other neurons. When certain conditions are met, a neuron fires and transmits its signal to all the artificial neurons connected to it.
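A minimal sketch of a single artificial neuron, assuming the common formulation of a weighted sum followed by a sigmoid activation. The weights, the bias, and the firing threshold of 0.5 are illustrative assumptions, not values from any trained model.

```python
import math

def neuron(inputs, weights, bias, threshold=0.5):
    """Weighted sum of incoming signals, squashed by a sigmoid.

    Returns the activation and whether the neuron "fires", i.e. whether
    the activation exceeds the (hypothetical) threshold.
    """
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    activation = 1.0 / (1.0 + math.exp(-z))
    return activation, activation > threshold

signals = [0.2, 0.9, 0.4]   # signals from the environment or other neurons
weights = [0.5, -0.3, 0.8]  # connection strengths (made up)
activation, fires = neuron(signals, weights, bias=0.1)
print(round(activation, 3), fires)
```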

An ANN consists of one or more neurons arranged in layers, which is why it is called a layered network. Those layers are:

Input layer
Hidden layer(s)
Output layer

An ANN has abilities such as learning, generalising, mapping inputs to outputs, robustness, and processing information in parallel.
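The layered structure can be illustrated with a small forward pass from the input layer through one hidden layer to the output layer. This is a sketch under assumptions: the layer sizes (3 inputs, 2 hidden neurons, 1 output), the sigmoid hidden activation, the linear output, and all weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    hidden = dense(inputs, hidden_w, hidden_b, sigmoid)    # hidden layer
    return dense(hidden, output_w, output_b, lambda z: z)  # linear output layer

# Illustrative, untrained weights: 3 inputs -> 2 hidden neurons -> 1 output.
hidden_w = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0]]
output_b = [0.5]

prediction = forward([0.5, 0.1, 0.9], hidden_w, hidden_b, output_w, output_b)
print(prediction)
```

Learning consists of adjusting the weights and biases so that the output matches known targets; that training step is left out of this sketch.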

Figure 2 : Graphical illustration of Artificial Neuron

A. ANN forecasting guidelines and steps

Researchers have proposed several guidelines and steps to help others carry out ANN forecasting with clarity. Marcus O’Connor and William Remus proposed the first guide. The guidelines are:

Clean the data before estimating the neural network model.
Deseasonalize and scale the data before estimating the model.
Use appropriate methods to choose the right starting point.
Use specialized methods to avoid local optima.
Expand the network until there is no significant improvement.
Use pruning techniques when estimating neural networks and holdout samples when evaluating them.
Choose software with in-built features that avoid the disadvantages of neural networks.
Build plausible neural networks by reducing their size in order to gain model acceptance.
Use more than one approach to make sure the neural network is valid.
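As one possible reading of the deseasonalizing and holdout guidelines, the sketch below removes an additive seasonal pattern by subtracting per-season averages and sets aside the final observations as a holdout sample. The data, the period of 4, and the function names are assumptions for illustration; the guidelines themselves do not prescribe a specific method.

```python
def seasonal_means(series, period):
    """Average value at each position within the seasonal cycle."""
    buckets = [[] for _ in range(period)]
    for t, value in enumerate(series):
        buckets[t % period].append(value)
    return [sum(b) / len(b) for b in buckets]

def deseasonalize(series, period):
    """Subtract the seasonal effect (seasonal mean minus overall mean)."""
    overall = sum(series) / len(series)
    effects = [m - overall for m in seasonal_means(series, period)]
    adjusted = [v - effects[t % period] for t, v in enumerate(series)]
    return adjusted, effects

def holdout_split(series, holdout_size):
    """Keep the last `holdout_size` points unseen for evaluation."""
    return series[:-holdout_size], series[-holdout_size:]

# Made-up quarterly data with a repeating pattern (assumption).
data = [10.0, 14.0, 8.0, 12.0, 11.0, 15.0, 9.0, 13.0, 12.0, 16.0, 10.0, 14.0]

adjusted, effects = deseasonalize(data, period=4)
estimation, holdout = holdout_split(adjusted, holdout_size=4)
print([round(e, 2) for e in effects])
print([round(v, 2) for v in adjusted])
```

After the adjustment only the slow upward movement remains, which is the part a network would then be asked to learn. The holdout points play no role in estimation and are used only to evaluate the finished model.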

Another guide, introduced by Chew Lim Tan and JingTao Yao, consists of seven steps for conducting ANN forecasting. Those steps are:

Pre-processing data.
Choosing the input variables and the output variables.
Analysing Sensitivity.
Organising data.
Constructing model.
Post analysis.
Recommendation of model.

The researchers proposed these guidelines and steps to provide proper guidance in developing a complete and successful forecasting model.

Figure 3 : Architecture of ANN

ANNs are widely used for prediction problems in many kinds of applications, as they have proven successful at solving them. The model has also been used to forecast financial time series, where accuracy in the approximate range of 85% to 90% has been reported. Although the ANN model has proven effective in many ways, it also has some disadvantages. Overtraining is the main one: it occurs because neural networks have many nodes and are trained for too long. There is also a black-box problem, since an ANN does not reveal how much weight each input variable carries in the prediction.

3. ARIMA MODEL

Whereas ANNs come from the artificial intelligence perspective, the Autoregressive Integrated Moving Average (ARIMA) model comes from the statistical modelling perspective. In general, predictions can be made with either statistical or artificial intelligence techniques. Although the ANN model is very popular, the ARIMA model is considered more efficient for forecasting financial time series, particularly for short-term predictions. It is widely used in finance and economics, where its short-term predictions help investors in the decision-making process. The ARIMA model was introduced by Box and Jenkins, which is why it is also called the Box-Jenkins methodology. It is regarded as one of the most prominent forecasting methods in the financial sector.

In ARIMA, the future value of a variable is expressed as a linear combination of past values and past errors.
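Written out in standard textbook notation (the symbols are introduced here for illustration and do not come from the original text), the model for a stationary series $y_t$ with $p$ autoregressive terms and $q$ moving-average terms is:

```latex
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
```

Here $c$ is a constant, $\phi_i$ are the weights on past values, $\theta_j$ are the weights on past errors, and $\varepsilon_t$ is the error at time $t$. In ARIMA(p, d, q), $y_t$ stands for the original series after it has been differenced $d$ times.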

The following steps are carried out when developing an ARIMA predictive model:

Load the dataset.
Preprocess the data as the dataset requires.
Check the stationarity of the series and apply the necessary transformations to make it stationary.
Determine the d value, which is the number of times the differencing operation is carried out.
Create ACF and PACF plots, which are used to determine the input parameters of the ARIMA model.
Determine the p and q values using the plots from the previous step.
Fit the ARIMA model using the parameter values and the processed data from the previous steps.
Predict future values.
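Two of the steps above, differencing and computing the autocorrelation function (ACF), can be sketched in plain Python. The prices are made-up numbers and the helper names are hypothetical; the PACF and the stationarity test are omitted, and in practice a statistics library such as statsmodels would normally be used for all of these.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        for lag in range(max_lag + 1)
    ]

# Made-up prices with an upward trend (assumption).
prices = [100.0, 102.0, 101.0, 104.0, 106.0, 105.0, 108.0, 110.0, 109.0, 112.0]

diffed = difference(prices, d=1)  # removes the trend
print(diffed)
print([round(r, 3) for r in acf(prices, 3)])
print([round(r, 3) for r in acf(diffed, 3)])
```

Each pass of `difference` shortens the series by one observation, and the number of passes needed to reach a stationary series is the d value from the steps above.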

The steps above form the basic flow that should be followed when producing a model. The ARIMA model is robust and is considered an efficient model for financial time series. It also has a very low standard error of regression. However, the model is suitable only for short-term predictions, so investors get a short-term forecast on which to base their investment decisions.
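One way to see why such models suit short horizons is to fit the simplest special case, an AR(1) model (ARIMA with p = 1, d = 0, q = 0), and forecast several steps ahead. The sketch below uses ordinary least squares on made-up data; the series, the function names, and the choice of AR(1) are assumptions for illustration, not a full ARIMA implementation.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1}."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi

def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted equation forward; future errors are set to zero."""
    out = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        out.append(value)
    return out

# Made-up stationary series, e.g. daily returns in percent (assumption).
returns = [0.8, 0.5, 0.1, -0.2, -0.4, -0.1, 0.3, 0.6, 0.4, 0.0, -0.3, -0.2]

c, phi = fit_ar1(returns)
long_run_mean = c / (1 - phi)
forecasts = forecast_ar1(returns[-1], c, phi, steps=10)
print(round(phi, 3), round(long_run_mean, 3))
print([round(f, 3) for f in forecasts])
```

With a coefficient between 0 and 1, each further step moves the forecast closer to the long-run mean, so forecasts far ahead carry little information beyond that mean. The first few steps are where the model adds value.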

Written by Luxshika Uthayakumar