Stock Price Prediction Using Artificial Recurrent Neural Network — Part 1

Kumarpal Nagar
Mindful Engineering
7 min read · Oct 14, 2021
Image Credit: Intel® AI Developer

What is AI?

Nowadays, Artificial Intelligence and Machine Learning have become talking points across the technology and business industries. Many of us are unaware that we already use Artificial Intelligence every day.

Artificial Intelligence (AI) is an area of Computer Science that is reshaping many industries by solving problems that traditionally required human intelligence. Its rapid growth is the result of extensive research by scientists and engineers, and large organizations are investing heavily in it because it is widely seen as a core technology of the future.

One of the most popular branches of AI is Machine Learning. Let’s quickly go through what Machine Learning is.

What is ML?

Machine learning “gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959).

It empowers machines to learn from past experience. Machine Learning is the study of building systems that derive their behavior from historical data rather than from explicitly written instructions; the algorithm learns patterns from previous data.

Machine Learning Types: There are two main types of ML:

  1. Unsupervised
  2. Supervised

The difference between Supervised and Unsupervised Learning lies in the kind of data they use and how their algorithms learn.

Unsupervised Learning uses unlabeled data and “self-guided” learning algorithms.

Supervised Learning, on the other hand, uses labeled data and defined training algorithms.

Machine Learning Workflow:

We are going to understand Machine Learning through a project that uses Supervised Learning. The detailed steps to build a Machine Learning model are described below.

Before we start with the steps, let’s quickly install the prerequisites.

Pre-requisites:

Before we start, we need to have the prerequisites ready.

  1. Install all required Python libraries.
  2. Install and configure Jupyter Notebook.

=> First of all, we will install all the listed Python libraries. You can install them using the pip command.

- Numpy: Multi-dimensional arrays and matrices. (pip install numpy)

- Pandas: Data manipulation and analysis. (pip install pandas)

- Matplotlib: Create static, animated, and interactive visualizations. (pip install matplotlib)

- Scikit-Learn: Features various classification, regression, and clustering algorithms. (pip install scikit-learn)

- TensorFlow: Training and inference of deep neural networks. (pip install tensorflow)

=> Install and Configure Jupyter Notebook to write code.

After we have installed the above-mentioned prerequisites, the libraries can be imported as shown in figure-A.

Figure — A
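The original figure is not reproduced here, so here is a minimal sketch of the imports based on the libraries listed above (the aliases are the conventional ones and are assumptions):

```python
# A sketch of the libraries loaded in figure-A; aliases are assumptions
import numpy as np                              # arrays and matrices
import pandas as pd                             # data manipulation and analysis
import matplotlib.pyplot as plt                 # visualization
import tensorflow as tf                         # deep neural networks (used in Part 2)
from sklearn.preprocessing import MinMaxScaler  # normalization (used later)
```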

Let’s start with the first step of ML.

Step 1: Gather Data

In this step, we need to gather or collect data from a source such as a website, an electronic device, or an IoT device.

For the example given below, we have collected stock data from a website (finance.yahoo.com) as a CSV file.

URL: Download AAPL.csv

Note: Please check the size of the data (in this case AAPL.csv) using df.info(); it should contain more than 3000 entries.
If the data (in this case AAPL.csv) has fewer than 3000 entries, reduce the time_step size in the program from 100 to something smaller (e.g. 30, 40, or 50), as shown in figure-K.

Now load the data from the CSV file using the read_csv() method of Pandas.

Figure — B

As shown in figure-B, Input In [3], the read_csv() method loads the AAPL.csv file and automatically converts it into a Pandas DataFrame. You can inspect the DataFrame (df) using the head() method of Pandas, which shows the top five rows.

You can see the DataFrame's information using the info() method of Pandas.

df.info(): this gives you the total number of records, the columns, their datatypes, etc.

Figure — C
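Since the figures are not shown, here is a small sketch of the loading and inspection step (the file name AAPL.csv comes from the article; the path is an assumption):

```python
import pandas as pd

# Load the CSV into a DataFrame and inspect it
df = pd.read_csv('AAPL.csv')   # assumes AAPL.csv sits in the working directory
df.head()                      # top five rows of the DataFrame
df.info()                      # total records, columns, datatypes, memory usage
```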

Step 2: Prepare Data (Clean, Prepare, and Manipulate Data)

In the second step of ML, we clean the data and select the important features from the dataset in order to get good prediction results. In this case, we don't need a DataFrame with all the columns (features) for training or prediction, so we select only one column ("Low") and keep it as a Pandas Series using the given command:

df1 = df.reset_index()['Low'] resets the index and stores only the "Low" column of dataset df as a Series in df1.

Figure — D
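A runnable version of that command (a small sketch, assuming df was loaded as above):

```python
# Keep only the "Low" column as a Pandas Series
df1 = df.reset_index()['Low']
df1.head()   # peek at the first five values
```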

We need to check for any null values inside df1. Below we check for them.

Figure — E
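A sketch of that check, assuming df1 from the previous step:

```python
# Check df1 for missing values
print(df1.isnull().sum())   # total number of NaN entries
print(df1[df1.isnull()])    # show the entries that are NaN (index 49 and 4472 here)
```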

As shown in figure-E, there are two null values, at index 49 and 4472.

The number of records with null values is very small compared to the total number of records, so we can simply drop or remove these records using the Pandas dropna() method. This process is called Data Cleaning.

Alternatively, instead of dropping them, we could compute the median of the dataset and use it to fill in the missing values. This process is called Data Manipulation.

Figure — F
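A sketch of both options; dropping the two records is what the rest of the walkthrough assumes:

```python
# Data Cleaning: remove the two null records
df1 = df1.dropna()

# Data Manipulation (alternative): fill the nulls with the median instead
# df1 = df1.fillna(df1.median())

print(df1.isnull().sum())   # confirm that no nulls remain
```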

After applying one of the above operations to the data, no null values remain (shown in figure-F).

Hence, we can now split this data into train and test datasets. Before we do that, let's look at a graphical visualization of dataset df1.

Figure — G

Here we use the Python Matplotlib library to draw the graphical visualization of dataset df1.

Input In [12]: the first line sets the size of the graph and the second line draws the graph of the df1 data.
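A sketch of that plot (the exact figure size is an assumption):

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))   # first line: set the size of the graph
plt.plot(df1)                 # second line: draw the graph of the df1 data
plt.show()
```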

Now we can split the dataset into a training dataset and a test dataset.

Figure — H

As shown in figure-H, we give 80% of the data to the training dataset and the remaining 20% to the test dataset. In this case, the training size is 6328 and the test size is 1583.

You can change the split between the datasets by changing the 0.80 parameter in the first line of Input In [15] to any ratio you want.
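A minimal sketch of the chronological 80/20 split, assuming the cleaned series df1 (the variable names are assumptions, since the original code in figure-H is not shown):

```python
import numpy as np

data = np.array(df1)                    # values of the "Low" series
training_size = int(len(data) * 0.80)   # change 0.80 to adjust the split
train_data = data[:training_size]       # first 80% for training
test_data = data[training_size:]        # remaining 20% for testing
print(len(train_data), len(test_data))
```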

Normalization: In this process, values are shifted and rescaled so that they end up ranging from 0 to 1. Why do we need this? Because differences in the scale of the numbers can cause problems when you attempt to combine the values as features during modeling. Normalization avoids these problems by creating rescaled values.

Scikit-Learn provides a transformer called MinMaxScaler for this.

The values of the data before normalization are shown in figure-D, Input In [7].

Let’s normalize the data. Before that, we need to reshape the dataset into a 2-dimensional array and pass it to the fit_transform() method.

Figure — I
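A sketch of the scaling step, assuming train_data and test_data from the split above. The original code is not shown; fitting the scaler on the training data only, and reusing it for the test data, is a choice made here to avoid leaking test information into the scaling:

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train_data.reshape(-1, 1))  # 2-D array -> fit_transform
test_scaled = scaler.transform(test_data.reshape(-1, 1))        # reuse the same scaling
```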

After normalization, the data will look like this:

Figure — J

For supervised learning, we need the data in the form of input features (X) together with labels (y).

Figure — K

But in this case, the dataset contains only values, without labels. We need some logic to transform the dataset into feature and label form, as shown in figure-K.

For the training dataset — X_train is the feature and y_train is the label.

For the test dataset — X_test is the feature and y_test is the label.
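A hedged sketch of that transformation, assuming the scaled arrays from the previous step. The helper name create_dataset is an assumption; time_step=100 follows the note near the start of the article (use 30, 40, or 50 for smaller datasets):

```python
import numpy as np

def create_dataset(dataset, time_step=100):
    """Turn a 2-D array of values into (features, labels) sliding windows."""
    X, y = [], []
    for i in range(len(dataset) - time_step - 1):
        X.append(dataset[i:i + time_step, 0])   # previous time_step values -> feature
        y.append(dataset[i + time_step, 0])     # the next value -> label
    return np.array(X), np.array(y)

X_train, y_train = create_dataset(train_scaled, time_step=100)
X_test, y_test = create_dataset(test_scaled, time_step=100)
print(X_train.shape, y_train.shape)
```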

Now we reshape the X_train and X_test datasets into the shape expected by the model, which we will discuss in the next step.

Figure — L
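A sketch of that reshape; a recurrent layer (such as an LSTM), which Part 2 builds on, expects 3-D input of shape [samples, time steps, features]:

```python
# Reshape the features to [samples, time steps, features] for the recurrent model
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
print(X_train.shape, X_test.shape)
```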

Well done! You've survived Part 1 of Stock Price Prediction Using Artificial Recurrent Neural Network, in which we covered what AI and ML are, Unsupervised and Supervised Learning, and the first steps (Gather Data, Prepare Data) of the Machine Learning workflow.

Now you're ready for Part 2 of Stock Price Prediction Using Artificial Recurrent Neural Network, where we will cover the remaining steps of the workflow: Choose the Model, Train the Model, Evaluation, Hyperparameter Tuning, and Prediction.


Kumarpal Nagar
Mindful Engineering

AI Developer | Python Developer | Certified Ethical Hacker (CyberVrag) | Instructor