Predict Amazon Inc Stock Price with Machine Learning

Nutan
8 min read · Apr 30, 2021


In this article, we are going to see how we can predict the Amazon stock price with the help of machine learning.

Photo by Chris Liverani on Unsplash

Import library

import pandas as pd

Load data

I have used five years of historical data for Amazon.com, Inc. (AMZN). You can download the data from the following link: Amazon.com, Inc. (AMZN)

inputFolder = "input/"
filePath = inputFolder + "AMZN.csv"
filePath

Read csv file using pandas library

pandas.read_csv(): Read a comma-separated values (csv) file into DataFrame.

df = pd.read_csv(filePath)
df

Output:

View dataframe shape

pandas.DataFrame.shape: Return a tuple representing the dimensionality of the DataFrame.

df.shape

Output:

Data has 1258 rows and 7 columns.

Print first five records

DataFrame.head(n=5): Return the first n rows.

This function returns the first n rows of the object based on position. It is useful for quickly testing whether your object has the right type of data in it. The default is 5 rows.

For negative values of n, this function returns all rows except the last n rows, equivalent to df[:-n].
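For example, a quick sanity check of this negative-n behaviour (a small optional sketch, assuming df has been loaded as above):

# head(-3) drops the last 3 rows, so it should match the slice df[:-3]
print(df.head(-3).equals(df[:-3]))  # expected: True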

df.head()

Output:

Print last five records

DataFrame.tail(n=5): Return the last n rows.

This function returns the last n rows of the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows. The default is 5 rows.

For negative values of n, this function returns all rows except the first n rows, equivalent to df[n:].

df.tail()

Output:

Create a new dataframe

Create a new dataframe with two columns, ‘Date’ and ‘Close’. For stock prediction we need only the date and the closing price. We use the length of the original dataframe to create the index.

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Two-dimensional, size-mutable, potentially heterogeneous tabular data.

Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.

Parameters
data: ndarray, Iterable, dict, or DataFrame

Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion-order.

index: Index or array-like

Index to use for resulting frame. Will default to RangeIndex if no indexing information part of input data and no index provided.

columns: Index or array-like

Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

dtype: dtype, default None

Data type to force. Only a single dtype is allowed. If None, infer.

copy: bool, default False

Copy data from inputs. Only affects DataFrame / 2d ndarray input.

new_df = pd.DataFrame(index = range(0,len(df)), columns=['Date', 'Close'])
new_df

Output:

Sort the original dataframe

DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)

Sort object by labels (along an axis).

Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.

df = df.sort_index(ascending = True, axis = 0)
df

Output:

Fill data in new dataframe

We have to take the data from the original dataframe (df) and fill it into the new dataframe (new_df).

# Copy each date and closing price into the new dataframe (using .loc to avoid chained-assignment warnings)
for i in range(0, len(df)):
    new_df.loc[i, 'Date'] = df['Date'][i]
    new_df.loc[i, 'Close'] = df['Close'][i]

new_df

Output:

Set date as index

new_df.index = new_df.Date
new_df

Output:

Drop Date column

Now we don’t need ‘Date’ column, so just drop the column.

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names.

Parameters
labels: single label or list-like

Index or column labels to drop.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).

index: single label or list-like

Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).

columns: single label or list-like

Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

inplace: bool, default False

If False, return a copy. Otherwise, do operation inplace and return None.

Returns

DataFrame or None. DataFrame without the removed index or column labels or None if inplace=True.

new_df.drop('Date', axis=1, inplace=True)
new_df

Output:

pandas.DataFrame.values

DataFrame.values: Return a Numpy representation of the DataFrame.

dataset = new_df.values
dataset[:10]

Output:

Scaling features to a range

It is important to scale features before training a neural network. Normalization is a common way of doing this scaling.

A way to normalize the input features/variables is the Min-Max scaler. By doing so, all features will be transformed into the range [0,1] meaning that the minimum and maximum value of a feature/variable is going to be 0 and 1, respectively.
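As a minimal sketch of the idea (toy values, not real AMZN prices), the Min-Max formula (x - min) / (max - min) can be computed by hand and checked against scikit-learn:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

prices = np.array([[100.0], [150.0], [200.0]])  # toy closing prices
manual = (prices - prices.min()) / (prices.max() - prices.min())  # (x - min) / (max - min)
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(prices)

print(manual.ravel())  # [0.  0.5 1. ]
print(scaled.ravel())  # [0.  0.5 1. ]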

from sklearn.preprocessing import MinMaxScaler

class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), *, copy=True, clip=False)

Transform features by scaling each feature to a given range.

This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X: array-like of shape (n_samples, n_features)

Input samples.

y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params: dict

Additional fit parameters.

Returns

X_new: ndarray array of shape (n_samples, n_features_new)

Transformed array.

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)
scaled_data[:10]

Output:

Split data in train and test

We divide the data into training and validation sets. We have 1258 records; we take index 0 to 700 for training and from 700 to the end for validation.

train = dataset[0:700,:]
valid = dataset[700:,:]
display(train.shape, valid.shape)

Output:

train[:5], valid[:5]

Output:

Converting dataset into X_train and y_train

# Each sample holds the previous 60 scaled closing prices; the target is the next one
X_train, y_train = [], []
for i in range(60, len(train)):
    X_train.append(scaled_data[i-60: i, 0])
    y_train.append(scaled_data[i, 0])

print(X_train[0])

Output:

Convert X_train and y_train into numpy array

import numpy as np

X_train, y_train = np.array(X_train), np.array(y_train)

print(X_train[1])
print(y_train[1])

Output:

X_train.shape[0]

Output: 640

X_train.shape[1]

Output: 60

X_train.shape

Output: (640, 60)

Reshape X_train array

NumPy reshape turns this two-dimensional array into a three-dimensional array with 640 samples, 60 time steps, and 1 feature at each time step.

numpy.reshape(a, newshape, order='C')

Gives a new shape to an array without changing its data.

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_train.shape

Output: (640, 60, 1)

Now X_train data is ready to be used as input (X) to the LSTM with an input_shape of (60, 1).

Create model

import tensorflow as tf

model = tf.keras.Sequential()

Add layers in model

The input to every LSTM layer must be three-dimensional.

The three dimensions of this input are:

Samples. One sequence is one sample. A batch is comprised of one or more samples.
Time Steps. One time step is one point of observation in the sample.
Features. One feature is one observation at a time step.

This means that the input layer expects a 3D array of data when fitting the model and when making predictions, even if specific dimensions of the array contain a single value, e.g. one sample or one feature.
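As a small optional check (using the X_train array reshaped above), even a batch containing a single sample must keep all three dimensions:

single_sample = X_train[0].reshape(1, 60, 1)  # (1 sample, 60 time steps, 1 feature)
print(single_sample.shape)  # (1, 60, 1)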

Units: the number of “neurons” (or “cells”) the layer has inside it.

The LSTM input layer is defined by the input_shape argument on the first hidden layer. The input_shape argument takes a tuple of two values that define the number of time steps and features.

Hidden layer 1: 50 units/ 50 neurons
Hidden layer 2: 50 units/ 50 neurons
Last layer: 1 unit

model.add(tf.keras.layers.LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
model.add(tf.keras.layers.LSTM(units = 50))
model.add(tf.keras.layers.Dense(1))

Model summary

model.summary()

Output:

Compile model

The mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors — that is, the average squared difference between the estimated values and the actual value.
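As a quick worked example (toy numbers, not model output), the MSE is simply the mean of the squared differences:

import numpy as np

y_true = np.array([3.0, 5.0, 2.5])   # actual values (illustrative)
y_pred = np.array([2.5, 5.0, 3.0])   # predicted values (illustrative)

mse = np.mean((y_true - y_pred) ** 2)  # (0.25 + 0.0 + 0.25) / 3
print(mse)  # approximately 0.1667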

model.compile(loss = 'mean_squared_error', optimizer = 'adam')

Train the model

history = model.fit(X_train, y_train, epochs = 100, batch_size=10)

Model history

history.history['loss'][:10]

Output:
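Optionally, the recorded loss can be plotted to see how training progressed (a small sketch, assuming the history object returned by model.fit above):

import matplotlib.pyplot as plt

plt.figure(figsize=(8, 4))
plt.plot(history.history['loss'])   # MSE loss recorded for each epoch
plt.title('Training loss per epoch')
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.show()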

Prepare validation data for prediction

We take the last len(valid) + 60 rows of new_df, because each prediction needs the previous 60 closing prices as its input window.

print(len(new_df))
print(len(valid))

Output:

1258
558

test_inputs = new_df[len(new_df) - len(valid) - 60:].values
test_inputs[:10]

Output:

Reshape and transform test_inputs

test_inputs = test_inputs.reshape(-1,1)
test_inputs = scaler.transform(test_inputs)
test_inputs[:10]

Output:

Create X_test

X_test = []
for i in range(60, test_inputs.shape[0]):
X_test.append(test_inputs[i-60:i, 0])

Convert X_test into numpy array

X_test = np.array(X_test)

print(X_test)
print(X_test.shape)

Output:

Reshape X_test

X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
print(X_test.shape)

Output: (558, 60, 1)

Predict X_test data

closing_price = model.predict(X_test)
closing_price[:10]

Output:

Scaler inverse transformation

closing_price = scaler.inverse_transform(closing_price)
closing_price[:10]

Output:

Visualize actual and predicted stock price

import matplotlib.pyplot as plt

Actual and predicted stock price for test data

train = new_df[:700]
valid = new_df[700:].copy()  # copy to avoid SettingWithCopyWarning when adding the Predictions column
valid['Predictions'] = closing_price
plt.figure(figsize=(16,8))
plt.plot(valid['Close'], color = 'green', label = 'Actual Amazon Inc. Stock Price',ls='--')
plt.plot(valid['Predictions'], color = 'red', label = 'Predicted Amazon Inc. Stock Price',ls='-')
plt.title('Predicted Amazon Inc. Stock Price')
plt.xlabel('Time in days')
plt.ylabel('Stock Price')
plt.legend()

Output:

Visualize training and test data

plt.figure(figsize=(16,8)) 
plt.plot(train['Close'], color = 'blue')
plt.plot(valid[['Close','Predictions']])
plt.title('Amazon Inc. Stock Price')
plt.xlabel('Time in days')
plt.ylabel('Stock Price')
