Stock Price Prediction using LSTM: A Deep Learning Approach with Historical Data To Minimize Loss

SR
7 min read · Jan 2, 2024

Let’s break down the code part by part:

### Importing Libraries:

Use Case: This section is responsible for importing the required libraries to facilitate various tasks throughout the code.

  • numpy: Used for numerical operations and array manipulations.
  • pandas: Essential for handling and manipulating tabular data, making it easy to work with datasets.
  • yfinance: Enables the downloading of historical stock data from Yahoo Finance.
  • MinMaxScaler from sklearn.preprocessing: Scales and normalizes data between 0 and 1, a common preprocessing step for machine learning models.
  • mean_squared_error from sklearn.metrics: Provides a metric to evaluate the performance of the model.
  • Sequential from tensorflow.keras.models, and LSTM and Dense from tensorflow.keras.layers: Used for building and defining the architecture of the LSTM model.
  • matplotlib.pyplot: Facilitates the creation of plots for data visualization.
import numpy as np
import pandas as pd
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import matplotlib.pyplot as plt

Here, the required libraries are imported. NumPy is used for numerical operations, Pandas for data manipulation, yfinance for downloading historical stock data, MinMaxScaler for data normalization, mean_squared_error for evaluation, and TensorFlow for building the LSTM (Long Short-Term Memory) model. Matplotlib is used for plotting.

### Downloading Historical Stock Data:

Use Case: To obtain historical stock data for a specific symbol within a specified time range.

  • symbol = 'AAPL': The stock symbol for which data is downloaded, in this case, Apple Inc.
  • data = yf.download(symbol, start='2010-01-01', end='2022-12-31'): Downloads historical stock data for Apple Inc. from January 1, 2010, to December 31, 2022.
symbol = 'AAPL'
data = yf.download(symbol, start='2010-01-01', end='2022-12-31')

In this part, historical stock data for the symbol ‘AAPL’ is downloaded using yfinance from January 1, 2010, to December 31, 2022.

### Using Closing Prices and Normalizing Data:

Use Case: Extracting the closing prices from the downloaded data and normalizing the data.

  • dataset = data['Close'].values.reshape(-1, 1): Extracts the closing prices and reshapes them for further processing.
  • scaler = MinMaxScaler(feature_range=(0, 1)): Initializes the MinMaxScaler to normalize the closing prices.
  • dataset_scaled = scaler.fit_transform(dataset): Applies normalization to bring the closing prices between 0 and 1.
dataset = data['Close'].values.reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0, 1))
dataset_scaled = scaler.fit_transform(dataset)

The closing prices are extracted from the downloaded data and reshaped. Then, MinMaxScaler is used to normalize the data between 0 and 1.
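
To make the normalization step concrete, here is a minimal NumPy sketch of the arithmetic that min-max scaling to the range 0 to 1 performs, together with its inverse. The helper names `min_max_scale` and `min_max_inverse` and the sample prices are illustrative assumptions, not part of the original code, which uses `MinMaxScaler` directly.

```python
import numpy as np

def min_max_scale(values):
    """Scale values into [0, 1]; also return the min and max that were used."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi

def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original price range."""
    return scaled * (hi - lo) + lo

# Hypothetical closing prices, shaped (n_samples, 1) like `dataset`
prices = np.array([10.0, 12.0, 15.0, 20.0]).reshape(-1, 1)

scaled, lo, hi = min_max_scale(prices)
restored = min_max_inverse(scaled, lo, hi)

print(scaled.ravel())    # smallest price maps to 0, largest to 1
print(restored.ravel())  # matches the original prices
```

The inverse matters later in the walkthrough: the model predicts values on the scaled range, so they have to be mapped back before they can be read as prices.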

### Creating LSTM Input Data:

Use Case: Preparing the data in a format suitable for training the LSTM model.

  • def create_dataset(dataset, time_steps=1): Defines a function to create input data for the LSTM model.
  • Inside the function: It iterates over the dataset to create input-output pairs with the specified time steps.
def create_dataset(dataset, time_steps=1):
    # Function to create input data for LSTM
    X, y = [], []
    for i in range(len(dataset) - time_steps):
        # Window of `time_steps` consecutive values used as the input
        a = dataset[i:(i + time_steps), 0]
        X.append(a)
        # The value immediately after the window is the target
        y.append(dataset[i + time_steps, 0])
    return np.array(X), np.array(y)

A function create_dataset is defined to create input data for the LSTM model. It takes a time series dataset and generates input-output pairs based on the specified time steps.
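
A small worked example may help show what the windowing produces. The sketch below repeats the function so that it runs on its own and applies it to a toy series of six values; the array `toy` and the choice of `time_steps=3` are illustrative assumptions.

```python
import numpy as np

def create_dataset(dataset, time_steps=1):
    """Build sliding-window inputs X and next-step targets y."""
    X, y = [], []
    for i in range(len(dataset) - time_steps):
        X.append(dataset[i:i + time_steps, 0])
        y.append(dataset[i + time_steps, 0])
    return np.array(X), np.array(y)

# Toy series 0, 1, 2, 3, 4, 5 as a column vector
toy = np.arange(6, dtype=float).reshape(-1, 1)
X_toy, y_toy = create_dataset(toy, time_steps=3)

print(X_toy)  # windows [0 1 2], [1 2 3], [2 3 4]
print(y_toy)  # targets 3, 4, 5
```

A series of length n yields n - time_steps pairs, because the last time_steps values have no following value to serve as a target.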

### Setting Time Steps and Reshaping Data:

Use Case: Configuring the number of time steps and reshaping the input data for compatibility with the LSTM model.

  • time_steps = 10: Defines the number of previous time steps to consider for predicting the next value.
  • X, y = create_dataset(dataset_scaled, time_steps): Calls the create_dataset function to generate input-output pairs.
  • X = np.reshape(X, (X.shape[0], X.shape[1], 1)): Reshapes the input data to fit the LSTM model's expected input shape.
time_steps = 10
X, y = create_dataset(dataset_scaled, time_steps)
X = np.reshape(X, (X.shape[0], X.shape[1], 1))

The variable time_steps is set to 10, and the create_dataset function is used to generate input-output pairs for the LSTM. The input data is then reshaped to fit the expected format for LSTM input.
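
Keras LSTM layers expect input shaped as (samples, time steps, features). The sketch below shows the reshape on a placeholder array; the count of 90 windows is a made-up number for illustration, not the size of the real dataset.

```python
import numpy as np

time_steps = 10
n_windows = 90  # hypothetical number of windows

# Stand-in for the 2-D array returned by create_dataset
X_2d = np.zeros((n_windows, time_steps))

# Add a trailing feature axis of size 1 (one feature: the closing price)
X_3d = np.reshape(X_2d, (X_2d.shape[0], X_2d.shape[1], 1))

# An equivalent way to add the same axis
X_alt = X_2d[..., np.newaxis]

print(X_2d.shape)  # (90, 10)
print(X_3d.shape)  # (90, 10, 1)
```

The trailing 1 is the number of features per time step. Adding more inputs, such as volume, would increase that last dimension.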

### Building the LSTM Model:

Use Case: Defining the architecture of the LSTM model and configuring its training parameters.

  • model = Sequential(): Initializes a sequential model using Keras.
  • model.add(LSTM(units=50, return_sequences=True, input_shape=(X.shape[1], 1))): Adds the first LSTM layer with 50 units.
  • model.add(LSTM(units=50)): Adds the second LSTM layer with 50 units.
  • model.add(Dense(units=1)): Adds a Dense layer with 1 unit for output.
  • model.compile(optimizer='adam', loss='mean_squared_error'): Configures the model for training with the Adam optimizer and mean squared error loss.
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(X.shape[1], 1)))
model.add(LSTM(units=50))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')

A sequential model is created with two LSTM layers and a Dense layer. The model is compiled using the Adam optimizer and mean squared error loss.
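
To get a feel for the size of this architecture, the trainable parameters can be counted by hand. The sketch below uses the standard parameter formula for an LSTM layer with biases (four gates, each with an input kernel, a recurrent kernel, and a bias); the helper names `lstm_params` and `dense_params` are illustrative and not part of Keras.

```python
def lstm_params(units, input_dim):
    """Trainable parameters in an LSTM layer with biases.

    Four gates, each with an input kernel (input_dim x units),
    a recurrent kernel (units x units), and a bias (units).
    """
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Trainable parameters in a Dense layer: weights plus biases."""
    return units * input_dim + units

first_lstm = lstm_params(units=50, input_dim=1)    # input is 1 feature per step
second_lstm = lstm_params(units=50, input_dim=50)  # input is the first layer's 50 outputs
output_layer = dense_params(units=1, input_dim=50)

total = first_lstm + second_lstm + output_layer
print(first_lstm, second_lstm, output_layer, total)  # 10400 20200 51 30651
```

The first layer sets return_sequences=True so that it emits an output at every time step, which the second LSTM layer needs as its input sequence. The second layer returns only its final output, which feeds the single-unit Dense layer.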

### Training the Model:

Use Case: Training the LSTM model with the prepared input data.

  • model.fit(X, y, epochs=50, batch_size=32): Fits the model to the training data for 50 epochs with a batch size of 32.
model.fit(X, y, epochs=50, batch_size=32)

The model is trained using the input data (X) and corresponding target values (y) for 50 epochs with a batch size of 32.
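
As a rough guide to what these two settings mean, the number of weight updates follows from the sample count: each epoch processes the data in batches of 32, with a smaller final batch if the count does not divide evenly. The sketch below does that arithmetic; the function name and the sample count of 1000 are hypothetical.

```python
import math

def updates_per_run(n_samples, batch_size=32, epochs=50):
    """Return (batches per epoch, total batches over all epochs)."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

# Hypothetical sample count, for illustration only
steps, total_updates = updates_per_run(n_samples=1000)
print(steps, total_updates)  # 32 1600
```

A larger batch size means fewer, smoother updates per epoch. A smaller one means more updates, each based on fewer samples.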

### Predicting Stock Prices on the Testing Set:

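Use Case: Using the trained LSTM model to make predictions on test data. Note that the 2022 test range lies inside the 2010 to 2022 training range, so the model has already seen these prices during training; this is an in-sample check rather than a true held-out test.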

  • test_data = yf.download(symbol, start='2022-01-01', end='2022-12-31')['Close'].values.reshape(-1, 1): Downloads test data for the year 2022.
  • scaled_test_data = scaler.transform(test_data): Applies the same scaling to the test data.
  • X_test, y_test = create_dataset(scaled_test_data, time_steps): Creates input-output pairs for testing.
  • X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1)): Reshapes the input data for the LSTM model.
  • predicted_stock_prices = model.predict(X_test): Uses the trained model to predict stock prices on the test set.
  • predicted_stock_prices = scaler.inverse_transform(predicted_stock_prices): Inverse transforms the predictions to the original scale.
test_data = yf.download(symbol, start='2022-01-01', end='2022-12-31')['Close'].values.reshape(-1, 1)
scaled_test_data = scaler.transform(test_data)
X_test, y_test = create_dataset(scaled_test_data, time_steps)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_prices = model.predict(X_test)
predicted_stock_prices = scaler.inverse_transform(predicted_stock_prices)

Test data is downloaded and normalized, and input-output pairs are created for testing. The LSTM model then predicts stock prices, and the predictions are inverse-transformed back to the original price scale.
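
Because the 2022 test range sits inside the 2010 to 2022 download used for training, one alternative is to hold out the final portion of the series and compute the scaling statistics from the training portion only. The following is a minimal sketch of that idea on synthetic data; the function name `chronological_split` and the 80/20 split are assumptions, not part of the original code.

```python
import numpy as np

def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series into an earlier and a later part."""
    split = int(len(series) * train_fraction)
    return series[:split], series[split:]

# Synthetic, steadily rising "prices" as a column vector
prices = np.linspace(10.0, 50.0, 100).reshape(-1, 1)
train, test = chronological_split(prices)

# Scaling statistics come from the training part only
lo, hi = train.min(), train.max()
train_scaled = (train - lo) / (hi - lo)
test_scaled = (test - lo) / (hi - lo)  # can exceed 1 if prices rise past the training max

print(len(train), len(test))
print(train_scaled.max(), test_scaled.max())
```

With this layout, every test value comes after every training value, so the model is evaluated on prices it has not seen.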

### Plotting Actual vs. Predicted Stock Prices:

Use Case: Visualizing the actual and predicted stock prices for evaluation.

  • plt.figure(figsize=(12, 6)): Creates a figure 12 inches wide and 6 inches tall.
  • plt.plot(test_data[time_steps:], label='Actual Stock Prices', color='blue'): Plots the actual stock prices in blue. The first time_steps values are skipped because they serve only as input to the first prediction and have no predicted counterpart.
  • plt.plot(predicted_stock_prices, label='Predicted Stock Prices', color='red'): Plots the predicted stock prices in red.
  • plt.title('Stock Price Prediction using LSTM'): Adds a title to the plot.
  • plt.xlabel('Time'): Adds a label to the x-axis.
  • plt.ylabel('Stock Price'): Adds a label to the y-axis.
  • plt.legend(): Displays a legend to distinguish between actual and predicted prices.
  • plt.show(): Shows the plot.
plt.figure(figsize=(12, 6))
plt.plot(test_data[time_steps:], label='Actual Stock Prices', color='blue')
plt.plot(predicted_stock_prices, label='Predicted Stock Prices', color='red')
plt.title('Stock Price Prediction using LSTM')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()

Finally, the actual and predicted stock prices are plotted using Matplotlib for visualization. Blue represents actual prices, and red represents predicted prices. The title and labels are added for clarity, and the plot is displayed.
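
The imports include mean_squared_error, but the walkthrough never calls it, so the plot is the only evaluation. A numeric score can complement it. The sketch below computes the root mean squared error with plain NumPy on made-up numbers; the `rmse` helper and the sample arrays are illustrative assumptions. In the pipeline above, the inputs would be `test_data[time_steps:]` and `predicted_stock_prices`.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

time_steps = 3

# Made-up actual prices and predictions, shaped like the arrays in the walkthrough
actual_prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0]).reshape(-1, 1)
predictions = np.array([13.5, 13.5, 15.5]).reshape(-1, 1)

# Drop the first `time_steps` actual values: they have no matching prediction
aligned_actual = actual_prices[time_steps:]

score = rmse(aligned_actual, predictions)
print(score)  # 0.5
```

Because the error is computed after the inverse transform, it is expressed in the same units as the stock price, which makes it easier to interpret than the scaled training loss.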

SR

15-year-old enthusiast passionate about tech, code, and creative writing. Sharing my journey on Medium. 🚀 | Code | Tech | Writing | Malaysian |