【Application】Verifying LSTM Stock Price Prediction Effectiveness Using TQuant Lab (Part 1)

TEJ 台灣經濟新報

Published in

TEJ-API Financial Data Analysis

6 min read · Aug 9, 2024

Key Highlights

Article Difficulty: ★★★★★
Combines fundamental, sentiment, and technical data for LSTM stock price prediction, then verifies performance through backtesting.
Reading Recommendations: This article uses an RNN architecture for time series prediction, so a basic understanding of time series or deep learning is necessary. For more detail on LSTM model construction, refer to the [Data Science] LSTM resource.

Introduction

In machine learning, predicting the prices of financial market derivatives has long been a popular topic, and numerous studies have tried to use machine learning to earn excess returns in the market. However, because the financial market is essentially an aggregate of human behavior, full of irregular and uncertain factors, conventional machine learning models such as logistic regression, random forests, and extreme gradient boosting seem unable to capture its complex dynamics effectively. With the rapid development of deep learning, more time series models are now being applied to stock price prediction. This article uses an LSTM time series model that takes the opening, high, low, and closing prices of the past five days, quarterly ROE, MOM (momentum, which indicates the magnitude and direction of price trend changes), and RSI as inputs to predict the next day’s closing price.

Editing Environment and Module Requirements

This article uses macOS as the operating system and VS Code as the editor.

LSTM Model Construction

The prices of large-cap stocks in Taiwan are strongly affected by large investors and unpredictable market fluctuations, which makes their movements hard to predict. This article therefore selects 2618 (Eva Airways Corp.), the seventh largest constituent of the Taiwan Small and Medium Cap 300 Index (referred to as the “Small and Medium Cap 300 Index”) for Q2 2024, as the target stock. We also selected a higher market cap stock, 8215 (BenQ Materials Corp.), for backtesting as a reference.

Loading External Packages

import os
import time
import tejapi
import numpy as np
import pandas as pd
...

Loading Internal Packages

ML_stock() is a custom class we created for preprocessing data. It handles loading the API_KEY, price-volume data, fundamental data, and technical indicators. Finally, we set the start and end dates for the model’s sample period.
*Note: For the code to run, enter your API_KEY in the config.ini file first.

ml_stock = ML_stock()
ml_stock.ini()
start = '2012-07-01'
end = '2022-07-01'

We have retained only the necessary features for the next steps.
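To make the MOM and RSI features mentioned in the Introduction concrete, here is a minimal NumPy sketch of how such indicators can be computed. It is not the ML_stock implementation, which this article does not show. The function names, the 10-day MOM period, the 14-day RSI period, and the simple-average RSI variant are all assumptions for illustration.

```python
import numpy as np

def momentum(close, period=10):
    """MOM: today's close minus the close `period` days earlier (period is an assumption)."""
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)  # the first `period` values are undefined
    out[period:] = close[period:] - close[:-period]
    return out

def rsi_simple(close, period=14):
    """RSI using simple averages of gains and losses (an assumed variant)."""
    close = np.asarray(close, dtype=float)
    delta = np.diff(close)                      # delta[i] = close[i + 1] - close[i]
    gains = np.where(delta > 0, delta, 0.0)
    losses = np.where(delta < 0, -delta, 0.0)
    out = np.full(close.shape, np.nan)
    for t in range(period, len(close)):
        avg_gain = gains[t - period:t].mean()   # the `period` changes ending at day t
        avg_loss = losses[t - period:t].mean()
        if avg_loss == 0:
            out[t] = 100.0                      # no losses in the window: treated as 100 here
        else:
            rs = avg_gain / avg_loss
            out[t] = 100.0 - 100.0 / (1.0 + rs)
    return out
```

Many charting libraries smooth RSI with Wilder's method instead of a simple average, so values from this sketch may differ slightly from those produced elsewhere.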

Creating Time Series Data

First, standardize all data and set the training set’s window_size to 5. Each sample then consists of five consecutive trading days of data, consistent with the five-day lookback stated in the Introduction. The window slides forward one day at a time, so consecutive samples overlap on all but one day.
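The sliding-window step can be sketched as follows. This is a minimal illustration, not the article's own code: the helper name make_windows is hypothetical, and it assumes the features are already standardized, that each window holds window_size consecutive days, and that the label is the closing price on the day right after the window.

```python
import numpy as np

def make_windows(features, target, window_size=5):
    """Build (samples, days, features) inputs and next-day labels with a stride of one day."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X, y = [], []
    for start in range(len(features) - window_size):
        end = start + window_size
        X.append(features[start:end])  # days start .. end-1 form one sample
        y.append(target[end])          # label: the value on the day after the window
    return np.array(X), np.array(y)
```

With 10 days of data and window_size=5, this yields 5 samples, and each sample shares four of its five days with the previous one.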

As the diagram above illustrates, the independent variables (such as open, high, low, and close prices) form a three-dimensional array, while the dependent variable is the next day’s closing price. The dimensions of the feature array, from left to right, are (number of samples, number of days, number of features). After this, we split the data into training, validation, and test sets in an 8:1:1 ratio.
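An 8:1:1 split can be sketched as below. The article does not state how the split is performed, so this sketch assumes a chronological split without shuffling, which is the usual choice for time series. The function name chronological_split is hypothetical.

```python
def chronological_split(X, y, train_frac=0.8, val_frac=0.1):
    """Split samples in time order: the earliest go to training, the latest to testing."""
    n = len(X)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    train = (X[:train_end], y[:train_end])
    val = (X[train_end:val_end], y[train_end:val_end])
    test = (X[val_end:], y[val_end:])  # whatever remains after train and validation
    return train, val, test
```

Keeping the order intact means the test set always lies later in time than the training data, which mirrors how the model would be used in practice.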

Build the LSTM Stock Price Prediction Model

This study builds the prediction model from one LSTM layer and three Dense layers, with Dropout layers interspersed to prevent overfitting. The final Dense layer has a single neuron that outputs the predicted value. We also define an exponentially decaying learning rate that starts at 0.001 and is multiplied by 0.9 every 10,000 steps, decreasing in a stepwise (staircase) fashion.
We use the Adam optimizer with this learning rate schedule. The loss function is Mean Squared Error (MSE), and the evaluation metric is Mean Absolute Error (MAE).
Finally, we set up an early stopping mechanism that monitors val_loss. If it does not improve for 10 epochs, training stops to prevent overfitting and the best weights are restored.

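# Assumption: these names come from tensorflow.keras; the article's own import list is elided.
from tensorflow.keras import Sequential, layers
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import ExponentialDecay

# One LSTM layer followed by three Dense layers, with Dropout in between
model = Sequential([layers.Input((X_train.shape[1], X_train.shape[2])),  # (days, features)
                    layers.LSTM(64),
                    layers.Dense(32, activation='relu'),
                    layers.Dropout(0.2),
                    layers.Dense(32, activation='relu'),
                    layers.Dropout(0.2),
                    layers.Dense(1)  # single neuron: the predicted value
                ])
# Staircase decay: the learning rate is multiplied by 0.9 every 10,000 steps
lr_schedule = ExponentialDecay(
    0.001,
    decay_steps=10000,
    decay_rate=0.9,
    staircase=True)
model.compile(optimizer=Adam(learning_rate=lr_schedule), loss='mse', metrics=['mae'])
# Stop when val_loss has not improved for 10 epochs and restore the best weights
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)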
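To see what the staircase schedule does numerically, the learning rate at a given step can be computed by hand. The helper below is a plain-Python illustration of the formula initial_rate * decay_rate ** floor(step / decay_steps). The name staircase_lr is hypothetical and is not part of Keras.

```python
def staircase_lr(step, initial_rate=0.001, decay_rate=0.9, decay_steps=10000):
    """Learning rate after `step` optimizer steps under staircase exponential decay."""
    # Integer division keeps the rate constant within each block of decay_steps steps.
    return initial_rate * decay_rate ** (step // decay_steps)

if __name__ == "__main__":
    for step in (0, 9999, 10000, 25000):
        print(step, staircase_lr(step))
```

The rate stays at 0.001 for the first 10,000 steps, drops to 90% of that value, and keeps dropping by the same factor at every further 10,000 steps.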

After about 25 epochs, the training loss no longer decreases significantly, indicating that the model converges quickly.
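Once the loss plateaus, the early stopping rule configured earlier decides when training ends. The sketch below simulates a patience-based rule on a list of validation losses. It is a simplified stand-in for illustration, not Keras's internal code, and the function name early_stopping_epoch is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (epoch at which training stops, epoch holding the best validation loss)."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training ends at the last epoch
    return len(val_losses) - 1, best_epoch
```

With restore_best_weights=True, the weights kept after stopping are those from the best epoch rather than the final one.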

LSTM Stock Price Prediction Results Evaluation

The model performs well on the training and validation sets, as expected for in-sample data. However, predicted and actual prices diverge somewhat in the latter part of the out-of-sample test set. While the model captures the overall trend direction, further backtesting is needed to verify its accuracy.
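The gap between predicted and actual prices can also be expressed as a number. The sketch below converts standardized values back to price units and reports MAE and RMSE. It assumes z-score standardization of the closing price with a known mean and standard deviation, which the article does not confirm. The function name price_errors and its parameters are hypothetical.

```python
import numpy as np

def price_errors(y_true_scaled, y_pred_scaled, close_mean, close_std):
    """MAE and RMSE in price units, assuming the target was scaled as (close - mean) / std."""
    y_true = np.asarray(y_true_scaled, dtype=float) * close_std + close_mean
    y_pred = np.asarray(y_pred_scaled, dtype=float) * close_std + close_mean
    diff = y_pred - y_true
    mae = float(np.mean(np.abs(diff)))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return mae, rmse
```

Computing these separately for the training, validation, and test sets gives a direct comparison of in-sample and out-of-sample error.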

We applied the same method to build an LSTM model for 8215 (BenQ Materials) and plotted the comparison graph. On the out-of-sample data, the model predicts the next day’s closing price for 8215 more accurately. However, its actual effectiveness will be validated in the next article.

LSTM Stock Price Prediction — Feature Importance Analysis

Among all features, daily price changes remain the model’s primary reference, followed by the MOM and RSI indicators. Interestingly, the model makes little use of quarterly ROE, likely because it is updated far less frequently than the other features. For a model that predicts daily stock prices, quarterly ROE is simply less relevant.

The pattern observed for 2618 also appears for 8215: the model does not make effective use of quarterly ROE.
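The article does not state which method produced the feature importance charts. One common model-agnostic approach is permutation importance, sketched below under that assumption: shuffle one feature across samples and measure how much the MSE rises. The function name permutation_importance and the predict callable are hypothetical; with a trained Keras model, predict could wrap model.predict and flatten its output.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean MSE increase per feature when that feature is shuffled across samples.

    X has shape (samples, days, features); predict maps X to one prediction per sample.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[2])
    for f in range(X.shape[2]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            order = rng.permutation(X.shape[0])
            X_perm[:, :, f] = X[order, :, f]  # shuffle whole windows of feature f
            increases.append(np.mean((predict(X_perm) - y) ** 2) - baseline)
        importances[f] = np.mean(increases)
    return importances
```

A feature the model ignores, as quarterly ROE appears to be here, would show an increase close to zero.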

model.save(f'lstm_{sample[0]}.keras', include_optimizer = False)

Finally, we save the model as a .keras file to facilitate the next backtesting phase.

Conclusion

Based on the charts, the LSTM model appears to perform well in price forecasting. However, past research on time series models often finds a certain lag between the model’s predictions and the actual data. The charts above suggest that a price movement on one day may only be reflected in the prediction on the following day. Although the differences are small, this lag could cause problems with timely order execution in backtesting. The next article will analyze the model’s predictions further.
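One simple way to check for such a lag is to compare the predictions against the actual series shifted by a few days and see which shift fits best. The sketch below is an illustration under that idea, not an analysis from the article, and the function name best_lag is hypothetical.

```python
import numpy as np

def best_lag(actual, predicted, max_lag=5):
    """Return the lag (in days) at which predictions match actual prices most closely.

    A best lag of 1 means the prediction for day t resembles the actual price of day t-1.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            diff = predicted - actual
        else:
            diff = predicted[lag:] - actual[:-lag]  # compare day t with day t - lag
        errors[lag] = float(np.mean(np.abs(diff)))
    return min(errors, key=errors.get), errors
```

If the best lag is 1 rather than 0, the model is behaving much like a forecast that repeats yesterday's close, which is worth knowing before relying on it for order timing.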

Note: This analysis is for reference only and does not constitute any product or investment advice.

Source Code

Click here to visit GitHub

Extended Reading

【Application】Verifying LSTM Stock Price Prediction Effectiveness Using TQuant Lab (Part 2)