【Data Analysis】LSTM Trading Signal Judgment

TEJ 台灣經濟新報

Published in

TEJ-API Financial Data Analysis

8 min read · Dec 19, 2022

Using an LSTM model to optimize trading signals and backtesting the result.

Keywords: LSTM model, technical indicators, trading strategy

Highlights：

Difficulty：★★★☆☆

Preface

In the previous article, we used an LSTM model to predict stock prices, taking the opening, highest, lowest, and closing prices and the trading volume of the previous 10 days to predict the next day's closing price. The model did not perform very well: its forecast for tomorrow's price was essentially just yesterday's price. We therefore change our approach and use the model to help us judge buying and selling points and carry out a trading strategy. This time we also add more feature indicators, hoping for better results.

Feature indicators’ introduction

We add eight new features: four technical indicators and four general economic indicators. We hope that combining these two kinds of features will improve our forecasts.

Technical indicators

KD: A stochastic indicator, which shows where the current price sits relative to its range over a past period.
RSI: A relative strength indicator, which shows the relative strength of buyers and sellers.
MACD: A moving average indicator, which shows whether the long-term and short-term moving averages are converging or diverging.
MOM: A trend indicator, which shows the size of price changes and the direction of the market trend.
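To make these definitions concrete, here is a minimal pandas sketch of two of them, momentum and the fast stochastic %K, on a made-up price series. The function names `momentum` and `fast_k` and the toy data are illustrative assumptions; the article itself computes all four indicators with TA-Lib, whose parameters and smoothing may differ from this simplified version.

```python
import pandas as pd

def momentum(close: pd.Series, period: int = 15) -> pd.Series:
    """MOM: today's close minus the close `period` days ago."""
    return close - close.shift(period)

def fast_k(high: pd.Series, low: pd.Series, close: pd.Series,
           period: int = 5) -> pd.Series:
    """Fast %K: position of the close inside the high-low range of the last `period` days (0-100)."""
    lowest = low.rolling(period).min()
    highest = high.rolling(period).max()
    return 100 * (close - lowest) / (highest - lowest)

# Toy prices (hypothetical, for illustration only)
close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0])
high = close + 1.0
low = close - 1.0

mom = momentum(close, period=2)
k = fast_k(high, low, close, period=3)
```

The first `period` values of `mom` and the first `period - 1` values of `k` are NaN because there is not yet enough history, which is also why the article drops rows with missing values later on.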

General Economic Indicators

Taiwan’s business climate countermeasure signal: An important macroeconomic variable that represents economic activity and reflects changes in the business climate.
VIX index: A volatility indicator that also reflects market sentiment and panic.
Leading indicator: An economic indicator that moves ahead of the business cycle and is used to predict its future trend.
Average price-to-earnings ratio of Taiwan stocks: The average price-to-earnings ratio of listed companies, which shows whether investors’ overall view of the market is optimistic or pessimistic.

Data loading

This article uses Windows as the operating system and Jupyter as the editor.

import tejapi
import pandas as pd

tejapi.ApiConfig.api_key = "Your Key"
tejapi.ApiConfig.ignoretz = True

Database

0050 adjusted stock price (daily), ex-dividend adjusted (TWN/APRCD1)
Average price-to-earnings ratio of Taiwan stocks, general economy (GLOBAL/ANMAR)
Taiwan’s business climate countermeasure signal, general economy (GLOBAL/ANMAR)
Leading indicator, general economy (GLOBAL/ANMAR)
Chicago VIX index, international stock indices (GLOBAL/GIDX)

Download data

Data period: 2011.01.01–2022.11.15.

0050 Ex-dividend adjusted stock price and its opening price, closing price, highest price, lowest price, trading volume

coid = "0050"
mdate = {'gte':'2011-01-01', 'lte':'2022-11-15'}
data = tejapi.get('TWN/APRCD1',
                  coid = coid,
                  mdate = mdate,
                  paginate=True)

# Open, high, low, close, and trading volume
data = data[["coid","mdate","open_adj","high_adj","low_adj","close_adj","amount"]]
data = data.rename(columns={"open_adj":"open","high_adj":"high","low_adj":"low",
                            "close_adj":"close","amount":"vol"})

Technical indicators (KD, RSI, MACD, MOM)

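from talib import abstract
# TA-Lib's abstract API reads the price columns (open, high, low, close) by name
data["rsi"] = abstract.RSI(data,timeperiod=14)
data[["macd","macdsig","macdhist"]] = abstract.MACD(data)
data[["kdf","kds"]] = abstract.STOCH(data)
data["mom"] = abstract.MOM(data,timeperiod=15)
# Index by date so the economic indicators can be merged on it
data.set_index(data["mdate"],inplace = True)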

General economic indicators (Taiwan stock average price-earnings ratio, Taiwan’s business climate countermeasure signal, leading indicators, Chicago VIX index)

# Average price-to-earnings ratio of Taiwan stocks
data1 = tejapi.get('GLOBAL/ANMAR',
                          mdate = mdate,
                          coid = "SA15",
                          paginate=True)
data1.set_index(data1["mdate"],inplace = True)
data1 = data1.resample('D').ffill()
data = pd.merge(data,data1["val"],how='left', left_index=True, right_index=True)
data.rename({"val":"pe"}, axis=1, inplace=True)
# Chicago VIX index
data2 = tejapi.get('GLOBAL/GIDX',
                   coid = "SB82",
                          mdate = mdate,
                          paginate=True)
data2.set_index(data2["mdate"],inplace = True)
data = pd.merge(data,data2["val"],how='left', left_index=True, right_index=True)
data.rename({"val":"vix"}, axis=1, inplace=True)
# Business climate countermeasure signal
data3 = tejapi.get('GLOBAL/ANMAR',
                   coid = "EA1101",
                          mdate = mdate,
                          paginate=True)
data3.set_index(data3["mdate"],inplace = True)
data3 = data3.resample('D').ffill()
data = pd.merge(data,data3["val"],how='left', left_index=True, right_index=True)
data.rename({"val":"light"}, axis=1, inplace=True)
# Leading indicator
data4 = tejapi.get('GLOBAL/ANMAR',
                   coid = "EB0101",
                          mdate = mdate,
                          paginate=True)
data4.set_index(data4["mdate"],inplace = True)
data4 = data4.resample('D').ffill()
data = pd.merge(data,data4["val"],how='left', left_index=True, right_index=True)
data.rename({"val":"advance"}, axis=1, inplace=True)
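The merges above all follow one pattern: a lower-frequency series is upsampled to calendar days with `resample('D').ffill()` and then left-joined onto the daily price table. A minimal sketch of that pattern on made-up data is given here; the dates, values, and the assumption that the monthly series is stamped on the first day of each month are hypothetical and only serve to show the mechanics.

```python
import pandas as pd

# Hypothetical daily prices on trading days only (no weekend rows)
daily = pd.DataFrame(
    {"close": [100.0, 101.0, 102.0, 103.0]},
    index=pd.to_datetime(["2022-01-28", "2022-01-31", "2022-02-01", "2022-02-02"]),
)

# Hypothetical monthly indicator, stamped on the first day of each month
monthly = pd.DataFrame(
    {"val": [15.0, 16.0]},
    index=pd.to_datetime(["2022-01-01", "2022-02-01"]),
)

# Upsample to calendar days and carry the latest monthly value forward
monthly_daily = monthly.resample("D").ffill()

# The left join keeps every trading day and attaches that day's monthly value
merged = daily.merge(monthly_daily["val"], how="left",
                     left_index=True, right_index=True)
merged = merged.rename(columns={"val": "pe"})
```

In this toy example the last trading day falls after the final monthly stamp, so its `pe` value is NaN. Gaps of this kind are what the forward fill in the next step takes care of.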

Fill missing values and remove unneeded columns

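data.set_index(data["mdate"],inplace=True)
# Forward-fill gaps left by the merges, then drop rows that still have missing values
data = data.ffill()
data = data.dropna(axis=0)
# The identifier and date columns are not features
del data["coid"]
del data["mdate"]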
data

Trading signals

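We combine moving averages with a momentum indicator to define the trend: when the short-term moving average of the closing price is above the long-term one and the short-term average of RSI is above the long-term one, the market is judged to be in an upward trend. The code below uses 10-day and 20-day rolling windows.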

data["short_mom"] = data["rsi"].rolling(window=10,min_periods=1,center=False).mean()
data["long_mom"] = data["rsi"].rolling(window=20,min_periods=1,center=False).mean()
data["short_mov"] = data["close"].rolling(window=10,min_periods=1,center=False).mean()
data["long_mov"] = data["close"].rolling(window=20,min_periods=1,center=False).mean()

Label mark
An upward trend is marked as 1, otherwise it is marked as 0.

import numpy as np
# Note: as written, only the moving-average condition sets the label;
# short_mom and long_mom are computed above but are not used here
data['label'] = np.where(data.short_mov > data.long_mov, 1, 0)
# Drop the helper columns so they are not used as features
data = data.drop(columns=["short_mov","long_mov","short_mom","long_mom"])
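The rule described in the text has two conditions, one on the moving averages of the close and one on the moving averages of RSI. As a sketch only, and not necessarily the labeling that produced the results reported in this article, both conditions could be combined as follows. The function name `trend_label` and the toy series are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def trend_label(close: pd.Series, rsi: pd.Series,
                short: int = 10, long: int = 20) -> pd.Series:
    """Return 1 when both short-term averages exceed their long-term averages, else 0."""
    short_mov = close.rolling(short, min_periods=1).mean()
    long_mov = close.rolling(long, min_periods=1).mean()
    short_mom = rsi.rolling(short, min_periods=1).mean()
    long_mom = rsi.rolling(long, min_periods=1).mean()
    up = (short_mov > long_mov) & (short_mom > long_mom)
    return up.astype(int)

# Toy series (hypothetical): steadily rising, then steadily falling
close = pd.Series(np.r_[np.arange(1.0, 31.0), np.arange(30.0, 0.0, -1.0)])
rsi = close.copy()  # stand-in for an RSI series, for illustration only
label = trend_label(close, rsi)
```

Requiring both conditions can only produce the same number of upward labels or fewer than the moving-average condition alone, so the class balance would shift toward 0.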

Data distribution
The labels are not excessively imbalanced. However, since the overall trend of the market is upward, it is normal that upward-trend labels are more common.

Data pre-processing

Data standardization

X = data.drop('label', axis = 1)
from sklearn.preprocessing import StandardScaler
X[X.columns] = StandardScaler().fit_transform(X[X.columns])
y = pd.DataFrame({"label":data.label})

Split the data into a training set and a test set at a ratio of 7:3

import numpy as np
split = int(len(data)*0.7)
train_X = X.iloc[:split,:].copy()
test_X = X.iloc[split:].copy()
train_y = y.iloc[:split,:].copy()
test_y = y.iloc[split:].copy()

X_train, y_train, X_test, y_test = np.array(train_X), np.array(train_y), np.array(test_X), np.array(test_y)
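The code above standardizes the whole data set before splitting it. A common variation, sketched here as an assumption rather than as the article's method, is to fit the scaler on the training rows only and reuse its mean and scale for the test rows, so that test-period statistics do not enter the training features. The toy table and its column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy feature table (hypothetical values) in chronological order
rng = np.random.default_rng(0)
X_raw = pd.DataFrame(rng.normal(size=(100, 3)), columns=["close", "rsi", "vix"])

split = int(len(X_raw) * 0.7)
train_raw, test_raw = X_raw.iloc[:split], X_raw.iloc[split:]

# Fit the scaler on the training rows only ...
scaler = StandardScaler().fit(train_raw)

# ... then apply the same mean and scale to both parts
train_scaled = pd.DataFrame(scaler.transform(train_raw),
                            index=train_raw.index, columns=train_raw.columns)
test_scaled = pd.DataFrame(scaler.transform(test_raw),
                           index=test_raw.index, columns=test_raw.columns)
```

With this variation the training features have zero mean and unit variance exactly, while the test features generally do not, which is expected.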

Change the data dimension to three-dimensional to meet the requirements of the model

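# Keras LSTM layers expect input shaped (samples, timesteps, features); here timesteps = 1
n_features = X_train.shape[1]
X_train = np.reshape(X_train, (X_train.shape[0],1,n_features))
y_train = np.reshape(y_train, (y_train.shape[0],1,1))
X_test = np.reshape(X_test, (X_test.shape[0],1,n_features))
y_test = np.reshape(y_test, (y_test.shape[0],1,1))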
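The three dimensions stand for samples, timesteps, and features, and the reshape above uses a single timestep per sample. The sketch below shows the same layout on toy data with a hypothetical helper, `to_lstm_input`, that also allows longer windows. The helper and the window length of 3 are assumptions for illustration; the article itself uses one timestep.

```python
import numpy as np

def to_lstm_input(X: np.ndarray, timesteps: int = 1) -> np.ndarray:
    """Turn a (rows, features) array into (samples, timesteps, features) windows."""
    windows = np.lib.stride_tricks.sliding_window_view(X, timesteps, axis=0)
    # sliding_window_view puts the window axis last: (samples, features, timesteps)
    return windows.transpose(0, 2, 1)

X = np.arange(12, dtype=float).reshape(6, 2)   # 6 rows, 2 features (toy data)

one_step = to_lstm_input(X, timesteps=1)   # same layout as a plain reshape to (6, 1, 2)
three_step = to_lstm_input(X, timesteps=3) # each sample holds 3 consecutive rows
```

With windows longer than one row, the number of samples shrinks to `rows - timesteps + 1`, so the labels would have to be trimmed to match, for example with `y[timesteps - 1:]`.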

LSTM model

Loading packages

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from keras.layers import BatchNormalization

Add LSTM model
Stack four LSTM layers and use Dropout to keep the model from overfitting.

regressor = Sequential()
regressor.add(LSTM(units = 32, return_sequences = True, input_shape = (X_train.shape[1], X_train.shape[2])))
regressor.add(BatchNormalization())
regressor.add(Dropout(0.35))
regressor.add(LSTM(units = 32, return_sequences = True))
regressor.add(Dropout(0.35))
regressor.add(LSTM(units = 32, return_sequences = True))
regressor.add(Dropout(0.35))
regressor.add(LSTM(units = 32))
regressor.add(Dropout(0.35))
regressor.add(Dense(units = 1,activation="sigmoid"))
regressor.compile(optimizer = 'adam', loss="binary_crossentropy",metrics=["accuracy"])
regressor.summary()

Model results (training set)
Train for 100 epochs.

train_history = regressor.fit(X_train,y_train,
                          batch_size=200,
                          epochs=100,verbose=2,
                          validation_split=0.2)

Model evaluation
In the model loss graph, the training and validation loss curves converge during training, which suggests that the model is not overfitting.

import matplotlib.pyplot as plt
# Training and validation loss recorded by Keras for each epoch
loss = train_history.history["loss"]
val_loss = train_history.history["val_loss"]
plt.plot(loss,label="train")
plt.plot(val_loss,label="valid")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.title("model loss")
plt.legend(loc = "upper left")

Importance of Variables
This shows how important each feature is, measured by how much the prediction error (MAE) on the test set changes when that feature's values are shuffled. MACD, the average price-to-earnings ratio of Taiwan stocks, and RSI turn out to be important features.

from tqdm.notebook import tqdm
results = []
print(' Computing LSTM feature importance...')
# Flatten the labels so they line up with the squeezed predictions
# (subtracting an (n,1,1) array from an (n,) array would broadcast to (n,1,n))
y_true = y_test.squeeze()
# COMPUTE BASELINE (NO SHUFFLE)
oof_preds = regressor.predict(X_test, verbose=0).squeeze()
baseline_mae = np.mean(np.abs(oof_preds-y_true))

results.append({'feature':'BASELINE','mae':baseline_mae})

for k in tqdm(range(len(list(test_X.columns)))):

  # SHUFFLE FEATURE K
  save_col = X_test[:,:,k].copy()
  np.random.shuffle(X_test[:,:,k])

  # COMPUTE OOF MAE WITH FEATURE K SHUFFLED
  oof_preds = regressor.predict(X_test, verbose=0).squeeze()
  mae = np.mean(np.abs( oof_preds-y_true ))
  results.append({'feature':test_X.columns[k],'mae':mae})
  # RESTORE FEATURE K
  X_test[:,:,k] = save_col
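The loop above is a form of permutation importance: shuffle one feature at a time and see how much the error grows. The following sketch isolates that idea with a toy stand-in for the trained model, so it runs without Keras. The function `permutation_importance`, the `toy_predict` model, and the random data are all assumptions made for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, rng):
    """Return the baseline MAE and, per feature, the MAE after shuffling that feature."""
    baseline = np.mean(np.abs(predict(X) - y))
    shuffled_mae = []
    for k in range(X.shape[1]):
        X_perm = X.copy()                     # leave the original data untouched
        X_perm[:, k] = rng.permutation(X_perm[:, k])
        shuffled_mae.append(np.mean(np.abs(predict(X_perm) - y)))
    return baseline, np.array(shuffled_mae)

# Toy stand-in for a trained model (hypothetical): only feature 0 matters
def toy_predict(X):
    return (X[:, 0] > 0).astype(float)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(float)

baseline, shuffled = permutation_importance(toy_predict, X, y, rng)
importance = shuffled - baseline   # larger increase in MAE = more important feature
```

Subtracting the baseline, as in the last line, turns the raw MAE values into an importance score that is close to zero for features the model ignores.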

Model results (test set)

The accuracy on the test set is as high as 95.49%, showing that the LSTM model can reproduce the signals of our strategy.

regressor.evaluate(X_test, y_test,verbose=1)

Compare the actual labels (Real) with the labels predicted by the model (Predict)
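The plotting code in the next section reads a table called `result` with `Close` and `Predict` columns, but its construction is not shown in this article. A minimal sketch of how such a table could be assembled follows. The helper `build_result`, the 0.5 probability threshold, and the toy inputs are assumptions; only the column names `Close`, `Real`, and `Predict` are taken from the article.

```python
import numpy as np
import pandas as pd

def build_result(close: pd.Series, y_true: np.ndarray, proba: np.ndarray,
                 threshold: float = 0.5) -> pd.DataFrame:
    """Collect close price, actual label, and thresholded prediction in one table."""
    return pd.DataFrame(
        {
            "Close": close.to_numpy(),
            "Real": np.asarray(y_true).reshape(-1).astype(int),
            "Predict": (np.asarray(proba).reshape(-1) > threshold).astype(int),
        },
        index=close.index,
    )

# Toy inputs (hypothetical); in the article these would be the test-period close prices,
# y_test, and the output of regressor.predict(X_test)
dates = pd.date_range("2022-01-03", periods=4, freq="D")
close = pd.Series([100.0, 101.0, 99.0, 98.0], index=dates)
result = build_result(close, np.array([1, 1, 0, 0]), np.array([0.9, 0.4, 0.2, 0.1]))
accuracy = (result["Real"] == result["Predict"]).mean()
```

The share of rows where `Real` equals `Predict` is the same accuracy figure that `regressor.evaluate` reports when the threshold is 0.5.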

Strategy Visualization

The figure below shows the trend predicted by the LSTM strategy: red represents an upward trend, and green represents a downward trend.

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# `result` is expected to be indexed by date, with Close and Predict columns
df = result.copy()
df = df.resample('D').ffill()

fig, ax = plt.subplots()
ax.plot(df.index, df["Close"], color = 'black')
# Shade each day: red for a predicted upward trend, green for a downward trend
for day, pred in zip(df.index, df["Predict"]):
    x = mdates.date2num(day)
    ax.axvspan(
        x - 0.5, x + 0.5,
        facecolor = 'red' if pred == 1 else 'green',
        edgecolor = 'none', alpha = 0.5
        )
fig.autofmt_xdate()
fig.set_size_inches(20,10.5)

Strategy backtest

Trading strategy

Buy and hold a long position while the trend signal is upward. When the signal turns downward, sell the long position and open a short position, hold it until the signal next turns upward, and then close it.
Note: This strategy does not consider transaction fees, and the full capital is invested on every entry and exit.
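The backtest code is not included in this article, so the following is only a sketch of how a long/short rule like this could be evaluated. It assumes that the position is taken on the day after the signal appears and that a short position earns the negative of the daily return; neither detail is specified in the article. The function name `backtest_long_short` and the toy data are hypothetical. Fees are ignored and the full capital is always invested, as in the note above.

```python
import numpy as np
import pandas as pd

def backtest_long_short(close: pd.Series, signal: pd.Series) -> pd.Series:
    """Cumulative return of being long when signal == 1 and short when signal == 0."""
    daily_ret = close.pct_change().fillna(0.0)
    position = signal.map({1: 1.0, 0: -1.0})
    # Trade on the day after the signal appears, so a signal never earns its own day's return
    position = position.shift(1).fillna(0.0)
    strategy_ret = position * daily_ret
    return (1.0 + strategy_ret).cumprod() - 1.0

# Toy data (hypothetical): price rises 10% twice, then falls 10% twice
close = pd.Series([100.0, 110.0, 121.0, 108.9, 98.01])
signal = pd.Series([1, 1, 0, 0, 0])

cum_ret = backtest_long_short(close, signal)
buy_hold = close.iloc[-1] / close.iloc[0] - 1.0
```

In this toy case the signal flips exactly at the top, so the strategy gains on all four days while buy and hold ends slightly negative. Real signals are noisier, and the one-day lag is what keeps the backtest from using information that was not yet available.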

Result of backtest

LSTM strategy cumulative return is 82.6%
The original strategy (MA+MOM) cumulative return is 71.3%
The cumulative return on Buy and Hold is 52%

Conclusion

The main objective of this article is to test whether an LSTM can correctly judge buying and selling points according to the original strategy we set. The result is positive, with a high accuracy of 95.49%. The LSTM strategy's cumulative return in the backtest is 82.6%, which outperforms the original strategy's 71.3% and significantly beats the market's buy-and-hold return of 52%. The original strategy is easily whipsawed during consolidation periods, which hurts its trading performance, whereas the LSTM strategy produces relatively fewer trading signals in those periods.

Last but not least, please note that the stocks mentioned in this article are for discussion only; please do not treat them as investment or product recommendations. If you are interested in topics such as creating trading strategies, performance backtesting, and evidence-based research, you are welcome to purchase the plans offered in TEJ E Shop and use its comprehensive database to find potential events.

Source Code

Extended Reading

【Quant】- Technical Analysis
【Quant】 Momentum trade