Stock LSTM Analysis (2): Classification

Frederick Lee
5 min read · Apr 7, 2019


This article uses an LSTM (Long Short-Term Memory) network to make a classification prediction for stocks. Unlike the earlier regression analysis, here the up/down labels are fed directly into the LSTM model.

The problem to solve, and why classification?

With regression, there is no immediate way to tell whether the price is about to rise or fall, and without that it is hard to decide when to buy or sell; the model only gives a rough idea of where the price will land. The previous article also tried deriving up/down calls from the regression predictions, but the accuracy was barely under 0.4, worse than guessing. So this time the idea is to classify up/down labels directly from the sequential features.
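For illustration, the "derive up/down from a regression forecast" step the previous article used can be sketched like this (the price arrays here are made-up toy values, not the article's data):

```python
import numpy as np

# Toy example: turn a regression price forecast into up/down labels
# and measure how often it calls the direction correctly.
true_price = np.array([100.0, 101.5, 101.0, 102.2, 101.8])
pred_price = np.array([100.2, 100.9, 101.4, 101.9, 102.0])

# Label 1 if the next value is higher than the current one, else 0.
true_updown = (np.diff(true_price) > 0).astype(int)  # [1, 0, 1, 0]
pred_updown = (np.diff(pred_price) > 0).astype(int)  # [1, 1, 1, 1]

accuracy = (true_updown == pred_updown).mean()
print(accuracy)  # 0.5
```

The classification approach below skips the price forecast entirely and learns these 0/1 labels directly.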

Create the label column: 1 if the price is higher than the previous day's, otherwise 0:

# Build the labels: 1 if tomorrow's price is higher than today's, else 0
def load_data(price_data):
    data = []
    for i in range(len(price_data) - 1):
        if price_data[i + 1] > price_data[i]:
            data.append(1)
        else:
            data.append(0)
    return data

up_down0 = pd.Series(load_data(price.values))
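The same labeling can be done without a loop using pandas; a vectorized sketch with a toy price series (not the article's data):

```python
import pandas as pd

# shift(-1) aligns each row with the next day's price; the last row
# has no "next day", so drop it.
price = pd.Series([10.0, 10.5, 10.2, 10.8, 10.8])
up_down = (price.shift(-1) > price).astype(int).iloc[:-1]
print(up_down.tolist())  # [1, 0, 1, 0]
```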

Concatenate the features with the labels:

norm_Aprice_data0 = pd.concat([norm_Aprice_data,up_down0],axis=1)

Build the LSTM input sequences and split them into training and test sets:

def train_windows(df, ref_day=5, predict_day=1):
    # Slide a ref_day-long window over the features; the target is
    # the label column (column 8) on the following predict_day day(s)
    X_train, Y_train = [], []
    for i in range(df.shape[0] - predict_day - ref_day):
        X_train.append(np.array(df.iloc[i:i + ref_day, :-1]))
        Y_train.append(np.array(df.iloc[i + ref_day:i + ref_day + predict_day][8]))
    return np.array(X_train), np.array(Y_train)

X, Y = train_windows(norm_Aprice_data0, 5, 1)

# Hold out the last 10% of windows as the test set
split_boundary = int(X.shape[0] * 0.9)
train_x = X[:split_boundary]
test_x = X[split_boundary:]
train_y = Y[:split_boundary]
test_y = Y[split_boundary:]
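To sanity-check the windowing, here is the same function run on a toy DataFrame with 3 feature columns plus the label in the last column (indexed as `-1` here rather than the article's column `8`, since the toy frame is narrower):

```python
import numpy as np
import pandas as pd

def train_windows(df, ref_day=5, predict_day=1):
    # Each sample: ref_day rows of features; target: the label
    # (last column) on the following day.
    X_train, Y_train = [], []
    for i in range(df.shape[0] - predict_day - ref_day):
        X_train.append(np.array(df.iloc[i:i + ref_day, :-1]))
        Y_train.append(np.array(df.iloc[i + ref_day:i + ref_day + predict_day, -1]))
    return np.array(X_train), np.array(Y_train)

# Toy data: 10 days, 3 features plus a label column.
df = pd.DataFrame(np.arange(40).reshape(10, 4))
X, Y = train_windows(df, ref_day=5, predict_day=1)
print(X.shape, Y.shape)  # (4, 5, 3) (4, 1)
```

So each training sample is a (ref_day, n_features) matrix, which is exactly the (timesteps, features) shape the LSTM layer below expects.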

Train the LSTM model, with a sigmoid on the final layer for binary classification:

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout

# Four stacked LSTM layers with dropout, then a single sigmoid unit
regressor = Sequential()
regressor.add(LSTM(units = 256, return_sequences = True, input_shape = (train_x.shape[1], train_x.shape[2])))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 256, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 256, return_sequences = True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units = 256))
regressor.add(Dropout(0.2))
regressor.add(Dense(units = 1, activation = 'sigmoid'))

regressor.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
regressor.fit(train_x, train_y, epochs = 100, batch_size = 32)
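For reference, what the sigmoid output layer and the binary cross-entropy loss actually compute can be sketched in plain numpy (standalone formulas, not Keras internals):

```python
import numpy as np

def sigmoid(z):
    # Squashes a real-valued score into a (0, 1) probability.
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Average negative log-likelihood of the true 0/1 labels.
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

scores = np.array([2.0, -1.0, 0.0])
labels = np.array([1, 0, 1])
probs = sigmoid(scores)   # approx. [0.881, 0.269, 0.5]
loss = binary_crossentropy(labels, probs)
```

This is why the model's raw predictions are probabilities rather than hard 0/1 labels, and why a threshold step is needed afterwards.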

Predict on the test data:

predict_y = regressor.predict(test_x)
predict_y = pd.DataFrame(predict_y).values.astype(float)

Since the model outputs a probability, we still need to convert it into a class: values above 0.5 become label 1, the rest label 0:

def updown(predict):
    data = []
    for i in range(len(predict)):
        if predict[i] > 0.5:
            data.append(1)
        else:
            data.append(0)
    return data

predict_yy = updown(predict_y)
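The thresholding loop can also be written in one vectorized line (toy probabilities standing in for the model's output):

```python
import numpy as np

# Vectorized equivalent of the updown() loop: threshold at 0.5.
predict_y = np.array([0.73, 0.41, 0.55, 0.12])
predict_yy = (predict_y > 0.5).astype(int)
print(predict_yy.tolist())  # [1, 0, 1, 0]
```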

Compare the predictions against the test labels:

count = 0
for i in range(len(test_y)):
    if predict_yy[i] == test_y[i]:
        count = count + 1
accuracy = count / len(test_y)
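The counting loop boils down to the mean of an element-wise equality check; a one-line sketch with toy arrays standing in for `predict_yy` and `test_y`:

```python
import numpy as np

# Accuracy = fraction of positions where prediction equals truth.
predict_yy = np.array([1, 0, 1, 1, 0])
test_y     = np.array([1, 1, 1, 0, 0])
accuracy = (predict_yy == test_y).mean()
print(accuracy)  # 0.6
```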

count = 77

accuracy = 0.6111111111111112

Conclusion:

Classifying directly reaches an accuracy of 0.61, higher than deriving up/down calls from the regression predictions.

If the price-level structure learned by the regression could be combined with the up/down signal from the classification, using the two together might reveal something more.
