Predicting weekly NKY change using SPX and USDJPY via SVM, with 54% cross-validated accuracy.

Published in Latent · 2 min read · May 24, 2018

This is a very direct model: we track the log price changes of the SPX Index and USDJPY to classify whether the NKY Index will go up or down in the next period.
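
As a minimal sketch of that setup, the snippet below builds the two log-change features and the next-period direction label from a small made-up price table. The column names (`spx`, `usdjpy`, `nky`) and the prices are illustrative assumptions, not the data used in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly closing prices, for illustration only.
prices = pd.DataFrame(
    {
        "spx": [2700.0, 2720.0, 2690.0, 2735.0, 2750.0],
        "usdjpy": [109.0, 110.2, 109.5, 110.8, 111.0],
        "nky": [22400.0, 22650.0, 22300.0, 22800.0, 22750.0],
    },
    index=pd.date_range("2018-04-01", periods=5, freq="W"),
)

# Log price change per period: log(p_t) - log(p_{t-1}).
log_change = np.log(prices).diff()

# Label: sign of the NKY log change in the *next* period.
direction_next = np.sign(log_change["nky"]).shift(-1)

# Drop the first row (no previous price) and the last row (no next period).
features = log_change[["spx", "usdjpy"]].iloc[1:-1]
labels = direction_next.iloc[1:-1]

print(features.join(labels.rename("nky_next_direction")))
```

Each remaining row pairs this period's SPX and USDJPY log changes with the direction NKY takes in the following period, which is the shape the classifiers need.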

We run this with support vector classification, an MLP classifier and a random forest classifier.

On average, we get 63% prediction accuracy on the daily time frame and around 60% on the weekly. This is validated with 10-fold cross-validation over the last 10 years of data.
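
For reference, a 10-fold evaluation of the three classifiers could look roughly like the sketch below. It runs on synthetic random data, so the printed scores say nothing about NKY. The use of plain `KFold` without shuffling, the `max_iter` setting and the fixed random seeds are assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in data: two random "log change" features and random
# up/down labels. With random labels, accuracy should hover around 50%.
rng = np.random.default_rng(0)
X = rng.normal(scale=0.02, size=(500, 2))
y = rng.choice([-1, 1], size=500)

# 10 folds, kept in the original row order (no shuffling).
cv = KFold(n_splits=10)

classifiers = {
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
    "RFC": RandomForestClassifier(random_state=0),
}

mean_scores = {}
for name, clf in classifiers.items():
    # cross_val_score returns one accuracy value per fold for classifiers.
    scores = cross_val_score(clf, X, y, cv=cv)
    mean_scores[name] = scores.mean()
    print(f"{name} mean accuracy over {len(scores)} folds: {mean_scores[name]:.3f}")
```

Running the same loop on purely random labels like this gives a useful reference point: any real signal should score visibly above what the classifiers achieve here.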

This is based on the well-regarded paper ‘A Hybrid Machine Learning System for Stock Market Forecasting’ by Rohit Choudhry and Kumkum Garg.

'''
By Lee Wen Jie
wenjie.lee@rocketshipapac.com

'Forecasting stock market movement direction with support vector machine'
http://svms.org/finance/HuangNakamoriWang2005.pdf

The prediction model is as follows:
    nky_direction_t = F(spx_t-1_logged_change, usdjpy_t-1_logged_change)

Results:
    There is a proven edge in using spx and usdjpy information to trade nky.
    SVM CVS Score: [0.54597701 0.54597701 0.54651163]
    MLP CVS Score: [0.54597701 0.54597701 0.54651163]
    RFC CVS Score: [0.51724138 0.47126437 0.51744186]
'''
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Daily price history for each instrument, stored as pickled DataFrames.
nky_pickle_path = '../data_bank/NKY_Index_2008-05-26_to_2018-05-24_interval_daily.pkl'
spx_pickle_path = '../data_bank/SPX_Index_2008-05-27_to_2018-05-23_interval_daily.pkl'
usdjpy_pickle_path = '../data_bank/USDJPY_BGN_Curncy_2008-05-26_to_2018-05-24_interval_daily.pkl'

data_nky = pd.read_pickle(nky_pickle_path)
data_spx = pd.read_pickle(spx_pickle_path)
data_usdjpy = pd.read_pickle(usdjpy_pickle_path)

# Resample daily data to weekly, carrying the last available price forward.
# .ffill() is the current name for .pad(), which newer pandas releases no longer provide.
data_nky_week = data_nky.resample('W').ffill()
data_spx_week = data_spx.resample('W').ffill()
data_usdjpy_week = data_usdjpy.resample('W').ffill()

# Weekly log price changes.
data_nky_week_delta = np.log(data_nky_week['NKY Index']['PX_LAST']).diff()
data_spx_week_delta = np.log(data_spx_week['SPX Index']['PX_LAST']).diff()
data_usdjpy_week_delta = np.log(data_usdjpy_week['USDJPY BGN Curncy']['PX_LAST']).diff()

# Target: sign of next week's NKY change. Drop the first row (no prior week
# to diff against) and the last row (no following week to predict).
y = np.sign(data_nky_week_delta.fillna(0)).shift(-1)[1:][:-1]
x1 = data_spx_week_delta[1:][:-1]
x2 = data_usdjpy_week_delta[1:][:-1]
x = list(zip(x1, x2))

# Cross-validated accuracy for each classifier, using scikit-learn's default folds.
clf_svm = svm.SVC()
print('SVM CVS Score: %s' % cross_val_score(clf_svm, x, y))
clf_mlp = MLPClassifier()
print('MLP CVS Score: %s' % cross_val_score(clf_mlp, x, y))
clf_rfc = RandomForestClassifier()
print('RFC CVS Score: %s' % cross_val_score(clf_rfc, x, y))

Written by Wen Jie Lee