Forecasting the daily direction of the S&P 500 with ensemble machine learning: 55.78% accuracy using macro and technical variables

Wen Jie Lee
Published in Latent · 6 min read · Jun 11, 2018

Forecasting the S&P 500 is an attractive pursuit for many financial market participants, as the index tracks the largest and most developed equity market in the world. The S&P 500 value is derived by summing the market capitalizations of roughly 500 of the largest US companies and dividing by an index divisor. It is used around the world as a litmus test for risk sentiment.

We collect daily feature data from 1990 to 2018. Features include the FTSE 100, Nikkei 225 and Shanghai Composite stock indices. We also include macro variables such as GBPUSD, USDCNY, USDJPY, gold and crude oil, which traders watch closely. Finally, we include momentum indicators of the S&P 500 to capture its autocorrelation properties.

For the independent feature variables, we use the daily percentage change rather than the price index itself, since the change in price is more informative than the price level. Preprocessing methods are chosen by our algorithm. The dependent variable is the binary direction of the S&P 500: 1 for a positive daily change and -1 for a negative daily change.
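As a minimal sketch of this transformation, the snippet below turns a price series into daily percentage changes and direction labels. It is written in plain Python to stay dependency-free; the prices are hypothetical, and mapping a zero change to -1 is our assumption here, since only the positive and negative cases are defined above.

```python
def pct_change(prices):
    """Daily percentage change; the first day has no prior value, so it is dropped."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def direction_labels(returns):
    """Map each daily return to +1 (up) or -1 (down).

    ASSUMPTION: a return of exactly zero is labelled -1.
    """
    return [1 if r > 0 else -1 for r in returns]


# Hypothetical closing prices, for illustration only.
closes = [100.0, 101.0, 99.99, 99.99, 102.0]
returns = pct_change(closes)
labels = direction_labels(returns)
```

With a data-frame library such as pandas, the same step is usually a single `pct_change()` call followed by a sign mapping.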

Data alignment is crucial, as different instruments have different closing times. The two Asian markets, Nikkei 225 and Shanghai Composite, close before the US markets open, so we use their data from the same date as the S&P 500 observation. For the rest of the features, we use the previous day's value in the forecast.
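The alignment rule can be sketched as follows: same-day values for the two Asian indices, previous-day values for everything else. The column names and numbers are hypothetical placeholders, not the names used in our data set.

```python
# Hypothetical column names for the two markets that close before the US open.
SAME_DAY = {"nikkei225", "shanghai_composite"}


def align_features(rows):
    """Build the feature row used to forecast each day t.

    rows: list of dicts ordered by date, one per trading day.
    Asian indices keep their day-t value; all other features are
    taken from day t-1. The first day is dropped (no previous day).
    """
    aligned = []
    for prev, curr in zip(rows, rows[1:]):
        feat = {"date": curr["date"]}
        for name, value in curr.items():
            if name == "date":
                continue
            feat[name] = value if name in SAME_DAY else prev[name]
        aligned.append(feat)
    return aligned


# Hypothetical daily percentage changes.
rows = [
    {"date": "2018-06-05", "nikkei225": 0.3, "ftse100": -0.1, "gold": 0.2},
    {"date": "2018-06-06", "nikkei225": 0.4, "ftse100": 0.5, "gold": -0.3},
    {"date": "2018-06-07", "nikkei225": -0.2, "ftse100": 0.1, "gold": 0.0},
]
aligned = align_features(rows)
```

Getting this step wrong leaks same-day information into the features, which would inflate the measured accuracy.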

We start by examining the trendiness of S&P 500 daily returns.

Autocorrelation at different lags

At lags 1 and 2, the autocorrelation is negative and lies outside the 99% confidence bands, which indicates statistically significant mean reversion. Hence, when the S&P 500 goes up today, there is a statistically significant tendency for it to come down tomorrow.
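For readers who want to reproduce this kind of check, here is a minimal sketch of a sample autocorrelation with an approximate 99% band. The band of ±2.576/√n is the usual large-sample approximation under the assumption of no autocorrelation; the alternating series is synthetic and only demonstrates a strongly mean-reverting pattern, not actual S&P 500 returns.

```python
import math


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var


def band_99(n):
    """Approximate 99% confidence band for zero autocorrelation."""
    return 2.576 / math.sqrt(n)


# Synthetic, perfectly alternating "returns" (illustration only).
returns = [1.0, -1.0] * 50
lag1 = autocorr(returns, 1)
lag2 = autocorr(returns, 2)
significant = abs(lag1) > band_99(len(returns))
```

A negative lag-1 value outside the band is what we read as mean reversion.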

Notably, the Nikkei 225 shows a strong correlation with the S&P 500, as it captures the trading sentiment from the prior Asian session before the US opens.

Next, we use scatter plots to look for patterns or abnormalities in our input data.

Scatter plots of pct change of features against S&P 500
How the same set of data looks after binarization

The scatter plots show some correlations, some more obvious than others. We then clean the data by excluding rows with missing data points, which are most likely due to different trading calendars. Outliers greater than 5 standard deviations were removed to prevent them from skewing the fitted results. There were a few such data points for USDCNY and the Shanghai Composite (SSE).
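The cleaning step might look like the sketch below. It assumes each row is a tuple of feature values with `None` marking a missing point, and that the standard deviation is computed per column over the complete rows (including the outliers themselves); both are our illustrative assumptions, and the data is synthetic.

```python
import statistics


def clean(rows, n_std=5.0):
    """Drop rows with missing values, then rows with any feature
    more than n_std standard deviations from its column mean."""
    complete = [r for r in rows if all(v is not None for v in r)]
    columns = list(zip(*complete))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [
        r for r in complete
        if all(s == 0 or abs(v - m) <= n_std * s
               for v, m, s in zip(r, means, stds))
    ]


# Synthetic data: 60 ordinary rows, one extreme row, one row with a gap.
data = [(0.01 * (-1) ** i, 0.02 * (-1) ** i) for i in range(60)]
data.append((0.9, 0.01))   # extreme move in the first feature
data.append((None, 0.01))  # missing value, e.g. a market holiday
cleaned = clean(data)
```

Note that a 5-standard-deviation rule needs a reasonably long sample, because a single extreme value inflates the standard deviation it is measured against.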

We use automated machine learning, an elegant way to produce predictions for a dataset within a fixed computational budget. In machine learning, we need to decide which algorithm to use, how to preprocess the data and how to set the hyperparameters of the algorithms. We let the automated method learn these for us using Bayesian hyperparameter optimization. Training takes about 1 hour.

Ensembles often outperform individual models, especially when the models they are based on are individually strong and make uncorrelated errors. We form a weighted ensemble of the individual models, with the weights tuned on a holdout set. This is done with the sklearn and auto-sklearn packages. We use a 75/25 train-test split for our 18 years of daily data.
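One way to make a 75/25 split on time-ordered data is sketched below. It assumes a chronological split with no shuffling, so the test set lies strictly after the training set; this is our illustrative assumption, and the row count is hypothetical.

```python
def chronological_split(rows, train_fraction=0.75):
    """Split time-ordered rows into train and test without shuffling,
    so every test row is later than every training row."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical data set of 1,000 time-ordered rows (indices stand in for days).
rows = list(range(1000))
train, test = chronological_split(rows)
```

A shuffled split would mix future days into the training set, which matters for features with autocorrelation.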

Output: accuracy score on our out-of-sample test set.

Accuracy score: 0.5577981651376147
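To put this score in context, the sketch below computes accuracy as the share of correct direction calls and a simple z-score against a 50% coin flip. The test-set size of 1,090 is an assumption: the reported score is numerically equal to 608/1090, but the test-set size is not stated here. A 50% baseline is also a simplification, since always predicting "up" may score above 50% if up days are more common.

```python
import math


def accuracy(y_true, y_pred):
    """Share of predictions that match the true direction."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def z_score_vs_coin_flip(acc, n):
    """Normal-approximation z-score of an accuracy against p = 0.5."""
    return (acc - 0.5) / math.sqrt(0.25 / n)


reported = 0.5577981651376147
n_test = 1090  # ASSUMPTION: inferred from the score, not stated in the article
z = z_score_vs_coin_flip(reported, n_test)

# Tiny hypothetical example of the accuracy calculation itself.
toy_accuracy = accuracy([1, -1, 1, 1], [1, 1, 1, -1])
```

Under these assumptions the z-score comes out at roughly 3.8.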

Below is the weighted final ensemble that was found. It shows the weight, the type of estimator and the hyperparameters used for each member.

[(0.36000000000000004,
SimpleClassificationPipeline({'preprocessor:extra_trees_preproc_for_classification:max_depth': 'None', 'classifier:__choice__': 'sgd', 'preprocessor:extra_trees_preproc_for_classification:bootstrap': 'True', 'classifier:sgd:fit_intercept': 'True', 'categorical_encoding:__choice__': 'no_encoding', 'preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 1, 'classifier:sgd:loss': 'perceptron', 'classifier:sgd:penalty': 'elasticnet', 'preprocessor:extra_trees_preproc_for_classification:max_features': 0.7272215836101141, 'preprocessor:extra_trees_preproc_for_classification:n_estimators': 100, 'preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'imputation:strategy': 'most_frequent', 'preprocessor:extra_trees_preproc_for_classification:min_samples_split': 10, 'preprocessor:extra_trees_preproc_for_classification:min_impurity_decrease': 0.0, 'balancing:strategy': 'none', 'classifier:sgd:tol': 1.0509136658813787e-05, 'preprocessor:extra_trees_preproc_for_classification:criterion': 'entropy', 'classifier:sgd:eta0': 0.016340521198734054, 'preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'classifier:sgd:alpha': 0.0011051939453437334, 'rescaling:__choice__': 'minmax', 'classifier:sgd:learning_rate': 'constant', 'preprocessor:extra_trees_preproc_for_classification:max_leaf_nodes': 'None', 'classifier:sgd:average': 'False', 'classifier:sgd:l1_ratio': 0.0001808802162346077},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.24000000000000002,
SimpleClassificationPipeline({'rescaling:quantile_transformer:output_distribution': 'normal', 'classifier:sgd:penalty': 'elasticnet', 'classifier:__choice__': 'sgd', 'balancing:strategy': 'none', 'imputation:strategy': 'median', 'rescaling:quantile_transformer:n_quantiles': 31569, 'classifier:sgd:fit_intercept': 'True', 'categorical_encoding:__choice__': 'no_encoding', 'classifier:sgd:eta0': 0.01902420330886627, 'preprocessor:__choice__': 'select_percentile_classification', 'classifier:sgd:tol': 0.00015092867817487565, 'classifier:sgd:alpha': 0.0015530751878415228, 'preprocessor:select_percentile_classification:score_func': 'mutual_info', 'rescaling:__choice__': 'quantile_transformer', 'classifier:sgd:learning_rate': 'constant', 'preprocessor:select_percentile_classification:percentile': 80.09400148727232, 'classifier:sgd:loss': 'perceptron', 'classifier:sgd:average': 'True', 'classifier:sgd:l1_ratio': 0.08599797547972958},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.18000000000000002,
SimpleClassificationPipeline({'preprocessor:__choice__': 'kernel_pca', 'rescaling:robust_scaler:q_max': 0.9826214080633513, 'rescaling:robust_scaler:q_min': 0.12185671565664284, 'preprocessor:kernel_pca:n_components': 1890, 'classifier:__choice__': 'gaussian_nb', 'balancing:strategy': 'weighting', 'preprocessor:kernel_pca:kernel': 'cosine', 'rescaling:__choice__': 'robust_scaler', 'imputation:strategy': 'mean', 'categorical_encoding:__choice__': 'one_hot_encoding', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'False'},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.14000000000000004,
SimpleClassificationPipeline({'classifier:passive_aggressive:loss': 'squared_hinge', 'classifier:passive_aggressive:C': 0.019936142191500958, 'classifier:passive_aggressive:average': True, 'classifier:__choice__': 'passive_aggressive', 'balancing:strategy': 'weighting', 'classifier:passive_aggressive:tol': 0.09947971183745015, 'preprocessor:fast_ica:fun': 'exp', 'categorical_encoding:__choice__': 'no_encoding', 'preprocessor:__choice__': 'fast_ica', 'preprocessor:fast_ica:whiten': 'True', 'classifier:passive_aggressive:fit_intercept': 'True', 'rescaling:__choice__': 'normalize', 'imputation:strategy': 'most_frequent', 'preprocessor:fast_ica:n_components': 787, 'preprocessor:fast_ica:algorithm': 'deflation'},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.04000000000000001,
SimpleClassificationPipeline({'classifier:lda:n_components': 228, 'preprocessor:select_rates:alpha': 0.4086724846658236, 'classifier:lda:shrinkage': 'None', 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.010000000000000004, 'classifier:__choice__': 'lda', 'balancing:strategy': 'none', 'preprocessor:select_rates:score_func': 'f_classif', 'categorical_encoding:__choice__': 'one_hot_encoding', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'preprocessor:__choice__': 'select_rates', 'rescaling:__choice__': 'minmax', 'preprocessor:select_rates:mode': 'fpr', 'imputation:strategy': 'mean', 'classifier:lda:tol': 0.0007094671192004056},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.020000000000000004,
SimpleClassificationPipeline({'rescaling:quantile_transformer:output_distribution': 'normal', 'classifier:sgd:penalty': 'l2', 'classifier:__choice__': 'sgd', 'balancing:strategy': 'none', 'imputation:strategy': 'mean', 'rescaling:quantile_transformer:n_quantiles': 57176, 'classifier:sgd:fit_intercept': 'True', 'categorical_encoding:__choice__': 'no_encoding', 'classifier:sgd:eta0': 0.09660393962402068, 'preprocessor:__choice__': 'fast_ica', 'classifier:sgd:tol': 0.002064399699541523, 'classifier:sgd:alpha': 9.173248006514544e-05, 'preprocessor:fast_ica:whiten': 'True', 'preprocessor:fast_ica:fun': 'exp', 'rescaling:__choice__': 'quantile_transformer', 'classifier:sgd:learning_rate': 'optimal', 'classifier:sgd:loss': 'log', 'classifier:sgd:average': 'True', 'preprocessor:fast_ica:n_components': 1215, 'preprocessor:fast_ica:algorithm': 'parallel'},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.020000000000000004,
SimpleClassificationPipeline({'preprocessor:__choice__': 'select_percentile_classification', 'classifier:lda:n_components': 78, 'preprocessor:select_percentile_classification:score_func': 'mutual_info', 'classifier:lda:shrinkage': 'auto', 'classifier:__choice__': 'lda', 'balancing:strategy': 'none', 'preprocessor:select_percentile_classification:percentile': 18.82176104412942, 'rescaling:__choice__': 'standardize', 'imputation:strategy': 'most_frequent', 'categorical_encoding:__choice__': 'no_encoding', 'classifier:lda:tol': 8.390369963558585e-05},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False}))]

Per the output above, our final model is weighted as follows:

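0.36 * sgd + 0.24 * sgd + 0.18 * gaussian_nb + 0.14 * passive_aggressive + 0.04 * lda + 0.02 * sgd + 0.02 * lda, for a total weight of 1.00. The ensemble is weighted heavily toward stochastic gradient descent (sgd) classifiers, which carry 0.62 of the total weight, followed by Gaussian naive Bayes (0.18) and passive-aggressive (0.14). Linear discriminant analysis (lda) contributes only 0.06.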
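The weights can be tallied per classifier type with a few lines of Python. The weights and names below are copied from the output above; the `weighted_vote` helper is a simplified hard vote over -1/+1 predictions added for illustration only, and it is not how auto-sklearn itself combines its members.

```python
from collections import defaultdict

# (weight, classifier) pairs taken from the ensemble output above.
ensemble = [
    (0.36, "sgd"), (0.24, "sgd"), (0.18, "gaussian_nb"),
    (0.14, "passive_aggressive"), (0.04, "lda"),
    (0.02, "sgd"), (0.02, "lda"),
]


def weight_by_classifier(members):
    """Total ensemble weight per classifier type, rounded to 2 decimals."""
    totals = defaultdict(float)
    for weight, name in members:
        totals[name] += weight
    return {name: round(total, 2) for name, total in totals.items()}


def weighted_vote(predictions, weights):
    """Simplified hard vote: sign of the weighted sum of -1/+1 predictions."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score > 0 else -1


totals = weight_by_classifier(ensemble)
total_weight = sum(w for w, _ in ensemble)

# Hypothetical day on which the three sgd members say "up" and the rest "down".
weights = [w for w, _ in ensemble]
votes = [1 if name == "sgd" else -1 for _, name in ensemble]
decision = weighted_vote(votes, weights)
```

In this toy vote the sgd members alone are enough to decide the outcome, since together they hold more than half of the weight.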

Visually examine the positive and negative errors

In conclusion

We used a multitude of stock indices, macro prices and technical indicators to capture the relationship between these factors and S&P 500 movements. We also employed automated machine learning, in which the machine learns how to tune the hyperparameters of each estimator and how to build an ensemble to make its final predictions. We attained an out-of-sample predictive accuracy of 55.78%. Portfolio managers and traders may be able to use this statistical edge to generate some level of profit.

Latent Analytics

Latent Analytics is a Singapore-based predictive analytics company. We employ the most efficient machine learning algorithms to predict real-world trends, both in financial markets and for companies. We offer users daily market signals generated by our algorithms: we do the heavy lifting on the analysis so they do not have to. Go to www.thelatentanalytics.com

We are also happy to discuss similar ideas with interested parties.

Wen Jie Lee, Founder @ Latent. Smart search engine for opinionated comments. www.latentapp.com