Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500. Part 1: data preparation and models
Contributions of this paper:
- Unique in deploying three state-of-the-art machine learning techniques and their simple ensemble on a large and liquid stock universe.
- Reveals that ensemble returns only partially load on systematic sources of risk, are robust to transaction costs, and deteriorate over time, presumably driven by the increasing popularization of machine learning and advances in computing power. However, strong positive returns can still be observed in recent years at times of high market turmoil.
- Focuses on a daily investment horizon instead of monthly frequencies, allowing for much more training data and for profitably exploiting short-term dependencies.
Data
S&P 500
- Obtain all month end constituent lists for the S&P 500 from Thomson Reuters Datastream from December 1989 to September 2015. We consolidate these lists into one binary matrix, indicating whether the stock is a constituent of the index in the subsequent month or not.
- For all stocks that have ever been a constituent of the index, we download the daily total return indices from January 1990 until October 2015. Return indices reflect cum-dividend prices and account for all further corporate actions and stock splits, making them the most suitable metric for return calculations.
The dataset is split with a sliding window: each study period consists of 750 days as the training set followed by 250 days as the development (trading) set, and the window slides forward by 250 days.
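The splitting scheme above can be sketched as a small generator (a minimal illustration of the assumed interpretation: 750 training days, 250 development days, window advancing by 250 days):

```python
def study_periods(n_days, train=750, dev=250, step=250):
    """Yield (train_range, dev_range) index pairs over n_days of data.

    Each study period holds `train` days of training data followed by
    `dev` days of development/trading data; the window then slides
    forward by `step` days.
    """
    start = 0
    while start + train + dev <= n_days:
        yield (range(start, start + train),
               range(start + train, start + train + dev))
        start += step

periods = list(study_periods(n_days=1500))
print(len(periods))  # 3 study periods fit into 1500 days
```

With a 250-day step and 250-day development windows, consecutive development sets do not overlap, so every trading day is evaluated out-of-sample exactly once.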
Method
input & output:
- input: 31 features per stock and day, namely simple returns over look-back windows of different granularity (horizons of 1 to 20 days plus multiples of 20 days up to 240 days).

- output: a binary class label per stock and day, indicating whether the stock's next-day return outperforms the cross-sectional median (class 1) or not (class 0).

This is where the cross-section of stock returns enters: each day, every stock is labeled by whether it beats the cross-sectional median return of all stocks.
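A minimal sketch of how such inputs and targets could be constructed (the `prices` array and the toy data are hypothetical; the 31 horizons follow the setup described above):

```python
import numpy as np

def make_features_and_labels(prices, horizons):
    """prices: (n_days, n_stocks) array of total return index levels.

    Features at day t: simple returns over each look-back horizon m.
    Label at day t: 1 if the stock's next-day return beats the
    cross-sectional median, else 0.
    """
    max_m = max(horizons)
    feats, labels = [], []
    for t in range(max_m, prices.shape[0] - 1):
        # one row of features per stock: returns over each horizon
        f = np.stack([prices[t] / prices[t - m] - 1 for m in horizons], axis=1)
        nxt = prices[t + 1] / prices[t] - 1        # next-day returns
        y = (nxt > np.median(nxt)).astype(int)     # beat cross-sectional median?
        feats.append(f)
        labels.append(y)
    return np.concatenate(feats), np.concatenate(labels)

# toy random-walk prices for 5 hypothetical stocks over 300 days
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal((300, 5)), axis=0))
horizons = list(range(1, 21)) + list(range(40, 241, 20))  # 31 horizons
X, y = make_features_and_labels(prices, horizons)
print(X.shape)  # (n_obs, 31)
```

Note that each observation is one stock on one day, so all stocks in the index pool into a single training sample.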
DNN (deep neural networks)
A simple MLP with topology 31-31-10-5-2 is used,
with maxout activation, dropout (hidden-layer dropout rate 0.5, input dropout rate 0.1), and L1 regularization with lambda_L1 = 0.00001. The optimizer is ADADELTA.
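Maxout computes, for each hidden unit, the maximum over several linear pre-activations. A minimal numpy sketch of the activation alone (the two-piece setup and weight shapes here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def maxout(x, W, b):
    """Maxout activation: each output unit takes the max over k linear pieces.

    x: (n_in,) input vector
    W: (k_pieces, n_in, n_out) weights, one linear map per piece
    b: (k_pieces, n_out) biases
    """
    z = np.einsum('i,kio->ko', x, W) + b  # (k_pieces, n_out) pre-activations
    return z.max(axis=0)                  # element-wise max over the pieces

rng = np.random.default_rng(0)
x = rng.standard_normal(31)           # one input vector (31 return features)
W = rng.standard_normal((2, 31, 31))  # 2 linear pieces per hidden unit
b = rng.standard_normal((2, 31))
h = maxout(x, W, b)
print(h.shape)  # (31,)
```

Because the unit outputs the max of learned linear functions, maxout can approximate ReLU and absolute-value activations as special cases and pairs naturally with dropout.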
GBT (gradient-boosted trees)
Boosting in the tradition of AdaBoost, deploying shallow decision trees as weak learners.
Hyperparameters: the number of trees (boosting iterations) M_GBT = 100, the tree depth J_GBT = 3, the learning rate lambda_GBT = 0.1, and the number of features considered at each split m_GBT = 15.
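The paper runs its models in H2O; purely as an illustration, the same hyperparameters can be mapped onto scikit-learn's GradientBoostingClassifier (the synthetic data below is a stand-in, not the paper's features):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# synthetic stand-in for the 31 return features and a toy binary label
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 31))
y = (X[:, 0] + 0.1 * rng.standard_normal(500) > 0).astype(int)

gbt = GradientBoostingClassifier(
    n_estimators=100,   # M_GBT: number of boosting iterations
    max_depth=3,        # J_GBT: depth of each weak-learner tree
    learning_rate=0.1,  # lambda_GBT: shrinkage per boosting step
    max_features=15,    # m_GBT: features considered at each split
    random_state=0,
)
gbt.fit(X, y)
proba = gbt.predict_proba(X)  # (n_samples, 2): P(class 0), P(class 1)
```

The probability forecast for class 1 (outperforming the median) is what gets ranked across stocks to form trading signals.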
RAF (random forests)
For each of the B_RAF = 1000 trees in the random forest, we first draw a bootstrap sample from the original training data. We then grow a modified decision tree on this sample, selecting m_RAF = floor(sqrt(p)) features at random from the p features at every split. Each tree is grown to a maximum depth of J_RAF = 20. The final output is an ensemble of B_RAF trees, so classification is performed via majority vote. All remaining hyperparameters use the H2O defaults.
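The majority-vote aggregation step can be sketched in a few lines of numpy (the vote matrix below is a hypothetical toy with B_RAF = 3 trees, not real model output):

```python
import numpy as np

# Hypothetical vote matrix: rows are trees, columns are samples;
# each entry is one tree's class vote (0 or 1).
votes = np.array([
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
])

# Majority vote across the trees: class 1 wins if more than half vote for it.
pred = (votes.mean(axis=0) > 0.5).astype(int)
print(pred)  # [1 1 0 1]
```

The column mean of the vote matrix also serves as the forest's probability estimate for class 1, which is what the ensemble step below averages.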
ENS (ensemble of the above)
The ensemble forecast is simply the equal-weighted average of the probability forecasts of the three methods above.
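Concretely, the averaging is one line of numpy (the probability vectors below are hypothetical forecasts for four stocks, not real model output):

```python
import numpy as np

# Hypothetical P(class 1) forecasts from the three base learners for 4 stocks.
p_dnn = np.array([0.6, 0.4, 0.8, 0.5])
p_gbt = np.array([0.7, 0.3, 0.6, 0.5])
p_raf = np.array([0.5, 0.5, 0.7, 0.2])

# ENS: equal-weighted average of the probability forecasts.
p_ens = (p_dnn + p_gbt + p_raf) / 3
print(np.round(p_ens, 3))  # [0.6 0.4 0.7 0.4]
```

Stocks are then ranked each day by the ensemble probability to form the long and short legs of the strategy.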

