Stock Portfolios Are The New Polls
In the days leading up to the 2016 presidential election, the Robinhood data team started to see some interesting trading patterns on the platform. With the election results now in hindsight, this prompted us to explore whether trading behavior on Robinhood could have predicted the outcome.
Of course, this is only possible given our reach, which turned out to be bigger than we expected. Our one million users span all 50 states and 2133 of 3114 counties. With this, we prepared a dataset of stocks bought and sold from November 1, 2016 until market close on November 8, 2016. For each symbol, we computed the ratio of the dollar amount bought to the total dollar amount transacted (bought plus sold) (1).
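As a rough illustration of ratio (1), the sketch below computes dollars bought divided by total dollars transacted for each symbol. The record layout (`symbol`, `side`, `dollar_amount`) and the function name `buy_ratios` are assumptions made for this example, not the format of the actual Robinhood dataset.

```python
from collections import defaultdict


def buy_ratios(trades):
    """Return {symbol: dollars bought / (dollars bought + dollars sold)}.

    `trades` is an iterable of (symbol, side, dollar_amount) tuples, where
    side is "buy" or "sell". This record layout is hypothetical.
    """
    bought = defaultdict(float)
    total = defaultdict(float)
    for symbol, side, amount in trades:
        if side == "buy":
            bought[symbol] += amount
        total[symbol] += amount
    # Skip symbols with no dollar volume to avoid dividing by zero.
    return {s: bought[s] / total[s] for s in total if total[s] > 0}


# Made-up trades, for illustration only.
trades = [
    ("AAPL", "buy", 300.0),
    ("AAPL", "sell", 100.0),
    ("F", "sell", 50.0),
]
ratios = buy_ratios(trades)
```

A ratio near 1 means the symbol was mostly bought during the window, and a ratio near 0 means it was mostly sold.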
Since there are over 8,000 symbols listed on Robinhood, we reduced the number of symbols and considered the top symbols traded in our dataset for each candidate. In the prediction, only the top 12 symbols ordered by the previously defined ratio (1) for each candidate are used as inputs to the model. The top symbols traded in counties where either Clinton or Trump won are listed in Table 1:
Using the symbols and their ratios (1), a county-level dataset is created. Each symbol's ratio value (1) is an independent feature in this dataset, while the binary outcome, Clinton or Trump, is the dependent variable.
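One way to picture the county-level dataset is as a matrix with one row per county and one column per symbol. The sketch below assumes the ratios are already available per county as nested dictionaries. The names `build_dataset`, `county_ratios`, and `county_winner` are hypothetical, and filling a symbol that was not traded in a county with 0.5 (equal buying and selling) is an assumption of this example, since the post does not say how such gaps were handled.

```python
def build_dataset(county_ratios, county_winner, symbols, missing=0.5):
    """Return (X, y, counties): one row per county, one column per symbol.

    county_ratios: {county: {symbol: ratio}}   (hypothetical layout)
    county_winner: {county: "Trump" or "Clinton"}
    missing: value used when a symbol was not traded in a county (assumption).
    """
    counties = sorted(c for c in county_ratios if c in county_winner)
    X = [[county_ratios[c].get(s, missing) for s in symbols] for c in counties]
    # Encode the dependent variable as 1 for a Trump win, 0 for a Clinton win.
    y = [1 if county_winner[c] == "Trump" else 0 for c in counties]
    return X, y, counties


# Made-up values, for illustration only.
county_ratios = {
    "county_a": {"AAPL": 0.75, "F": 0.2},
    "county_b": {"AAPL": 0.4},
}
county_winner = {"county_a": "Clinton", "county_b": "Trump"}
X, y, counties = build_dataset(county_ratios, county_winner, ["AAPL", "F"])
```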
Using a linear model (in this case, logistic regression), the probability of a Trump win is computed for each county. This is done by splitting the dataset into a training set and a test set: the weights obtained from the training set are used to predict probabilities in the test set. To get a predicted probability for every county from weights derived on independent data, four-fold cross-validation is used. This means that at any given time, 75% of the dataset is used to predict the outcome in the other 25%. The results of the cross-validation are summarized in Figure 2. The receiver operating characteristic (ROC) curve is shown for each fold along with the mean for all four folds. The area under the ROC curve (AUC) varies between 0.75 and 0.78 depending on the fold, and the mean AUC for the four folds is 0.77. Randomly assigned class labels would give a ROC AUC of 0.5, as shown by the red dotted line.
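The cross-validation procedure can be sketched in plain Python as follows. This is a minimal illustration and not the code used for the analysis: the gradient-descent fit, its learning rate and epoch count, the fold assignment, and the synthetic single-feature data are all assumptions. In practice a library implementation of logistic regression would likely be used instead.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, row):
    # w[0] is the bias; w[1:] are the per-feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for row, label in zip(X, y):
            err = predict_proba(w, row) - label
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def out_of_fold_probabilities(X, y, k=4, seed=0):
    """Predict each row with weights fitted on the other k-1 folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    probs = [None] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            probs[i] = predict_proba(w, X[i])
    return probs


def roc_auc(y, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic one-feature data, for illustration only.
rng = random.Random(1)
y = [i % 2 for i in range(80)]
X = [[rng.gauss(0.6 if label else 0.4, 0.05)] for label in y]
probs = out_of_fold_probabilities(X, y)
auc = roc_auc(y, probs)
```

Because every probability comes from a model that never saw that row during fitting, the resulting AUC estimates performance on unseen counties rather than fit to the training data.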
To keep both the false positive and false negative rates low, a county with a predicted probability of 0.65 or greater is labeled a “Trump” victory, while a county with a probability of less than 0.65 is labeled “Clinton.” Using these class labels, 1839 of the 2133 counties are predicted correctly, a correct classification rate of about 86% (a misclassification rate of about 14%). Trump won 1766 of the 2133 counties in the Robinhood data, so assigning every county to Trump would give a misclassification rate of 17.2%. The model therefore does better than this majority-class baseline.
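The comparison with the baseline comes down to a few lines of arithmetic on the counts quoted above. The sketch below applies the 0.65 threshold to a list of probabilities and recomputes both error rates. The function name `label_counties` is hypothetical.

```python
def label_counties(probabilities, threshold=0.65):
    """Label a county "Trump" when its predicted probability meets the threshold."""
    return ["Trump" if p >= threshold else "Clinton" for p in probabilities]


# Counts quoted in the text.
total_counties = 2133
correct_by_model = 1839
trump_counties = 1766

# Share of counties the model gets wrong.
model_error = 1 - correct_by_model / total_counties

# Share of counties that would be wrong if every county were labeled "Trump".
baseline_error = 1 - trump_counties / total_counties

labels = label_counties([0.70, 0.65, 0.60])
```

Comparing against the always-Trump baseline matters because Trump won most counties in the data, so a trivial classifier already scores well on accuracy alone.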
In other words, trading patterns can be effective in predicting voting decisions and election outcomes.
Figure 1 shows the results of the prediction, which can be compared with the real election outcome shown in Figure 3. The red regions represent a Trump victory while the blue regions represent a Clinton victory. The grey regions in Figure 1 are counties with no Robinhood users, since our users do not span every county in the United States.
Disclosure: Data was anonymized and aggregated at the county level for this analysis.
- AAPL : Apple Inc.
- AMD : Advanced Micro Devices Inc.
- AAME : Atlantic American Corp.
- AMZN : Amazon Inc.
- FB : Facebook Inc.
- SPY : Standard and Poor’s S&P 500 ETF
- F : Ford Motor Company
- SWHC : Smith and Wesson Holding Corp.
- SLB : Schlumberger Limited
- UWTI : Velocity Shares 3x Long Crude
- FIT : FitBit Inc.
- GPRO : GoPro Inc.
- LMT : Lockheed Martin Corp.
- XIV : Velocity Shares Daily Inverse VIX Short Term ETN
- TVIX : Velocity Shares Daily 2x VIX Short Term ETN
- DUST : Daily Gold Miners Bear 3x
- JNUG : Daily Junior Gold Miners Bull 3x
- NUGT : Daily Gold Miners Bull 3x
- NVDA : NVIDIA Corp.
- FSLR : First Solar Inc.
- TSLA : Tesla Motors
- UVXY : ProShares Trust Ultra VIX Short Term
- AMRS : Amyris Biotechnologies Inc.