Stock Portfolios Are The New Polls

Figure 1: Results of the election prediction. A red county implies a higher probability of a Trump win, while a blue county implies a higher probability of a Clinton win. Grey areas are counties with no Robinhood users.

In the days leading up to the 2016 presidential election, the Robinhood data team started to see some interesting trading patterns on the platform. With the election results now in hindsight, this prompted us to explore whether trading behavior on Robinhood could have predicted the outcome.

Of course, this is only possible given our reach, which turned out to be bigger than we expected. Our one million users are spread across all 50 states and 2133 of 3114 counties. With this, we prepared a dataset of stocks bought and sold from November 1, 2016 until market close on November 8, 2016. For each symbol, we computed the ratio of the dollar amount bought to the total dollar amount transacted (bought plus sold). We refer to this as ratio (1).
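As a rough illustration of ratio (1), the sketch below computes the buy ratio per symbol from a list of trades. The record layout (symbol, side, dollar amount), the function name, and the sample trades are all hypothetical and are not taken from the Robinhood dataset.

```python
from collections import defaultdict

def buy_ratios(trades):
    """Return {symbol: dollars bought / (dollars bought + dollars sold)}.

    `trades` is an iterable of (symbol, side, dollar_amount) tuples, where
    side is "buy" or "sell". This record layout is an assumption.
    """
    bought = defaultdict(float)
    total = defaultdict(float)
    for symbol, side, amount in trades:
        if side == "buy":
            bought[symbol] += amount
        total[symbol] += amount
    # Symbols with no dollar volume are skipped to avoid dividing by zero.
    return {s: bought[s] / total[s] for s in total if total[s] > 0}

# Made-up trades for illustration only.
sample_trades = [
    ("AAPL", "buy", 300.0),
    ("AAPL", "sell", 100.0),
    ("F", "buy", 50.0),
    ("F", "sell", 150.0),
]
ratios = buy_ratios(sample_trades)
```

A ratio above 0.5 means more dollars went into buying the symbol than selling it over the window, and a ratio below 0.5 means the opposite.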

Since more than 8,000 symbols are listed on Robinhood, we narrowed the set to the top symbols traded in our dataset for each candidate. Only the top 12 symbols for each candidate, ordered by ratio (1), are used as inputs to the model. The top symbols traded in counties where either Clinton or Trump won are listed in Table 1:

Table 1: The symbols traded in the week leading up to the election in counties where Trump and Clinton won. The companies which correspond to these symbols are listed in the Appendix.

Using the symbols and their ratios (1), we create a county-level dataset. Each symbol's ratio (1) is an independent feature in this dataset, while the binary outcome, Clinton or Trump, is the dependent variable.
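One way to picture this county-level dataset is as a table with one row per county and one column per symbol. The sketch below assembles it from per-county ratios. The function name, the input dictionaries, and the choice to fill a symbol that was not traded in a county with a neutral 0.5 are assumptions made for illustration, since the original analysis does not say how missing values were handled.

```python
def build_dataset(county_ratios, county_winner, symbols, fill=0.5):
    """Build feature rows and binary labels, one per county.

    county_ratios: {county: {symbol: ratio}}
    county_winner: {county: "Trump" or "Clinton"}
    symbols: ordered list of symbols used as features
    fill: value used when a symbol was not traded in a county (assumption)
    """
    rows, labels = [], []
    for county in sorted(county_ratios):
        ratios = county_ratios[county]
        rows.append([ratios.get(symbol, fill) for symbol in symbols])
        # Encode the dependent variable as 1 for Trump, 0 for Clinton.
        labels.append(1 if county_winner[county] == "Trump" else 0)
    return rows, labels

# Made-up counties and ratios for illustration only.
example_ratios = {
    "County A": {"AAPL": 0.8, "F": 0.4},
    "County B": {"AAPL": 0.3},
}
example_winner = {"County A": "Clinton", "County B": "Trump"}
X, y = build_dataset(example_ratios, example_winner, ["AAPL", "F"])
```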

Using a linear model (in this case, logistic regression), we compute the probability of a Trump win for each county. The dataset is split into a training set and a test set, and the weights learned on the training set are used to predict probabilities on the test set. To obtain a predicted probability for every county from weights learned on independent data, we use four-fold cross-validation: at any given time, 75% of the dataset is used to predict the outcome in the remaining 25%. The results are summarized in Figure 2, which shows the receiver operating characteristic (ROC) curve for each fold along with the mean across all four folds. The area under the ROC curve (AUC) varies between 0.75 and 0.78 depending on the fold, with a mean of 0.77. Randomly assigned class labels would yield an AUC of 0.5, shown by the red dotted line.
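The cross-validation procedure can be sketched in plain Python as follows. This is a minimal stand-in, not the code used for the analysis: the logistic regression is fitted with simple batch gradient descent, the learning rate, epoch count, and random seeds are arbitrary, the data is synthetic, and the AUC is computed once over the pooled out-of-fold predictions rather than per fold as in Figure 2.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def cross_val_probabilities(X, y, k=4, seed=0):
    """Out-of-fold probability for every row using k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    probs = [None] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        held_probs = predict_proba([X[i] for i in held_out], w, b)
        for i, p in zip(held_out, held_probs):
            probs[i] = p
    return probs

def roc_auc(y, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic two-feature data for illustration only.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if x0 - x1 + rng.gauss(0, 0.1) > 0 else 0 for x0, x1 in X]
probs = cross_val_probabilities(X, y)
auc = roc_auc(y, probs)
```

Because every row is held out exactly once, each county ends up with a probability produced by a model that never saw it during training.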

Figure 2: The receiver operating characteristic (ROC) curve for the four-fold cross-validation. The mean of the four folds is shown as a black dotted line and the random prediction as a red dotted line.

To keep both the false positive and false negative rates low, a county with a predicted probability of 0.65 or greater is labeled a Trump win, while a county with a probability below 0.65 is labeled a Clinton win. Using these class labels, 1839 of 2133 counties are predicted correctly, an accuracy of roughly 86% (a misclassification rate of roughly 14%). Of the 2133 counties in the Robinhood data, Trump won 1766. If every county were simply assigned to Trump, the misclassification rate would be 17.2%. The model therefore does better than this naive majority-class baseline.
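The thresholding step and the baseline comparison can be checked with a few lines of arithmetic. The county counts below are the ones quoted in this post, while the function and variable names are illustrative.

```python
def label_counties(probabilities, threshold=0.65):
    """Label a county "Trump" when its predicted probability meets the threshold."""
    return ["Trump" if p >= threshold else "Clinton" for p in probabilities]

total_counties = 2133   # counties with Robinhood users
correct = 1839          # counties the model labeled correctly
trump_counties = 1766   # counties Trump actually won

model_error = 1 - correct / total_counties
# Baseline: predict "Trump" everywhere, which is wrong in every Clinton county.
baseline_error = 1 - trump_counties / total_counties

print(f"model misclassification rate:    {model_error:.1%}")
print(f"baseline misclassification rate: {baseline_error:.1%}")
```

Comparing against the always-Trump baseline matters here because the classes are imbalanced: a model can look accurate simply by favoring the majority class.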

In other words, trading patterns can be effective in predicting voting decisions and election outcomes.

Figure 1 shows the results of the prediction, which can be compared with the actual election outcome in Figure 3. The red regions represent a Trump victory while the blue regions represent a Clinton victory. The grey regions in Figure 1 are counties where Robinhood has no users.

Disclosure: Data was anonymized and aggregated at the county level for this analysis.

Figure 3: Actual election results. The redder the county, the more votes Trump won, while a bluer county corresponds to a Clinton win.

Appendix:

  1. AAPL : Apple Inc.
  2. AMD : Advanced Micro Devices Inc.
  3. AAME : Atlantic American Corp.
  4. AMZN : Amazon Inc.
  5. FB : Facebook Inc.
  6. SPY : Standard and Poor’s S&P 500 ETF
  7. F : Ford Motor Company
  8. SWHC : Smith and Wesson Holding Corp.
  9. SLB : Schlumberger Limited
  10. UWTI : Velocity Shares 3x Long Crude
  11. FIT : FitBit Inc.
  12. GPRO : GoPro Inc.
  13. LMT : Lockheed Martin Corp.
  14. XIV : Velocity Shares Daily Inverse VIX Short Term ETN
  15. TVIX : Velocity Shares Daily 2x VIX Short Term ETN
  16. DUST : Daily Gold Miners Bear 3x
  17. JNUG : Daily Junior Gold Miners Bull 3x
  18. NUGT : Daily Gold Miners Bull 3x
  19. NVDA : NVIDIA Corp.
  20. FSLR : First Solar Inc.
  21. TSLA : Tesla Motors
  22. UVXY : ProShares Trust Ultra VIX Short Term
  23. AMRS : Amyris Biotechnologies Inc.