How do Robinhood Users Behave?

Joseph Bell
Jan 9, 2020

Robinhood is a free-to-trade online brokerage that pioneered zero-commission trading. Instead of charging commissions, it makes money by selling order flow. It’s really easy to open an account from a mobile phone, and a lot of Robinhood users are millennials who are new to trading and investing and haven’t been through a full market cycle. So, how do Robinhood users behave as a whole and in each individual stock they own?

Robintrack.net tracks the number of Robinhood users holding a particular stock against the price of the stock and provides a neat little graph comparing the two. It covers over 8,900 stocks that Robinhood users can choose from. For simplicity’s sake I decided to look at S&P 500 stocks only, from the time Robintrack.net began collecting data on May 2, 2018 until the end of 2019.

Here is a look at the percentage change in users versus the percentage change in price for all the S&P 500 stocks over the period.
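
As a rough sketch of how the two percentage changes could be computed, the snippet below takes the first and last observation for each symbol. It assumes a merged table with `Symbol`, `Date`, `Close` and `users_holding` columns; `users_holding` is my assumption for the Robintrack column name, and the tickers and numbers are made up for illustration.

```python
import pandas as pd


def percent_change_by_symbol(df, value_col):
    """Percent change from the first to the last observation of value_col, per Symbol."""
    ordered = df.sort_values(["Symbol", "Date"])
    grouped = ordered.groupby("Symbol")[value_col]
    first = grouped.first()
    last = grouped.last()
    return (last - first) / first * 100


# Made-up data standing in for the merged popularity and pricing table
merged = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Date": ["2018-05-02", "2019-01-02", "2019-12-31"] * 2,
    "Close": [100.0, 90.0, 80.0, 50.0, 60.0, 75.0],
    "users_holding": [1000, 1500, 3000, 400, 350, 300],
})

changes = pd.DataFrame({
    "price_pct_change": percent_change_by_symbol(merged, "Close"),
    "users_pct_change": percent_change_by_symbol(merged, "users_holding"),
})
print(changes)
```

Each row of `changes` is then one point on a scatter plot of user growth against price change.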

It’s tough to tell if there’s any relationship here because of the upward bias in stock prices and the growth in Robinhood users over the last few years. The data will be much more interesting when growth in users levels off and there is a major 2008-like market correction.

However, it appears that when a stock actually declines in today’s bull market, Robinhood users step in to buy in droves. Some might be familiar with the term “buying the dip”: traders buy a stock that has tanked in the hope of selling it as its price reverts to the mean.

Zooming in on the stocks whose price declined significantly, you can see plenty of data points where the number of users holding increased significantly.

The decline in Kraft Heinz (KHC) is an example of dip buying. The stock fell 33% while the number of Robinhood users holding nearly tripled.
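
One way to pull out dip-buying candidates like this is to filter on both percentage changes at once. This is only a sketch: the thresholds (a price drop of at least 20% and at least a doubling of users), the column names, and the sample rows are all hypothetical choices of mine, not values from the analysis.

```python
import pandas as pd

# Made-up per-stock percentage changes over the period
changes = pd.DataFrame(
    {
        "price_pct_change": [-33.0, 45.0, -5.0, -25.0],
        "users_pct_change": [190.0, -30.0, 20.0, 40.0],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)


def find_dip_buying(changes, max_price_change=-20.0, min_users_change=100.0):
    """Return stocks whose price fell sharply while users holding rose sharply."""
    mask = (changes["price_pct_change"] <= max_price_change) & (
        changes["users_pct_change"] >= min_users_change
    )
    return changes[mask].sort_values("users_pct_change", ascending=False)


dips = find_dip_buying(changes)
print(dips)
```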

There are a fair number of cases where the number of Robinhood users holding a stock declined. Most of those stocks posted a decent percentage gain, which suggests users were taking profits as they sold.

The rise in Chipotle Mexican Grill (CMG) is an example of traders taking profits: the stock price and the number of users holding it have a strong negative correlation of -0.77.
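
A correlation like this can be computed with pandas' `Series.corr`, which defaults to the Pearson coefficient. The sketch below assumes the correlation is taken between daily closing prices and daily user counts (it could equally be taken on daily changes); the `users_holding` column name and the numbers are made up for illustration.

```python
import pandas as pd


def price_users_correlation(df, symbol):
    """Pearson correlation between closing price and users holding for one symbol."""
    one_stock = df[df["Symbol"] == symbol]
    return one_stock["Close"].corr(one_stock["users_holding"])


# Made-up data: price rises steadily while users holding falls steadily
merged = pd.DataFrame({
    "Symbol": ["AAA"] * 4,
    "Date": ["2019-01-02", "2019-01-03", "2019-01-04", "2019-01-07"],
    "Close": [100.0, 110.0, 120.0, 130.0],
    "users_holding": [400, 350, 300, 250],
})

corr = price_users_correlation(merged, "AAA")
print(round(corr, 2))
```

A value near -1 means users holding tends to fall as the price rises, which is the pattern you would expect from profit taking.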

Again, the data is skewed up and to the right, like most stock charts in the past few years. It would be an interesting exercise to look at the other 8,400 stocks not pictured here. While the S&P 500 stocks are the largest, they are not always the most popular with Robinhood users.

Robinhood users might buy stocks that are “cheap” on a per-share price basis, i.e. a share price under $10. They also own a lot of story stocks related to fake meat, electric cars, and the legalization of marijuana. Perhaps I’ll explore that next.

Technical notes:

Combining the data was a chore as the Robintrack popularity data had a different csv file for each ticker. With ~14,000 rows per ticker and ~8,900 tickers, there were over 100 million rows after concatenating the data. I needed to use an environment other than Google Colab due to the size of the dataset. For the following steps I used Spyder and Sublime Text, then read the final csv files into Colab.

First I separated the date from the timestamp and added a column containing the symbol, taken from each csv file name, to make it easier to merge with the pricing data later on. I also removed all of the duplicate dates because even though Robintrack takes multiple data points per day, we really only need one a day to match the closing price for the day. This reduced the rows for each ticker to 415 (the number of trading days in the period May 2, 2018 to December 31, 2019).

import pandas as pd
import os
import glob
def cleanPopularity(indir='file path where Robintrack data is'):
    os.chdir(indir)
    fileList = glob.glob('*')
    for filename in fileList:
        df = pd.read_csv(filename)
        # using csv file name and creating a new column 'Symbol'
        df['Symbol'] = os.path.splitext(os.path.basename(filename))[0]
        # separating date and timestamp
        new = df["timestamp"].str.split(" ", n=1, expand=True)
        df["Date"] = new[0]
        df["Time"] = new[1]
        df.drop(columns=['timestamp'], inplace=True)
        # dropping duplicate dates, keeping the first reading of each day
        df.drop_duplicates(subset='Date', keep='first', inplace=True)
        df.drop(columns=['Time'], inplace=True)
        df.to_csv('cleaned' + filename, index=False, encoding='utf-8-sig')
cleanPopularity()

Then I combined all the files into one csv file.

import os
import glob
import pandas as pd
#set working directory
os.chdir('file path where cleaned popularity is')
#find all csv files in the folder
#use glob pattern matching -> extension = 'csv'
#save result in list -> all_filenames
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
#print(all_filenames)
#combine all files in the list
popularity_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
#export to csv
popularity_csv.to_csv( "popularity_data.csv", index=False, encoding='utf-8-sig')

You can compile a list of symbols from the Robintrack file names. This will return a list of ~8,900 stocks. I used just the S&P 500 stocks and got the list by scraping them from Wikipedia. You can view my code for scraping the S&P 500 list on GitHub.

import os
# generates a Python list of symbols from the csv file names
data_dir = 'file path with Robintrack data'
stock_list = ['.'.join(x.split('.')[:-1]) for x in os.listdir(data_dir) if os.path.isfile(os.path.join(data_dir, x))]
# sorts the list alphabetically
stock_list.sort()
# removes the empty string at the beginning of the list, if there is one
if '' in stock_list:
    stock_list.remove('')
print(stock_list)
print(len(stock_list))

Then I downloaded prices from Yahoo Finance. There is probably a better way to do this and I am open to suggestions.

import pandas_datareader.data as web
import datetime as dt
import os
def get_stock_data():
    # I used a list of the S&P 500 tickers here
    tickers = ['FB', 'AAPL', 'AMZN', 'NFLX', 'GOOG']  # etc.
    # set start and end time of data
    start = dt.datetime(2018, 5, 2)
    end = dt.datetime(2019, 12, 31)
    if not os.path.exists('stockdata'):
        os.makedirs('stockdata')
    for ticker in tickers:
        print(ticker)
        try:
            # keep only the remaining columns (close and volume)
            df = web.DataReader(ticker, 'yahoo', start, end).drop(['High','Low','Open','Adj Close'], axis=1)
            # creates a new column with Symbol name to match later
            df['Symbol'] = ticker
            print(df.head())
            df.to_csv('stockdata/{}.csv'.format(ticker))
            print(ticker, 'downloaded')
        except Exception as e:
            print(e, 'error')
get_stock_data()

Join the pricing csvs.

import os
import glob
import pandas as pd
#set working directory
os.chdir('file path where stock price csv files are')
#find all csv files in the folder
#use glob pattern matching -> extension = 'csv'
#save result in list -> all_filenames
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
#print(all_filenames)
#combine all files in the list
pricing_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
#export to csv
pricing_csv.to_csv( "combined_prices.csv", index=False, encoding='utf-8-sig')

The rest of the code for this article, where I merge the popularity and pricing data, is available on GitHub.
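
For readers who just want the gist, the merge step might look something like the sketch below. This is not the code from the GitHub repository. It assumes both combined files share `Symbol` and `Date` columns, that the Robintrack count column is called `users_holding`, and it uses small made-up frames in place of `popularity_data.csv` and `combined_prices.csv`.

```python
import pandas as pd

# Made-up stand-ins for popularity_data.csv and combined_prices.csv
popularity = pd.DataFrame({
    "users_holding": [1000, 1100, 1200],
    "Symbol": ["AAA", "AAA", "AAA"],
    "Date": ["2018-05-02", "2018-05-03", "2018-05-05"],  # last date has no price
})
prices = pd.DataFrame({
    "Date": ["2018-05-02", "2018-05-03", "2018-05-04"],
    "Close": [100.0, 101.5, 99.0],
    "Volume": [5000, 5200, 4800],
    "Symbol": ["AAA", "AAA", "AAA"],
})

# Parse dates on both sides so the join keys have the same type
popularity["Date"] = pd.to_datetime(popularity["Date"])
prices["Date"] = pd.to_datetime(prices["Date"])

# Inner join keeps only the days that appear in both data sets
merged = pd.merge(popularity, prices, on=["Symbol", "Date"], how="inner")
print(merged)
```

An inner join quietly drops days that are missing from either side, so it is worth comparing row counts before and after the merge.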


Joseph Bell

Data Scientist seeking an opportunity to use data to inform business and investing decisions.