Exploring the Profitability of Insider Trading for US Executives

Apr 29, 2023

The project code and data are available in this GitHub repository.

Executive Summary

The purpose of this article is to investigate whether the average US executive benefits from trading their own company stock, and to determine the conditions under which this is true.

To accomplish this, we scraped insider trading data for 3,700 NASDAQ-listed companies from OpenInsider. From this data we generated Buy/Sell trading signals, holding each position for 5 days, and simulated the trades with Backtrader. The simulation used historical price data from Yahoo Finance covering 1 August 2017 to 1 August 2021.

For simplicity, we did not take into account transaction fees, taxes, or dividends. We expected results to differ by a company’s market capitalization and industry. We found that mirroring insiders’ trades in their company stock was profitable for firms in the mega market capitalization category (returns above 300%), but we were unable to account for the impact of external variables.

Limitations

As executives’ compensation packages frequently include stock options, much of their insider trading activity may not be driven by insider information or speculation. Rather, it could be due to their personal financial needs. Unfortunately, it is not possible to account for the motivations of individual managers.

Additionally, some might argue that, in an efficient market where trading information is easily accessible, all of these transactions are already priced in, especially in the age of high-frequency trading.

Applications

These findings may serve as a signal in more sophisticated trading algorithms. The strategy could be more effective for companies whose insiders trade frequently.

The following table presents the best and worst performers of this strategy:

Table of best and worst performing tickers.

I. Background

Under Section 16(a) of the Securities Exchange Act of 1934, a company’s officers, directors, and owners of more than 10% of its equity securities must report their trades in company stock to the US Securities and Exchange Commission. Since the Sarbanes-Oxley Act of 2002, these reports (Form 4) must be filed within two business days of the transaction. This data is publicly available in the SEC’s EDGAR database. The goal of this article is to explore potential strategies that can be developed using this information.

II. Collecting Insider Trading Data

Fortunately, several websites track insider trading for publicly listed companies. Although we did not find a free, publicly available API, the pandas read_html function can be used to scrape the data. The data can be accessed on OpenInsider.com and is displayed as follows:

In a separate file, methods.py, we define several functions, including:

fetch_insider_data(ticker, path): Saves the OpenInsider insider trading data for a company ticker as a .csv file in the path directory.
fetch_sp500(): Returns a pandas DataFrame listing the S&P 500 companies and their tickers, scraped from Wikipedia.
populate_data(path): Fetches insider data for all S&P 500 tickers.
gridsearch(ticker, mcap): Reads the saved data for a ticker and returns the filing dates of its sales and purchases as two lists. These serve as trading signals during the testing phase. If reading the data fails, it returns two empty lists.
backtest_strat(sell_dates, buy_dates): Prints the sell and buy dates, serving as a checkpoint to examine the data being used.
get_strategy(ticker, mcap): Returns the sell and buy dates for a ticker and is intended as the entry point for the testing phase.
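As a minimal sketch of the signal-extraction step inside gridsearch(), the following turns a DataFrame of filings into sorted lists of sale and purchase dates. It assumes, as the original filter does, that OpenInsider’s trade-type labels contain the words "Sale" and "Purchase"; the helper name extract_signals and the toy rows are hypothetical.

```python
import pandas as pd


def extract_signals(df):
    """Return (sell_dates, buy_dates) as sorted lists of unique datetime.date objects."""
    # Keep only the calendar day of each filing timestamp
    filed = pd.to_datetime(df["filing_date"]).dt.date
    # na=False treats a missing trade type as "no match" instead of NaN
    is_sale = df["trade_type"].str.contains("Sale", na=False)
    is_buy = df["trade_type"].str.contains("Purchase", na=False)
    return sorted(set(filed[is_sale])), sorted(set(filed[is_buy]))


# Hypothetical rows shaped like the columns gridsearch() assigns
toy = pd.DataFrame({
    "filing_date": ["2021-03-05 16:32:11", "2021-03-05 18:01:40",
                    "2021-04-12 09:15:00", "2021-05-20 17:45:03"],
    "trade_type": ["S - Sale", "S - Sale+OE", "P - Purchase", None],
})
sells, buys = extract_signals(toy)
print(sells)  # [datetime.date(2021, 3, 5)]
print(buys)   # [datetime.date(2021, 4, 12)]
```

Unlike the original, this sketch collapses several filings on the same day into one signal, whereas TestStrategy as written creates one order per matching filing.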

import pandas as pd
import os


def fetch_insider_data(ticker, path):
    """
    Fetches insider trading data for a given company ticker from OpenInsider.com and saves it in a .csv format
    in the given path directory.
    
    Args:
    ticker (str): The company ticker symbol.
    path (str): The directory path where the .csv file will be saved.
    
    Returns:
    None
    """

    try:
        insider = pd.read_html(
            f'http://openinsider.com/screener?s={ticker}&o=&pl=&ph=&ll=&lh=&fd=1461&fdr=&td=0&tdr=&fdlyl=&fdlyh=&daysago=&xp=1&xs=1&vl=&vh=&ocl=&och=&sic1=-1&sicl=100&sich=9999&grp=0&nfl=&nfh=&nil=&nih=&nol=&noh=&v2l=&v2h=&oc2l=&oc2h=&sortcol=0&cnt=1000&page=1')
        # read_html returns every table on the page; the trades table is the third-to-last one
        insider = insider[-3]
        insider['company'] = ticker
        # os.path.join inserts the separator that plain string concatenation would miss
        insider.to_csv(os.path.join(path, f"{ticker}.csv"))
    except Exception as e:
        print(f"{ticker}: {e}")


def fetch_sp500():
    """
    Fetches the list of S&P 500 company tickers from Wikipedia and returns it as a Pandas DataFrame.
    
    Args:
    None
    
    Returns:
    df (pd.DataFrame): A DataFrame containing the list of S&P 500 company tickers.
    """

    df = pd.read_html(
        'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]
    return df


def populate_data(path):
    """
    Fetches insider trading data for all S&P 500 company tickers using fetch_insider_data and saves them
    in the given path directory.
    
    Args:
    path (str): The directory path where the .csv files will be saved.
    
    Returns:
    None
    """

    for symbol in fetch_sp500()["Symbol"]:
        # fetch_insider_data() needs the output directory as well as the ticker,
        # and it already catches and prints its own errors
        fetch_insider_data(symbol, path)


def gridsearch(ticker, mcap):
    """
    Searches for sales and purchase dates of stocks from the collected insider trading data for a given ticker and
    market capitalization and returns a tuple of two lists containing the sell and buy dates respectively.
    
    Args:
    ticker (str): The company ticker symbol.
    mcap (str): The market capitalization category, i.e. the subdirectory of by_mcap the file is stored in.
    
    Returns:
    A tuple of two lists containing the sell and buy dates respectively.
    """

    try:
        df = pd.read_csv(f"by_mcap//{mcap}//{ticker}.csv", index_col=0)
        df.columns = ['x', 'filing_date', 'trade_date', 'ticker', 'insider_name', 'title',
                      'trade_type', 'price', 'qty', 'owned', 'd_own', 'value', '1d', '1w',
                      '1m', '6m', 'company']

        # na=False treats rows with a missing trade type as non-matches
        sell_dates = df.loc[df["trade_type"].str.contains(
            "Sale", na=False)]["filing_date"]
        buy_dates = df.loc[df["trade_type"].str.contains(
            "Purchase", na=False)]["filing_date"]
        return list(sell_dates), list(buy_dates)

    except Exception as e:
        print("Error : "+str(e))
        pass
    return [], []


def backtest_strat(sell_dates, buy_dates):
    """
    Prints the list of sell and buy dates for the given strategy.
    
    Args:
    sell_dates (List[str]): A list of sell dates.
    buy_dates (List[str]): A list of buy dates.
    
    Returns:
    None
    """
    for s_date in sell_dates:
        print(f"SELL: {s_date}")
    for b_date in buy_dates:
        print(f"BUY: {b_date}")


def get_strategy(ticker, mcap):
    """
    Returns a tuple of two lists containing the sell and buy dates respectively for a given company ticker.
    
    Args:
    ticker (str): The company ticker symbol.
    mcap (str): The market capitalization category (subdirectory of by_mcap).
    
    Returns:
    A tuple of two lists containing the sell and buy dates respectively.
    """
    # gridsearch() requires the category too; it handles its own errors
    # and falls back to two empty lists
    return gridsearch(ticker, mcap)
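The long query string in fetch_insider_data() is easier to maintain as a template in which only the ticker changes. Below is a minimal sketch using only the standard library’s urllib.parse; the template is the URL from the code above with the ticker left blank, and screener_url is a hypothetical helper. Rebuilding the query this way reproduces the original URL exactly, blank parameters included.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The screener URL from fetch_insider_data(), with the ticker ("s") left blank
TEMPLATE = (
    "http://openinsider.com/screener?s=&o=&pl=&ph=&ll=&lh=&fd=1461&fdr=&td=0"
    "&tdr=&fdlyl=&fdlyh=&daysago=&xp=1&xs=1&vl=&vh=&ocl=&och=&sic1=-1"
    "&sicl=100&sich=9999&grp=0&nfl=&nfh=&nil=&nih=&nol=&noh=&v2l=&v2h="
    "&oc2l=&oc2h=&sortcol=0&cnt=1000&page=1"
)


def screener_url(ticker, template=TEMPLATE):
    """Return the screener URL with the "s" (ticker) parameter filled in."""
    parts = urlsplit(template)
    # keep_blank_values=True preserves empty parameters such as "o="
    params = [(key, ticker if key == "s" else value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)]
    return urlunsplit(parts._replace(query=urlencode(params)))


print(screener_url("AMZN"))
```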

III. Backtesting strategies with Backtrader’s Cerebro engine

Moving on to the testing phase, we use Backtrader’s Cerebro engine to simulate how well our insider trading signals would have performed if we had bought and sold stocks accordingly. First, we download historical price data from Yahoo Finance using pandas_datareader. The get_ticker_historic(ticker, offline) function returns this data as a pandas DataFrame and saves it to a local CSV file. Its optional offline argument (True by default) loads the data from the local directory instead of fetching it from the API.
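The offline argument implements a simple cache-or-fetch pattern. Below is a minimal sketch of that pattern with the downloader passed in as a function, so it runs without network access; load_or_fetch and fake_fetch are hypothetical names, and fake_fetch only stands in for a real Yahoo Finance download.

```python
import os
import tempfile

import pandas as pd


def load_or_fetch(ticker, fetch, cache_dir):
    """Return prices for `ticker` from a cached CSV if present, else call `fetch` and cache the result."""
    path = os.path.join(cache_dir, f"{ticker}.csv")
    if os.path.exists(path):
        return pd.read_csv(path, parse_dates=True, index_col=0)
    df = fetch(ticker)
    os.makedirs(cache_dir, exist_ok=True)
    df.to_csv(path)
    return df


calls = []


def fake_fetch(ticker):
    # Hypothetical stand-in for a Yahoo Finance download
    calls.append(ticker)
    idx = pd.date_range("2021-01-04", periods=3, freq="D", name="Date")
    return pd.DataFrame({"Close": [10.0, 10.5, 10.2]}, index=idx)


with tempfile.TemporaryDirectory() as cache:
    first = load_or_fetch("AMZN", fake_fetch, cache)
    second = load_or_fetch("AMZN", fake_fetch, cache)  # served from the cached CSV
print(calls)  # ['AMZN']
```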

Next, we define a Backtrader strategy class called TestStrategy. This class matches the daily closing prices with the buy/sell signals and simulates trading over a specific time period. Note that we set the commission to zero: this is a purely theoretical test, and we want to see how well our trading signals perform without any real-world transaction costs.
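The bar-by-bar logic of TestStrategy can be illustrated without Backtrader. The following pure-Python sketch is a simplified stand-in, not the project’s code: it opens a one-share position on a signal date when flat, closes it after `hold` bars, and fills every trade at that bar’s close, whereas Backtrader’s default market orders fill at the next bar’s open. The function simulate and the prices are hypothetical.

```python
def simulate(dates, closes, buy_dates, sell_dates, hold, cash=1000.0):
    """Replay insider signals bar by bar with a one-share position.

    Simplifications: fills at the signal bar's close, zero commission,
    and a sale signal wins if both signals fall on the same day.
    """
    position = 0        # +1 long, -1 short, 0 flat
    entry_bar = None
    for i, (day, price) in enumerate(zip(dates, closes)):
        if position == 0:
            if day in sell_dates:            # insider sale: open a short
                position, entry_bar = -1, i
                cash += price
            elif day in buy_dates:           # insider purchase: open a long
                position, entry_bar = 1, i
                cash -= price
        elif i >= entry_bar + hold:          # held for `hold` bars: close
            cash += position * price
            position, entry_bar = 0, None
    if position:                             # liquidate any open position at the last close
        cash += position * closes[-1]
    return cash


# Hypothetical prices and signal dates
dates = ["2021-01-04", "2021-01-05", "2021-01-06",
         "2021-01-07", "2021-01-08", "2021-01-11"]
closes = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0]
final = simulate(dates, closes, buy_dates={"2021-01-04"},
                 sell_dates={"2021-01-07"}, hold=2)
pl = (final - 1000.0) / 1000.0 * 100      # same formula as test_ticker()
print(f"{final:.2f}, {pl:.2f}%")           # 998.00, -0.20%
```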

An example of trading signals generated for AMZN in Backtrader.

IV. Testing with parameters

When the algorithm is first run without an exit rule, a loss of close to 100% is expected, because short positions are held indefinitely. To close all positions, a brute-force grid search is run over the holding period (short_period) to find a suitably short value; the tests below use 5 days.
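The brute-force search over the holding period can be sketched as follows. This is an illustration under simplifying assumptions, not the project’s own search code: each signal is scored independently (overlapping positions are ignored) and a candidate period is judged by its mean per-trade return. The names mean_trade_return and grid_search_hold, the prices, and the signals are hypothetical.

```python
import math


def mean_trade_return(closes, signals, hold):
    """Mean return of trades opened at each signal bar and closed `hold` bars later.

    `signals` maps a bar index to +1 (insider purchase, go long) or -1 (insider sale, go short).
    """
    returns = [
        direction * (closes[i + hold] - closes[i]) / closes[i]
        for i, direction in signals.items()
        if i + hold < len(closes)   # skip trades that would run past the data
    ]
    return sum(returns) / len(returns) if returns else math.nan


def grid_search_hold(closes, signals, candidates):
    """Brute force: score every candidate holding period and return the best one."""
    scores = {h: mean_trade_return(closes, signals, h) for h in candidates}
    valid = {h: s for h, s in scores.items() if not math.isnan(s)}
    best = max(valid, key=valid.get)
    return best, scores


# Hypothetical closing prices: a purchase signal at bar 0 and a sale signal at bar 3
closes = [100.0, 101.0, 103.0, 102.0, 99.0, 98.0, 100.0, 104.0]
best, scores = grid_search_hold(closes, {0: 1, 3: -1}, candidates=[1, 2, 3, 4, 5, 10])
print(best)  # 2
```

Because the search keeps whichever period scored best on the same data, the chosen value is fitted in-sample.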

Once the holding period (short_period) has been chosen, the results can be analyzed in several ways. The three variables considered here are:

The market capitalization of the underlying company: managers of larger companies might have less of an information advantage because of diseconomies of scale.
The sector the company operates in: the sector may influence managerial behavior and the degree of information asymmetry.
Trade turnover: how often the company’s shares change hands over a given period.

To address the above variables, the following steps are taken:

Companies are divided into six market cap categories: mega (above $200B), big ($10B to $200B), mid ($2B to $10B), small ($300M to $2B), micro ($50M to $300M), and nano (below $50M).
Tickers are categorized by the industry they operate in.
Trade turnover is measured as annual trading volume divided by shares outstanding.
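A vectorized sketch of the first and third steps follows, using pandas.cut in place of the row-by-row lookup in categorize_by_mcap(). The bin edges (in USD billions) are the lower bounds used in that function; the column names "Annual Volume" and "Shares Outstanding" and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Lower bounds of each category in USD billions
BINS = [0, 0.05, 0.3, 2, 10, 200, np.inf]
LABELS = ["nano", "micro", "small", "mid", "big", "mega"]


def add_categories(df):
    out = df.copy()
    # right=False makes each bin include its lower bound, e.g. exactly $10B is "big"
    out["markcap_cat"] = pd.cut(out["Market Cap"] / 1e9, bins=BINS,
                                labels=LABELS, right=False)
    # Trade turnover: shares traded per year relative to shares outstanding
    out["turnover"] = out["Annual Volume"] / out["Shares Outstanding"]
    return out


# Hypothetical tickers
toy = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC"],
    "Market Cap": [2.5e12, 10e9, 40e6],
    "Annual Volume": [1.5e10, 6e8, 2e6],
    "Shares Outstanding": [1.6e10, 3e8, 8e6],
})
res = add_categories(toy)
print(res[["Symbol", "markcap_cat", "turnover"]])
```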

The strategy_tester.py script iterates through all tested tickers, creates trading signals with the functions defined in methods.py, and runs them through a Backtrader strategy class.

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)
import os
import os.path
from pandas.core.arrays.categorical import Categorical
from methods import *
import pandas as pd
import backtrader as bt
import itertools
from collections import Counter
import pandas_datareader.data as web
import datetime as dt
import time
import threading
import plotly.graph_objects as go
from plotly import tools
import plotly.offline as py
import plotly.express as px

short_period = 5


class pandasDataFeed(bt.feeds.PandasData):
    """
    A custom Pandas Data Feed that defines the data parameters required for
    backtesting with Backtrader. This inherits the bt.feeds.PandasData class.
    """

    params = (
        # Replay the same window that get_ticker_historic() downloads
        ('fromdate', dt.datetime(2017, 8, 1)),
        ('todate', dt.datetime(2021, 8, 1)),
        ('dtformat', '%Y-%m-%d'),
        ('datetime', None),
        ('high', 'High'),
        ('low', 'Low'),
        ('open', 'Open'),
        ('close', 'Close'),
        ('volume', 'Volume')
    )


def get_ticker_historic(ticker, offline=True):
    """
    A function that returns the historical data of a given stock ticker. The data is
    obtained from a local CSV file if `offline` is True and the file exists; otherwise
    it is obtained from Yahoo Finance using the pandas_datareader library and cached.
    
    :param ticker: String, the stock ticker.
    :param offline: Boolean, whether or not to fetch data offline (from local CSV).
    :return: Pandas DataFrame object containing the historical stock data, or None on failure.
    """

    path = f"data//stock_data//{ticker}.csv"

    # Use the cached CSV when working offline and the file is available
    if offline and os.path.exists(path):
        return pd.read_csv(path, parse_dates=True, index_col=0)

    # Otherwise download the test window and cache it for later offline runs.
    # Note: pandas_datareader's "yahoo" source may not work with current Yahoo endpoints.
    try:
        start = dt.datetime(2017, 8, 1)
        end = dt.datetime(2021, 8, 1)
        df = web.DataReader(ticker, "yahoo", start, end)
        df.to_csv(path)
        print(f"Fetched data from Yahoo Finance for {ticker}")
        time.sleep(0.3)  # short pause between requests
        return df
    except Exception as e:
        print("Couldn't get data from Yahoo Finance: " + str(e))
        return None


class TestStrategy(bt.Strategy):
    """
    A class representing a Backtrader trading strategy. This class inherits from the
    Backtrader Strategy class, and is used to define the trading logic of the strategy.
    
    It reads the module-level globals `ticker` (str, the stock ticker) and `mcap`
    (str, the market cap category directory), which test_ticker() sets before each run.
    """

    def log(self, txt, dt=None):
        dt = dt or self.datas[0].datetime.date(0)
        #print('%s, %s' % (dt.isoformat(), txt))

    def __init__(self):
        self.dataclose = self.datas[0].close
        self.short_close = []
        self.sell_d, self.buy_d = gridsearch(ticker, mcap)
        self.order = None
        self.short_period = short_period  # module-level holding period in bars (5)
        self.tradeid = itertools.cycle([0, 1, 2])

    def next(self):
        dat = self.datas[0].datetime.date(0)

        if not self.position:
            # Open a short position for each insider sale filed on this date
            for s_date in self.sell_d:
                if dt.datetime.strptime(s_date.split(" ")[0], '%Y-%m-%d').strftime('%Y-%m-%d') == dat.strftime('%Y-%m-%d'):
                    self.curtradeid = next(self.tradeid)
                    self.order = self.sell(
                        tradeid=self.curtradeid)
                    self.log(f"SELL CREATE {self.dataclose[0]:.2f}")
            # Open a long position for each insider purchase filed on this date
            for b_date in self.buy_d:
                if dt.datetime.strptime(b_date.split(" ")[0], '%Y-%m-%d').strftime('%Y-%m-%d') == dat.strftime('%Y-%m-%d'):
                    self.curtradeid = next(self.tradeid)
                    self.order = self.buy(
                        tradeid=self.curtradeid)
                    self.log(f"BUY CREATE {self.dataclose[0]:.2f}")
        else:
            # Close the position once it has been held for short_period bars
            if len(self) >= (self.bar_executed+self.short_period):
                self.log(f'CLOSE CREATE {self.dataclose[0]:.2f}')
                self.order = self.close(tradeid=self.curtradeid)

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            return

        if order.status in [order.Completed]:
            if order.isbuy():
                self.log(f'BUY EXECUTED, {order.executed.price:.2f}')
            elif order.issell():
                self.log(f'SELL EXECUTED, {order.executed.price:.2f}')
            self.bar_executed = len(self)

        elif order.status in [order.Canceled,
                              order.Margin, order.Rejected]:
            self.log('Order Canceled/Margin/Rejected')

        self.order = None


def categorize_by_mcap(df=None):
    """
    Categorizes tickers based on market cap into 6 categories. This function takes in a
    DataFrame containing tickers and market cap values, adds a "markcap_cat" column,
    prints the category counts and returns the DataFrame. It only needs to be called once.
    
    :param df: Pandas DataFrame, DataFrame containing tickers and market cap values.
               Defaults to data//nasdaq_tickers.csv.
    :return: Pandas DataFrame with the added "markcap_cat" column.
    """

    # Read the default file at call time rather than once at import time
    if df is None:
        df = pd.read_csv("data//nasdaq_tickers.csv")

    # Lower bound of each category in USD billions, checked from largest to smallest
    markcap_points = {
        "mega": 200,
        "big": 10,
        "mid": 2,
        "small": 0.3,
        "micro": 0.05,
        "nano": 0,
    }

    def markcap_pointer(mc):
        # Contiguous ranges: no gaps such as 9.999 < x < 10, no overlap between mega and big
        for cat, lower in markcap_points.items():
            if mc / 1_000_000_000 >= lower:
                return cat
        return None  # missing (NaN) or negative market cap

    df["markcap_cat"] = df["Market Cap"].map(markcap_pointer)
    print(df["markcap_cat"].value_counts())
    return df


def test_ticker(file, current_dir):
    """
    Tests strategy with class TestStrategy.
    :param file: String, ticker of the company
    :param current_dir: String, the directory name, which is also the markcap category
    :return profit_loss: Float, percentage return on tested strategy
    """
    global ticker
    global mcap
    mcap = current_dir
    ticker = file
    data_feed = get_ticker_historic(ticker)
    cerebro = bt.Cerebro()
    cerebro.addstrategy(TestStrategy)

    if data_feed is None:
        return None

    # Drop the adjusted close; pandasDataFeed maps the unadjusted Open/High/Low/Close/Volume columns.
    # columns= is required: drop(["Adj Close"]) alone would look for a row label and fail.
    if "Adj Close" in data_feed.columns:
        data_feed = data_feed.drop(columns=["Adj Close"])

    data = pandasDataFeed(dataname=data_feed)
    cerebro.adddata(data)
    cerebro.broker.setcash(1000.0)
    cerebro.broker.setcommission(commission=0.0)
    start_val = cerebro.broker.getvalue()
    cerebro.run()
    end_val = cerebro.broker.getvalue()

    profit_loss = (end_val-start_val)/start_val * 100
    print(f"Profit/loss: {profit_loss}")
    return profit_loss

After defining all necessary functions, we can run the script on each of the 3,700 NASDAQ-listed stocks:


# Load the NASDAQ ticker list once instead of on every iteration
df = pd.read_csv("data//nasdaq_tickers.csv")

for current_dir in os.listdir("by_mcap"):
    print(current_dir+"\n\n\n")
    # Files in by_mcap/<category>/ are named "<TICKER>.csv"; strip the extension
    symbol_list = [f[:-4]
                   for f in os.listdir(f"by_mcap//{current_dir}")]
    # .copy() avoids pandas' SettingWithCopyWarning when the pl5 column is added
    df_to_save = df.loc[df["Symbol"].isin(symbol_list),
                        ["Symbol", "Market Cap", "Industry"]].copy()
    print(df_to_save.reset_index())

    # pl5: percentage profit/loss of the strategy with the 5-bar holding period
    df_to_save["pl5"] = df_to_save["Symbol"].apply(
        lambda x: test_ticker(x, current_dir))
    df_to_save.to_csv(f"{current_dir}_meta_pl.csv")
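The loop above saves one {current_dir}_meta_pl.csv file per market-cap category. A minimal sketch of how such tables could be combined into per-category statistics follows. It also converts each percentage P/L r into a log return ln(1 + r/100), one common way to compare gains and losses on a log scale, since a log axis cannot display negative percentages directly. The function summarize and the numbers are hypothetical, and this is not the plotting code from strategy_tester.py.

```python
import numpy as np
import pandas as pd


def summarize(results):
    """Combine per-category result tables into summary statistics per market-cap category.

    `results` maps a category name to a DataFrame with a "pl5" column (percentage P/L).
    """
    df = (
        pd.concat([frame.assign(markcap_cat=cat) for cat, frame in results.items()],
                  ignore_index=True)
        .dropna(subset=["pl5"])  # tickers without price data have no result
        # ln(1 + r/100) is defined for any loss smaller than -100%
        .assign(log_return=lambda d: np.log1p(d["pl5"] / 100))
    )
    return df.groupby("markcap_cat").agg(
        mean_pl=("pl5", "mean"),
        median_pl=("pl5", "median"),
        mean_log_return=("log_return", "mean"),
        n=("pl5", "count"),
    )


# Hypothetical results; in the pipeline these could be loaded with
# pd.read_csv(f"{cat}_meta_pl.csv") for each category
results = {
    "mega": pd.DataFrame({"Symbol": ["AAA", "BBB"], "pl5": [12.0, 4.0]}),
    "nano": pd.DataFrame({"Symbol": ["CCC", "DDD", "EEE"], "pl5": [-50.0, 10.0, None]}),
}
summary = summarize(results)
print(summary)
```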

V. Results

V a): Plotting by Industry

To analyze the effect of industry on our trading strategy, we can plot the returns by sector. We can also break down the returns by market capitalization, dividing the companies into six categories: mega, big, mid, small, micro, and nano.

After applying our trading strategy to each category, we can plot the profit or loss as a percentage on a log scale, which shows how the market cap categories perform relative to each other.

We can also examine which sectors perform better or worse than others. This information can help us identify trends and adjust our strategy accordingly. The code in strategy_tester.py can be used to generate these plots.

Profit/Loss per Industry (20 most common industries)

V b): Plotting market cap (USD) to log return (%)

Based on the bar chart, companies in the “big” market capitalization category performed worst (a loss of about 100%), while companies in the “mega” category performed best (a profit above 300%).

This conclusion should be drawn with caution and should not be the only factor considered when making investment decisions. Other factors may affect the performance of companies in different market capitalization categories, and past performance does not guarantee future results, so thorough research and analysis remain important before investing.

V c): Plotting by trade_turnover

After calculating the trade turnover ratio for each company, we can analyze how it relates to the profit/loss of our trading strategy. The resulting scatter plot shows no clear correlation between trade turnover and profit/loss.

This suggests that how actively a company’s stock is traded does not have a strong effect on the performance of our insider trading strategy.

Plotting trade_turnover to log(profit)

VI. Conclusion

The question this project aimed to answer was: Does the average US executive benefit from trading their own company stock, and if so, under what conditions?

Based on the collected data and trading signals, it is unlikely that, at this level of analysis, a profitable trading strategy can be devised from this information alone. Contrary to what some social media influencers and trading gurus suggest, investing on the basis of managers trading their own company stock can be expected to produce strongly negative results in a significant number of cases.

While mega-cap tickers yielded an average return above 300% over the examined period, it is hard to tell how much of these gains came from favorable market timing, or how much would remain after accounting for brokerage fees, the risk-free discount rate, and other factors.

Nevertheless, combined with more sophisticated weighting of the trading signals and with other indicators, this information could contribute to a profitable trading strategy.