Exploiting Political Insider Trading Using Code: A Deep Dive with Selenium Scraping

Sam Bennett
6 min read · Jun 26, 2023


Yesterday, I stumbled upon a fascinating YouTube video by Johnny Harris. It shed light on an intriguing subject — the trading habits of US politicians. These trades, often informed by insider knowledge, seemed to be a hidden facet of political life. The tone of the video and the articles Harris referenced stirred up a sense of unfairness in me. I couldn’t help but wonder, could there be a way to find the trades they are making?

This quest led me to the STOCK Act, signed into law by President Obama in 2012. The act mandates that the trading activity of members of Congress be documented and disclosed to the public — i.e. to me.

Driven by this thought, I delved into the U.S. House of Representatives’ Office of the Clerk database. After some time navigating this vast reservoir of information, I concluded that these trade records could be extracted by web scraping. The journey towards a more transparent trading landscape had just begun. Still, it’s important to tread cautiously when considering web scraping: not all websites condone the practice, and it’s crucial to respect a site’s robots.txt file, privacy policy, and terms of service. Breaching these could have legal implications, and it’s always best to seek permission where necessary. In this project I was unable to find a robots.txt file and decided to carry on regardless, but I would warn others against doing the same without first seeking legal advice (which, as a student, I cannot personally afford).
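For anyone who wants to automate that check, here is a minimal sketch (not part of the original project) using Python’s standard library. Note that an absent robots.txt only means there are no crawler rules at the protocol level; the site’s terms of service still apply.

# Minimal sketch: programmatically check a site's robots.txt before scraping.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://disclosures-clerk.house.gov/robots.txt")
robots.read()  # fetch and parse the file; a missing file is treated as allow-all

target = "https://disclosures-clerk.house.gov/PublicDisclosure/FinancialDisclosure"
print("Allowed for generic crawlers:", robots.can_fetch("*", target))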

Starting this project, I initially thought to use Beautiful Soup, a tool that had been instrumental in my past work, such as the project I shared on Medium: https://medium.com/me/stats/post/880b02069e46. However, the complexity of this task soon revealed itself to be beyond what Beautiful Soup could handle.

Recognizing this, I made a strategic shift in my approach. I decided to learn and use Selenium, a more advanced tool seemingly better suited for the intricacies of this task. Thus, the project turned into not only an exploration of trade records but also a journey of expanding my technical skills.

The initial phase of the project involved navigating to the U.S. House of Representatives’ Office of the Clerk database and locating the search box. The snippet of code below illustrates this initial step.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import time
import pickle # To save and load the last top URL


options = Options()
options.add_experimental_option("detach", True)


# Initialize the Chrome Driver
driver = webdriver.Chrome(service=Service('location of chromedriver'), options=options) # Update this path

# Open the webpage
driver.get("https://disclosures-clerk.house.gov/PublicDisclosure/FinancialDisclosure")

# Click through to the 'Search Reports' page
search_reports = driver.find_element("xpath",
    "//a[text()='Search Reports']")

search_reports.click()

time.sleep(2)

# Find the last-name search box on the search form
search_box = driver.find_element("xpath",
    "//input[@name='LastName']")

names = ['Pelosi','Greene','Khanna'] # Politicians used
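
A quick aside: the fixed time.sleep() calls in these snippets work, but they are fragile when the site responds slowly. Below is a small sketch of the same steps using Selenium’s explicit waits, shown purely for illustration rather than as part of the original script.

# Illustrative alternative to fixed sleeps: explicit waits poll until a condition
# holds or a timeout expires.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)  # wait up to 10 seconds for each condition

search_reports = wait.until(
    EC.element_to_be_clickable((By.XPATH, "//a[text()='Search Reports']"))
)
search_reports.click()

search_box = wait.until(
    EC.presence_of_element_located((By.XPATH, "//input[@name='LastName']"))
)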

After the initial setup, I moved on to the main part of the task. My goal was to target specific politicians — those mentioned in the Johnny Harris video and whose data was available in the U.S. House of Representatives’ Office of the Clerk database. I developed a for loop to automate this process. The code enters the politician’s name in the search box, sorts the search results to display the newest trades first, and identifies the Periodic Transaction Reports.

import requests
from PyPDF2 import PdfMerger
import time
import os

for name in names:
    # Enter the search string
    search_box.clear()
    search_box.send_keys(name)

    # Submit the search
    search_box.send_keys(Keys.RETURN)

    # Wait for the page to load
    time.sleep(2)

    # Click the 'Filing Year' column twice to show the newest trades first
    filing_year_column = driver.find_element("xpath",
        "//th[text()='Filing Year']")
    filing_year_column.click()
    time.sleep(1)
    filing_year_column.click()

    # Wait for the page to load
    time.sleep(2)

    # Collect the links to the Periodic Transaction Report PDFs
    reports = driver.find_elements("xpath",
        "//a[contains(@href, '/public_disc/ptr-pdfs')]")

My original idea was to output all the trades as text in the terminal. However, when I showed my parents what I had been doing for the last three hours, they were far more interested in a generated PDF than in plain text, so I decided that for each politician I would download their Periodic Transaction Reports from the last two years and merge them into a single PDF. The code snippet below illustrates the setup for this process. I also included a check for whether new trades have been filed, which determines whether re-downloading all the PDFs is necessary. This check was implemented using the pickle module, as suggested by ChatGPT.

def download_pdf(url, destination):
    response = requests.get(url)
    with open(destination, 'wb') as output_file:
        output_file.write(response.content)

# Initialize a PDF merger
merger = PdfMerger()

# List to keep track of downloaded PDF file paths
pdf_file_paths = []

# Try to load the last top URL from a file
try:
    with open('last_top_url.pkl', 'rb') as file:
        last_top_url = pickle.load(file)
except FileNotFoundError:
    last_top_url = None  # If the file doesn't exist, set the last top URL to None

# Get the top URL from the current run
current_top_url = reports[0].get_attribute('href')

# If the top URL hasn't changed, exit the script
if current_top_url == last_top_url:
    print('No new PDFs have been added.')
    driver.quit()
    exit()

# If the top URL has changed, save it for the next run
with open('last_top_url.pkl', 'wb') as file:
    pickle.dump(current_top_url, file)

Next, I wrote a for loop that opens each report link, downloads the PDF, and merges everything into a single document of all trades for each politician from the past two years. The code snippet below demonstrates this.

# Loop over each report link
for i, report in enumerate(reports):
    # Click the link
    report.click()

    # Switch to the new tab
    driver.switch_to.window(driver.window_handles[1])

    # Get the URL of the PDF
    url = driver.current_url

    # Download the PDF
    destination = f'folder_location/{name}{i}.pdf'  # replace with your path
    download_pdf(url, destination)

    # Add the PDF to the merger
    merger.append(destination)

    # Add the file path to the list
    pdf_file_paths.append(destination)

    # Close the current tab
    driver.close()

    # Switch back to the original tab
    driver.switch_to.window(driver.window_handles[0])

    # Wait a bit before the next download
    time.sleep(2)


final_destination = f'merged_pdf_name/{name}.pdf'

# Merge all the PDFs into one
merger.write(final_destination)  # replace with your path
merger.close()

# Delete all downloaded PDFs
for pdf_file_path in pdf_file_paths:
    os.remove(pdf_file_path)

# Close the browser
driver.quit()

Looking ahead, there are intriguing possibilities to explore, provided they fall within ethical and legal boundaries. One is to feed the trades into ChatGPT and use its web browsing feature to identify links between a politician and the companies they have traded. The goal is twofold: to flag trades that may have been informed by privileged information, and to highlight trades that contradict the politician’s public statements.
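
As a first step, the merged reports would need to be converted into plain text before any model could read them. Below is a rough sketch using PyPDF2 (already a dependency of the script above); the hard-coded path simply follows the f'merged_pdf_name/{name}.pdf' pattern from the merge step and would need adjusting.

# Rough sketch: pull the text out of one merged report so it could be analysed further.
# Illustrative only, not part of the original script.
from PyPDF2 import PdfReader

reader = PdfReader('merged_pdf_name/Pelosi.pdf')  # path follows the merge step's naming pattern
trade_text = "\n".join(page.extract_text() or "" for page in reader.pages)
print(trade_text[:500])  # preview the first few hundred characters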

Once these filters are applied, we’re left with a distilled list of trades. These could potentially serve as the blueprint for a trading bot that mimics the remaining trades. Building on my previous projects, e.g. https://medium.com/@samjgbennett/predicting-financial-market-prices-with-xgboost-and-kalman-filters-strategy-5d9c801bceea, it probably wouldn’t take too long to create a bot that trades based on the habits of politicians.

However, I must reiterate my reservations. While this exercise could be insightful, it raises ethical questions about whether it would be genuinely useful or merely fun. Personally, I enjoy the mathematical and statistical challenge of trading more than a neat trick that generates income by copying others. As with most things, it has also already been coded and documented better by some kids on TikTok; the TikToks targeting Nancy Pelosi’s trades in particular are now infamous within the finance meme community.

And thus, my journey into the world of political stock trading ended. A project that started with a video and a sense of curiosity had blossomed into a complex web scraping operation. From learning a new tool like Selenium to creating a system that automatically fetches, downloads, and consolidates two years of political trade data — the journey had been both challenging and rewarding.

As I scrolled through the final PDFs, I felt a sense of accomplishment. The PDFs were more than just documents; they were windows into a world of information previously hidden behind layers of complexity.

This project was not just about creating a tool to monitor political trades. It was about making information more accessible, about shedding light for myself on the opaque world of political trading — something that yesterday I had no idea had even existed.

