UC Berkeley Graduate Students Leverage Natural Language Processing to Create Investigative Dashboard Tracking Insider Trading

The Fall 2023 5th Year MIDS Capstone Award-winning project PoliWatch aims to keep an eye on politicians and their stocks

Berkeley I School


5th Year Master of Information and Data Science (MIDS) students Matthew Dodd, Aditya Shah, Jocelyn Thai, and Ethan Yen are the winners of the 5th Year MIDS Capstone Award for their project, PoliWatch.

Their project addresses a common issue in Congress: insider trading. Despite a fifth of trades by Congress being reported as problematic, little has been done to hold politicians accountable. To gather information on stock trades, the team built a dashboard that tracks congressional members’ trading activities and contextualizes them with their committee assignments, attended hearings, and sponsored legislation.

We spoke to the team to learn more —

From left to right, top to bottom: Jocelyn Thai, Ethan Yen, Matthew Dodd, and Aditya Shah

What inspired your project?

Ethan: Over the past couple of years, I have read several stories about congressional insider trading. These reports all follow a similar structure: (1) the journalist meticulously documents how a congressional member’s insider information may have informed their trades, (2) an ethics investigation (sometimes involving the Securities and Exchange Commission and Department of Justice) is launched, (3) charges are dropped. The final stage of the trajectory always confuses the reader — how can such damning evidence result in no charges? Moreover, how many other congressional members get away with trading with conflicts of interest since investigations are often conducted after public outrage?

These types of stories have dampened my general faith in government, especially since I have felt hopeless to do anything about the problem. The MIDS Capstone presented the perfect opportunity to apply data science in a truly novel way: to tackle important problems that do not initially present as data problems. The hurdle of novelty meant that our team spent a substantial amount of time on data engineering to ensure that we could translate our social problem into a quantitative data objective.

What was the timeline or process like from concept to final project?

Matthew: Our project started out looking to build a classifier identifying insider trading with traditional machine learning approaches. We struggled given the pretty low incidence of actual insider trading investigations, let alone convictions. With our pivot to building a research tool that can supplement investigations, our focus moved to data engineering and data validation. We evaluated multiple sources tracking congressional activity, often comparing results to ensure validity. This was a significant task and involved joining messy data from multiple sources, often requiring manual inspections and corrections. For example, we had to manually correct over 100 misspellings or abbreviations in congressperson names from official congressional documentation. Ultimately, we built a diverse, gold-standard corpus of congressional activity that had never before been assembled. This left us with only a few weeks to experiment with and optimize the final transformer model we used to surface congressional activity relevant to trades.
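
As an illustration of the name-cleanup step Matthew describes, here is a minimal Python sketch of how misspelled names might be flagged against a canonical roster for manual review. It is not the team's code: the roster, the function name `suggest_canonical_name`, and the 0.8 similarity cutoff are all assumptions made for this example, and the names are fictional.

```python
import difflib

# Hypothetical canonical roster; these names are fictional placeholders.
CANONICAL_NAMES = ["Alexandra Rivera", "Benjamin Cho", "Margaret O'Neil"]


def suggest_canonical_name(raw_name, roster=CANONICAL_NAMES, cutoff=0.8):
    """Return the closest roster name, or None if nothing clears the cutoff.

    The cutoff of 0.8 is an assumed value; a real pipeline would tune it
    and still send uncertain matches to a person for review.
    """
    # Compare case-insensitively, but return the roster's original spelling.
    lowered = {name.lower(): name for name in roster}
    matches = difflib.get_close_matches(
        raw_name.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


if __name__ == "__main__":
    for raw in ["Alexandra Rivra", "benjamn cho", "Unknown Person"]:
        print(raw, "->", suggest_canonical_name(raw))
```

A suggestion of `None` would mean no roster entry was similar enough, which is the kind of record that would still need the manual inspection described above.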

How did you work as a team? How did you work together as members of an online degree program?

Ethan: Subsets of our capstone team have worked together in various classes throughout the program, so going into the capstone, we were not meeting for the first time. In fact, the biggest priority in team formation was personality fit.

Once our team formed, we were quick to delineate roles. By focusing our tasks, we were each able to have separate weeks with downtime. This downtime was important as we were all working students. We set up weekly meetings to touch base and share any blocks we experienced. This group forum allowed us to talk through specific problems and also provided the opportunity to pivot as needed.

How did your I School curriculum help prepare you for this project?

Aditya: The MIDS curriculum meticulously offers a diverse spectrum of technical and non-technical courses, uniquely designed to be exceptionally practical, industry-focused, and aligned with the latest trends in the dynamic landscape of data science. These courses go beyond imparting essential skills for excelling as versatile data scientists; they serve as a dynamic springboard propelling us toward leadership roles in the evolving field of data science. Our unwavering belief is founded on the recognition that MIDS has played a pivotal role in endowing us with holistic expertise. This empowerment enables us not only to adeptly construct, lead, and scale data science products but also to craft solutions with actionable precision, effectively tackling real-world challenges.

Do you have any future plans for the project?

Aditya: We believe that PoliWatch has the potential to succeed as a business. Our journey has been characterized by significant milestones, and the recent triumph in winning the 5th Year MIDS Capstone Award is a testament to our dedication and hard work. The continuous guidance and support from our esteemed capstone instructors, industry veterans Joyce J. Shen and Kevin Hartman, played a pivotal role in the implementation of our fully functioning MVP (Minimum Viable Product). As a next step, we plan to apply to the Berkeley SkyDeck startup accelerator program.

A standout moment in our journey was the uplifting feedback from our capstone judges. Their emphasis on PoliWatch’s high potential impact, and on its being the ‘closest to market’ among showcased projects, is both encouraging and inspiring. This acknowledgment not only reinforces the commercial viability of our project but also highlights the strength of our core technical implementation, which is already in place.

Furthermore, the judges astutely pointed out that PoliWatch is addressing a problem often acknowledged but rarely acted upon, indicating a solid niche for our project. This insightful feedback resonates deeply with our commitment to making a meaningful impact in a space that requires attention and action.

As we navigate this exciting phase, we remain grateful for the support and guidance that have brought us to this point. We look forward to the possibilities that lie ahead and appreciate the valuable feedback that shapes our journey.

How could this project make an impact, or, who will it serve?

Ethan: Our product hopes to disrupt the space of congressional insider trading. Our tool serves investigative bodies whose analyses are often done via manual, brute-force methods. We hope our product provides regulators with a scalable approach to compliance.

Moreover, by demonstrating how disclosures under the Stop Trading on Congressional Knowledge (STOCK) Act of 2012 can be used to hold congressional members accountable for their trades, we hope to motivate regulatory bodies to increase enforcement of disclosure rules. Currently, violating STOCK Act disclosure rules incurs a trivial $200 fine, mostly because such disclosures are often not actionable. Our tool demonstrates that data methods can help make these disclosures actionable, justifying their collection and necessitating their integrity.

Additional info to share?

Aditya: As a cohesive team, our current commitment revolves around gaining a deeper understanding of the intricacies surrounding insider trading. To achieve this, we are diligently conducting comprehensive market research within a well-structured framework. Our goal is to pinpoint and isolate the core issues, enabling us to iteratively and continuously enhance the technical implementation of our MVP. This iterative process is crucial for achieving and sustaining product-market fit (PMF).

Simultaneously, in addition to our dedicated market research endeavors, we are actively seeking connections with angel investors and early-stage pre-seed venture capitalists. This strategic outreach is geared towards securing valuable support and resources to fuel our mission. Furthermore, we are exploring potential opportunities offered by various startup accelerator programs, aiming to leverage their expertise and networks for accelerated growth.

By combining rigorous research, iterative improvements, and strategic networking, we are confident in our ability to not only understand the complexities of insider trading but also to position our MVP for success in the market.



Berkeley I School

The UC Berkeley School of Information is a multi-disciplinary program devoted to enhancing the accessibility, usability, credibility & security of information.