Legislators must disclose their financial interests. We made it easy to explore that data.

Jeremia Kimelman
CalMatters
Sep 12, 2023

Our news team built and released a dataset of California legislators’ financial disclosures to improve government transparency and allow scrutiny of their actions and interests.

Every March, all 120 California legislators disclose financial information as required by law. That information includes stocks or properties they own, any loans or gifts they’ve received, and sponsored travel that occurred within the last year. Lawmakers submit the documents as PDFs to the state’s Fair Political Practices Commission, which publishes them on a website. The site is clunky and hard to use. For example, it lets you view only one document at a time, no matter how many browser tabs you have open.

Each year, news outlets around the state invest time and energy combing through all of those reports to pull out a specific piece of information, such as which legislator received the most gifts or who attended a particular event. After all, this data is a major piece of the influence equation in Sacramento.

But that kind of one-off analysis is expensive, because every new question means going back through all of the PDFs to tally up a different figure. We decided it would be better to have spreadsheets containing all of the data, so we could do things like calculate total dollar amounts, figure out who took the most and, of course, who spent the most.
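As a rough illustration of what a flat spreadsheet makes possible, here is a minimal Python sketch that reads gift records from CSV text and computes the overall total, the top recipient, and the top giver. The column names (`legislator`, `source`, `amount`) and the sample rows are invented for this example; the published spreadsheets may be laid out differently.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Invented sample rows; the real spreadsheets' columns may differ.
SAMPLE_CSV = """legislator,source,amount
Legislator A,Donor X,120.00
Legislator B,Donor X,75.50
Legislator A,Donor Y,300.00
Legislator C,Donor Z,45.25
"""


def summarize_gifts(csv_text):
    """Return the overall total, the top recipient and the top giver."""
    by_recipient = defaultdict(Decimal)  # Decimal() starts at 0
    by_giver = defaultdict(Decimal)
    total = Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = Decimal(row["amount"])
        total += amount
        by_recipient[row["legislator"]] += amount
        by_giver[row["source"]] += amount
    top_recipient = max(by_recipient, key=by_recipient.get)
    top_giver = max(by_giver, key=by_giver.get)
    return total, top_recipient, top_giver


total, top_recipient, top_giver = summarize_gifts(SAMPLE_CSV)
print(total, top_recipient, top_giver)  # 540.75 Legislator A Donor Y
```

`Decimal` is used instead of `float` so that dollar amounts add up exactly to the cent. Once every form lives in one table, each new question is a few lines of code rather than another pass through the PDFs.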

In May, we released just that: a regularly updated dataset of financial disclosure information as a collection of spreadsheets for anybody to use. We’ve already used it for two stories: one about the organization that sponsored the most legislator travel in 2022, and one about a financial disclosure form that trip organizers and funders are supposed to submit but that almost nobody does.

We published the data to make it easier and less costly for any journalist or researcher to do their own analysis. It wasn’t easy to compile: it took more than 100 hours of labor, between building custom data management software and paying half a dozen student journalists from Sacramento State to come to the CalMatters newsroom and extract the information trapped in the PDFs.

The disclosure document itself is called Form 700, and it’s based on a relatively complex data model. It has sections, called “schedules,” for each type of asset or income source, and each schedule can contain an unlimited number of items. That makes sense: we want to know about all of the stocks a legislator owns, not just the top 10 or 20. But because the data is nested this way, we knew that typing it directly into a spreadsheet would be hard to manage. Even “data people” can only edit a spreadsheet with dozens of columns and hundreds of rows for so long before they make a mistake.
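To make the nesting concrete, here is a hypothetical Python model of a filing: one filing holds several schedules, and each schedule holds a list of items of any length. The schedule keys (`investments`, `real_property`, `gifts`) and the item fields are simplified placeholders, not the official Form 700 layout. Flattening it to one row per item is roughly the shape a spreadsheet needs, with the filer and schedule repeated on every row.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    description: str
    detail: str = ""  # placeholder for a value range, date, etc.


@dataclass
class Filing:
    filer: str
    year: int
    # Each schedule name maps to a list of any length.
    schedules: dict[str, list[Item]] = field(default_factory=dict)


def flatten(filing: Filing) -> list[dict[str, object]]:
    """Produce one spreadsheet-style row per item, repeating filer, year and schedule."""
    rows = []
    for schedule, items in filing.schedules.items():
        for item in items:
            rows.append({
                "filer": filing.filer,
                "year": filing.year,
                "schedule": schedule,
                "description": item.description,
                "detail": item.detail,
            })
    return rows


filing = Filing(
    filer="Legislator A",
    year=2022,
    schedules={
        "investments": [Item("Company X stock"), Item("Company Y stock"), Item("Company Z stock")],
        "real_property": [],
        "gifts": [Item("Event tickets from Donor X")],
    },
)
rows = flatten(filing)
print(len(rows))  # 4
```

An empty schedule simply contributes no rows, while a long one contributes many, which is exactly what makes hand-editing the flat version error-prone.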

Instead, we built a custom web application to enter the data. (You can try it in read-only mode to see how it works.)

I’d like to take a short digression to discuss the technical tools we chose for the application. The major one is SvelteKit, a component-based framework for building web applications. We rely heavily on Svelte and SvelteKit at CalMatters because apps are quick to build with them and load quickly for readers. And while this isn’t specific to Svelte, having an internal collection of components means we can quickly build graphics and interactives. But news apps don’t do much good if nobody can access them, so we deploy everything to Netlify. It’s easy to connect to GitHub, which ensures the live version always matches the most recent code.

The point of this application is to improve the accuracy of the entered data by structuring the inputs to match the form. But we need to store the data somewhere reliable, and we need a tool that makes it easy to get data in and out of that database. We’re journalists, after all, not system administrators. We used Prisma as our database query library and Supabase as our cloud database provider. The result is a relatively simple application with just three database tables that mainly saves the form contents as JSON.
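A minimal sketch of the “save the form as JSON” approach, using Python’s built-in `sqlite3` as a stand-in for the hosted database. The actual app uses Prisma and Supabase, which this does not reproduce, and the post doesn’t describe its three tables, so the two tables and their columns here are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE filers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE forms (
    id INTEGER PRIMARY KEY,
    filer_id INTEGER NOT NULL REFERENCES filers(id),
    year INTEGER NOT NULL,
    contents TEXT NOT NULL  -- the whole form, serialized as JSON
);
""")


def save_form(conn, filer_id, year, contents):
    """Serialize the form dict to JSON and store it as a single row."""
    cur = conn.execute(
        "INSERT INTO forms (filer_id, year, contents) VALUES (?, ?, ?)",
        (filer_id, year, json.dumps(contents)),
    )
    conn.commit()
    return cur.lastrowid


def load_form(conn, form_id):
    """Read one form back and parse its JSON contents."""
    row = conn.execute("SELECT contents FROM forms WHERE id = ?", (form_id,)).fetchone()
    return json.loads(row[0])


conn.execute("INSERT INTO filers (id, name) VALUES (1, 'Legislator A')")
form_id = save_form(conn, 1, 2022, {"gifts": [{"source": "Donor X", "amount": 120.0}]})
print(load_form(conn, form_id))
```

Storing each form as one JSON document keeps the schema small even though the form has many schedules; one likely trade-off is that analysis needs an export step that flattens the JSON into spreadsheets. In Postgres, which Supabase hosts, a `jsonb` column could hold the contents instead of a text column.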

We downloaded all of the financial disclosure forms for the last two years and added them to the app. Then a team composed of CalMatters staff and Sacramento State student journalists went through every PDF and typed out the information. Next, a different person went through each form to make sure the entered data matched the PDF.
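The post doesn’t say whether the app itself enforced the two-person check, but a hypothetical sketch of how an entry tool could do so might look like this: each entry records who typed it, and verification by that same person is rejected. The class, method, and field names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormEntry:
    form_id: int
    entered_by: str
    verified_by: Optional[str] = None

    def verify(self, reviewer: str) -> None:
        """Record that `reviewer` checked this entry against the PDF."""
        if reviewer == self.entered_by:
            raise ValueError("verifier must differ from the person who entered the data")
        self.verified_by = reviewer


def needs_review(entries: list[FormEntry]) -> list[int]:
    """IDs of forms that nobody has checked yet."""
    return [e.form_id for e in entries if e.verified_by is None]


entries = [FormEntry(1, "alice"), FormEntry(2, "bob")]
entries[0].verify("bob")
print(needs_review(entries))  # [2]
```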

These forms are submitted by more than a hundred different people, so there are going to be typos, misspellings, and inconsistent acronyms. When we extracted the data from the PDFs by eye, we preserved these variations so that our data matched the source, which made verification easier. But the data has to be cleaned up before it can be used in any real analysis. For that, we use a set of scripts and a second set of spreadsheets that map each name variation to a standardized one.

For example, someone might spell AT&T as “ATT,” “A T and T,” or something totally different. Our lookup spreadsheet specifies that any of these variant spellings should be changed to “AT&T.” Now you can open the spreadsheet of gifts to legislators and determine how much money the company spent on state legislators in 2022. (The answer, by the way, is $1,010.40.)
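Here is a minimal Python sketch of that lookup step, assuming the lookup spreadsheet has `variant` and `standard` columns. The column names, the case- and whitespace-insensitive matching rule, and the gift amounts are all invented for illustration; they are not CalMatters’ scripts or the real 2022 figures.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical lookup spreadsheet: one row per known variant spelling.
LOOKUP_CSV = """variant,standard
ATT,AT&T
A T and T,AT&T
AT & T,AT&T
"""


def normalize_key(name: str) -> str:
    """Uppercase and collapse runs of whitespace (an assumed matching rule)."""
    return " ".join(name.upper().split())


def load_lookup(csv_text: str) -> dict[str, str]:
    """Map each normalized variant to its standardized name."""
    return {
        normalize_key(row["variant"]): row["standard"]
        for row in csv.DictReader(io.StringIO(csv_text))
    }


def standardize(name: str, lookup: dict[str, str]) -> str:
    """Return the standard name, or the original (trimmed) name if it isn't listed."""
    return lookup.get(normalize_key(name), name.strip())


def totals_by_source(gifts, lookup):
    """Sum gift amounts per standardized source name."""
    totals = defaultdict(Decimal)
    for source, amount in gifts:
        totals[standardize(source, lookup)] += Decimal(amount)
    return dict(totals)


lookup = load_lookup(LOOKUP_CSV)
# Invented sample rows (source, amount); not the real 2022 data.
gifts = [("ATT", "100.00"), ("A T and T", "50.00"), ("AT&T", "25.00"), ("Donor X", "10.00")]
print(totals_by_source(gifts, lookup))
```

Names that aren’t in the lookup pass through unchanged, so a new misspelling shows up as its own entry in the totals, which is a cue to add a row to the lookup spreadsheet.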

We’ve already written stories based on this data and have included it in our legislator tracking tool, Glass House. We plan to add this year’s data when it’s due in March. But we believe there are plenty more stories in those CSVs, so if you end up using this financial disclosure data for anything, we’d really love to hear about it.
