In this article I describe the steps for obtaining and loading New York State Board of Elections data into an Amazon RDS MySQL instance. Loading this data into a relational database enables easy analysis of voter registration and turnout trends, and provides the opportunity for data driven insights into election results. The process could probably be adjusted for other states, but I haven’t looked into how the data differs state to state.
The Board of Elections public dataset includes all registered voters, their registered affiliation, and their voting history (what elections they voted in, not how they voted), as well as other data points. Larger campaigns that have access to more resources probably don’t need to start from this point, however, I wanted to pitch in and help in my small town election where voter registration & turnout efforts were based on spreadsheets (when they used data at all).
In New York, obtaining the Board of Elections data requires a FOIL request. Luckily, this process can be done online on the New York State Board of Elections website. (Look under ‘How to Request Access to Public Records).
Then, waiting. A couple weeks later, I received a disc in the mail.
The data was provided in a comma-separated text file. Included was a PDF describing the fields, which is also provided online.
Creating the Amazon RDS instance is the most intimidating part of this process, partially due to the extreme number of online tutorials, each of which is slightly different. But, just be aware, if you have some technical background, it really isn’t difficult.
From the Amazon RDS dashboard in AWS, choose ‘Create Database’
The next screen we are going to select the database engine. Any of them could be used (depending on your skillset), but I am most familar with MySQL (it is also free).
I also ticked the box ‘only free tier’ options, since this isn’t a production-ready application database, but rather only for doing queries & pulling lists, we don’t need the more complicated options.
The next step is to specify the database details. I kept everything the default, and just filled in the instance name, username and password. The process is quick, so you can always delete the instance and re-create it with a different name later.
Next is the Advanced settings. The main option here to change is to enable public access. This doesn’t mean anyone can get access (there is still a password), but it does mean the instance is provided an IP address such that we can connect to it from our home computer (rather than only another server in AWS as the case would be for a normal web application).
Then, click ‘Create Database’ and wait a few minutes. You can click to view the instance details while you wait. Once the instance is created, on the summary page you’ll see the address listed under ‘endpoint’. This is what you are going to use to connect.
I prefer to use the tool SequelPro for MySQL work, but any MySQL client tool will work. Simply enter the endpoint above, and the username & password you provided. One note is that, by default, AWS will use a secure method of reading your IP address and limiting connections to only that IP. So if you know you will be connecting from another IP, you’ll need to modify your security group. Keeping a database publically accessible like this is not good practice, even if it is password protected. However, we are only processing public data anyway, so its not a huge problem.
Now that we have the database created, lets get the data imported! Initially I went through an iterative process of importing, it failing, then tweaking the MySQL table structure, and retrying. I figured out something that worked eventually, but a MySQL expert could probably have done it much faster. To speed things up (and if you are in NY), you can use this SQL file here to create your table. Other states might need to tweak the file for their specific case, but it should be easy to change.
From SequelPro, choose ‘Import’, format ‘SQL’.
Now, we have a table created, in the correct format for our data, let’s import the content! Again, do a ‘Import’, format ‘CSV’. The NY State BOE data doesn’t include column headings in the data file, so I unticked that box.
Next we will get options for mapping the columns to our table structure. If you are in NY state, this should be already setup based on my table structure, but may need some tweaking for other datasets.
Once you click ‘Import’ the data will start flowing. If you had to tweak the table structure you might get some errors, so keep an eye on the progress. As I said, initially it was an iterative process.
With that, the data should be imported and now you can begin running MySQL queries to pull lists to help canvasing efforts. Some interesting ideas:
- Registered voters who haven’t voted recently
- Registered voters who only vote in major elections
- Registered voters by geographic area
- Political breakdown of your town (by district?)
- Comparing to public tax rolls to find unregistered voters (or in NYS, second home owners who could register in your county)
In a future article I will cover some of those ideas and how to interpret the data included.