Paul Walker
Oct 9, 2018

Bird Dog Nation is an activist group committed to direct democracy and the furthering of progressive missions. Their approach is direct and effective: they visit politicians when they appear in public and press them on their positions on the issues. This keeps issues in the politicians’ minds, encourages them to state their views publicly, and reminds them that their constituents are everywhere.

As the midterm elections approach and the political stakes rise, the number of candidate and campaign events grows with them. Bird Dog Nation has built a grassroots network capable of mobilizing quickly when candidates appear in public.

The organization is volunteer-driven, and resources are limited. While people are available and willing to show up to do the impactful work of harrying our elected officials, there are often thousands of events to sort through, and separating the wheat from the chaff is no small task. Even after campaign events have been collated, it can take several volunteers the better part of a day to identify which events are worth adding to the calendar.

I reached out to Bird Dog Nation to offer technical help in making this process more efficient. Working with Eric Gosh of Bird Dog Nation, I helped work out a basic algorithm for sorting the events. What follows is a brief technical description of the process. If you would like to know more, feel free to contact me.

First, we select events containing certain keywords (“join”, for instance) and classify them as “candidate-present”. Other keywords are used to classify events as “no-candidate”. This is what I would call the deterministic, or heuristic, phase of the algorithm: we take good human intelligence and apply it to a well-studied domain.
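A minimal Python sketch of this keyword phase might look like the following. Only “join” comes from the article. The “no-candidate” keywords, the rule order, and the whole-word matching are hypothetical choices made for illustration.

```python
import re

# Keyword lists per bucket. Only "join" comes from the article; the
# "no-candidate" keywords are hypothetical placeholders.
KEYWORD_RULES = {
    "candidate-present": ["join"],
    "no-candidate": ["phone bank", "canvass"],
}


def keyword_classify(text, rules=KEYWORD_RULES):
    """Return the first bucket with a keyword found in text, or None if no rule fires."""
    lowered = text.lower()
    for bucket, keywords in rules.items():  # dicts preserve insertion order
        for keyword in keywords:
            # Whole-word match, so "join" does not fire on "adjoining".
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                return bucket
    return None  # unmatched rows are left for the probabilistic phase


for event in ["Join Rep. Doe for a town hall", "Volunteer canvass kickoff", "Community picnic"]:
    print(f"{event!r} -> {keyword_classify(event)}")
```

Rules are checked in order, so an event matching keywords from both buckets lands in whichever bucket is listed first. Returning `None` marks a record for the second phase.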

The second phase of the sort is probabilistic. We use past CSV data that has already been classified to train a machine learning model. After experimenting with different approaches, including neural networks, I settled on the Naive Bayes algorithm. It is a common tool in text classification tasks such as sentiment analysis. From observed features (word counts and frequencies, among others), it estimates the probability that a record belongs to each bucket. Once the deterministic sort has run, the classifier assigns each remaining record to one of the buckets.
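The sketch below illustrates the general technique with a small from-scratch multinomial Naive Bayes in Python. It uses word counts with add-one (Laplace) smoothing. It is not necessarily the library or variant the project used, and the training rows are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)        # training rows per bucket
        self.word_counts = defaultdict(Counter)  # word counts per bucket
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return log P(bucket) + sum of log P(word | bucket) for every bucket."""
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)
            for token in tokenize(text):
                if token in self.vocab:  # words unseen in training are ignored
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        """Return the bucket with the highest log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up, already-classified training rows.
texts = [
    "Meet the candidate at the town hall",
    "Candidate speaks at rally downtown",
    "Volunteer phone bank training",
    "Canvass training for volunteers",
]
labels = ["candidate-present", "candidate-present", "no-candidate", "no-candidate"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("Rally with the candidate"))
print(model.predict("Phone bank for volunteers"))
```

The scores are computed in log space to avoid underflow when many small probabilities are multiplied. Smoothing keeps a word that never appeared in one bucket's training rows from zeroing out that bucket entirely.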

Finally, we output a set of files, one for each bucket: “candidate-present” and “no-candidate”. After review, the events can be added to the Bird Dog Nation calendar. The reviewed data can also join an ever-growing pool of training data, improving the Bayes model’s accuracy. Keywords from the deterministic phase can likewise be reviewed and updated iteratively, further improving overall results.
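The output step might look roughly like this in Python. It applies the keyword-first, model-second order described above and writes one CSV file per bucket. The `title` and `date` columns, the `<bucket>.csv` file naming, and the two stand-in classifier functions are hypothetical. In practice they would be replaced by the real keyword rules and the trained model.

```python
import csv
import os
import tempfile
from collections import defaultdict


def keyword_bucket(title):
    """Hypothetical stand-in for the deterministic phase: a bucket name, or None."""
    return "candidate-present" if "join" in title.lower() else None


def model_bucket(title):
    """Hypothetical stand-in for the trained classifier, which always picks a bucket."""
    return "no-candidate"


def classify(row):
    # Keyword rules take priority; the model only sees rows they leave unsorted.
    return keyword_bucket(row["title"]) or model_bucket(row["title"])


def write_buckets(rows, out_dir, fieldnames):
    """Write one CSV file per bucket and return a {bucket: path} mapping."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[classify(row)].append(row)
    paths = {}
    for bucket, bucket_rows in grouped.items():
        path = os.path.join(out_dir, f"{bucket}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(bucket_rows)
        paths[bucket] = path
    return paths


rows = [
    {"title": "Join Rep. Doe at the fair", "date": "2018-10-20"},
    {"title": "Volunteer training", "date": "2018-10-21"},
]
paths = write_buckets(rows, tempfile.mkdtemp(), fieldnames=["title", "date"])
print(paths)
```

Each output file keeps the original columns, so once a reviewer has checked it, it can go straight back into the training pool.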

We have achieved up to 90% accuracy with the system, and false positives and negatives can be caught when an administrator reviews the data. The process is efficient and frees limited volunteer time for more important tasks.
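Accuracy here can be read as the share of rows whose predicted bucket matches the bucket an administrator assigns on review. The small Python sketch below computes that share. It also counts false positives and false negatives, treating “candidate-present” as the positive class, which is an assumption. The example labels are made up and are not the project’s data.

```python
def evaluate(predicted, reviewed, positive="candidate-present"):
    """Compare predicted buckets with administrator-reviewed buckets."""
    if len(predicted) != len(reviewed):
        raise ValueError("predicted and reviewed must be the same length")
    if not predicted:
        raise ValueError("no rows to evaluate")
    pairs = list(zip(predicted, reviewed))
    correct = sum(p == r for p, r in pairs)
    # False positive: the model chose the positive bucket, the reviewer did not.
    false_pos = sum(p == positive and r != positive for p, r in pairs)
    # False negative: the reviewer chose the positive bucket, the model did not.
    false_neg = sum(p != positive and r == positive for p, r in pairs)
    return {
        "accuracy": correct / len(pairs),
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }


predicted = ["candidate-present", "candidate-present", "no-candidate", "no-candidate"]
reviewed = ["candidate-present", "no-candidate", "no-candidate", "candidate-present"]
print(evaluate(predicted, reviewed))
```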

Given our initial success, I decided to launch a public alpha version of the project, currently online at http://csv.parcelize.com. The system integrates with Google Docs, allowing users to input links to CSV-formatted training data (note that non-public CSV links are redacted in the following images).

Step 1 — Provide categories (“buckets”) and links to training data
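Reading training data from such a link could be sketched in Python as follows. The column names, the UTF-8 encoding, and the choice to skip rows that have no label yet are all assumptions. The URL would be whatever CSV link the user supplies (Google Sheets can export a sheet as CSV). The demo parses an inline string so it runs without network access, and `fetch_csv_text` is not called.

```python
import csv
import io
import urllib.request


def fetch_csv_text(url):
    """Download a CSV file (for example, a spreadsheet's CSV export link) as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_training_csv(text, text_column, label_column):
    """Return (texts, labels) lists from CSV text that has a header row."""
    texts, labels = [], []
    for row in csv.DictReader(io.StringIO(text)):
        label = (row.get(label_column) or "").strip()
        if label:  # skip rows that have not been classified yet
            texts.append(row[text_column])
            labels.append(label)
    return texts, labels


sample = (
    "title,bucket\n"
    "Join Rep. Doe at the fair,candidate-present\n"
    "Volunteer training,no-candidate\n"
    "Unreviewed event,\n"
)
texts, labels = parse_training_csv(sample, text_column="title", label_column="bucket")
print(texts, labels)
```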

Once the machine learning model is trained, you can provide a link to the data you wish to classify, along with keywords which will be used to sort data before the model is run.

Step 2 — Provide a link to the data you wish to classify

The sorted data is then output as CSV files which you can download!

Step 3 — Download the sorted data!

The resulting system is minimal and performs one task well: categorizing spreadsheet rows based on text content.

While machine learning concepts (training, models, etc.) are not entirely intuitive, I believe many companies, organizations, and other groups could benefit greatly from similar workflows. If you would like to learn more, visit http://csv.parcelize.com or leave a comment below.