Bot or Not?: Verifying public comments on net neutrality
Our analysis indicates that of the 22 million comments submitted to the FCC regarding net neutrality, 80% were likely sent by bots, 9% were verifiably legitimate, and 11% were difficult to verify using our methods. Of the verifiably legitimate comments, 66% were pro-net-neutrality (anti-repeal) and 34% were anti-net-neutrality (pro-repeal). Among unique comments, 98% favored net neutrality.
As the FCC steams ahead with its plan to dismantle net neutrality rules, many of us in the tech community have taken a keen interest in the public comments submitted to the FCC this past summer. The FCC requested public comments on their proposed repeal of net neutrality rules, “Restoring Internet Freedom,” starting in April 2017, and millions of comments flooded in almost immediately.
Less than a month after the comment period opened, researchers noticed something strange. Millions of comments were being submitted with the same text. It seemed unlikely that each of these comments had been submitted by an individual person. In fact, several people denied submitting comments attributed to their name and address. The sheer number of comments submitted to the FCC was also suspicious — a whopping 22 million by the end of August, which is more people than the populations of Los Angeles, Austin, NYC, Chicago, and San Francisco combined. It seemed likely that many, if not the majority, of comments were spam.
Inspired by researcher and Python developer Chris Sinchok’s Medium post, An Analysis of Anti-Title II Bots, we analyzed all 22 million comments submitted to the FCC during the comment period. Our goal was to determine which comments were likely sent by bots, which were likely sent by humans, and what the true sentiment of the country is on the issue of net neutrality.
This analysis was complicated by the fact that we couldn't simply treat comments with identical or similar text (a template) as spam. Activist groups representing pro- and anti-net-neutrality positions were running legitimate digital media campaigns, like the July 12 Day of Action for Net Neutrality, to collect as many public comments as possible. Real people signed and submitted these pre-written comments, believing they counted. It would be unfair to discount them. It would also be incorrect — the FCC provided a completely legitimate, albeit flawed, way for organizations to submit comments in bulk via their API.
TL;DR: most human commenters support net neutrality.
Using clustering and information retrieval techniques in an Elasticsearch instance, we identified 42 different templates in total, accounting for about 21 million comments, or about 95% of all comments. Could these 21 million comments all be spam?
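We did the template detection with clustering and information retrieval queries against Elasticsearch. As a rough illustration of the core idea, here is a minimal Python sketch that groups exact near-duplicates by a normalized text fingerprint; real template detection also catches paraphrased variants, which this simplification does not, and all names here are illustrative:

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(text: str) -> str:
    """Collapse whitespace and case, then hash, so near-identical
    form letters map to the same key."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def group_by_template(comments):
    """Map each fingerprint to the indices of comments sharing it."""
    groups = defaultdict(list)
    for i, text in enumerate(comments):
        groups[fingerprint(text)].append(i)
    return groups

comments = [
    "I support net neutrality.  ",
    "i support net neutrality.",
    "Please repeal Title II regulations.",
]
groups = group_by_template(comments)
# The first two comments collapse into one template group;
# the third stands alone.
```

In practice, fuzzy matching (e.g. Elasticsearch's `more_like_this` queries) is needed on top of exact fingerprinting, because many template variants differ by a substituted name or city.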
Just looking at the templates themselves wasn't enough to determine how a comment was submitted to the FCC. Signing petitions and submitting form comments are common calls to action in digital activism. We also needed to look at who sent them. As noted by journalists early in the comment period, people's names and addresses were being used to submit these comments without their knowledge.
Initially, we looked for email addresses that were obviously fake, or multiple comments submitted from the same email address. Based on this analysis, we estimate that 91% of all anti-net-neutrality submissions, and 79% of all pro-net-neutrality submissions, came from bots.
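A first-pass filter of this kind can be sketched in a few lines. The sketch below flags addresses that are syntactically invalid or appear more than once in the dataset; our actual filtering was more involved, and the regex and example addresses here are illustrative assumptions:

```python
import re
from collections import Counter

# Deliberately loose email pattern: something@something.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def flag_suspect_emails(emails):
    """Return the set of (lowercased) emails that are either
    syntactically invalid or duplicated in the dataset."""
    counts = Counter(e.lower() for e in emails)
    suspect = set()
    for e in emails:
        el = e.lower()
        if not EMAIL_RE.match(el) or counts[el] > 1:
            suspect.add(el)
    return suspect

emails = ["alice@example.com", "ALICE@example.com",
          "bob@example.org", "not-an-email"]
suspect = flag_suspect_emails(emails)
# "alice@example.com" is duplicated and "not-an-email" is invalid;
# "bob@example.org" survives the filter.
```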
However, the question of whether comments were being submitted without the purported commenter’s knowledge remained. The only way to truly determine whether a comment was spam was to ask the person who supposedly submitted it. So that’s what we did.
We sent more than 8000 emails to purported authors of template-based comments (a random sample of 200 email addresses per template, or all of the email addresses for templates with fewer than 200 comments). We also emailed 200 people who had submitted comments that didn’t match any known templates.
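The sampling scheme above — up to 200 addresses per template, or all of them when a template has fewer — can be sketched as follows (function and variable names are illustrative, not from our codebase):

```python
import random

def sample_for_survey(template_emails, per_template=200, seed=42):
    """For each template, take a random sample of up to
    per_template addresses; keep every address if there are
    fewer than per_template."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    sampled = {}
    for template_id, emails in template_emails.items():
        if len(emails) <= per_template:
            sampled[template_id] = list(emails)
        else:
            sampled[template_id] = rng.sample(emails, per_template)
    return sampled

template_emails = {
    "t1": [f"user{i}@example.com" for i in range(500)],  # large template
    "t2": ["a@example.com", "b@example.com"],            # small template
}
sampled = sample_for_survey(template_emails, per_template=200)
```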
We asked the purported commenters a simple question: Did you submit this comment to the FCC?
The results were striking. People whose email addresses were associated with pro-net-neutrality comments confirmed, at high rates, that they submitted those comments. But people whose email addresses were associated with anti-net-neutrality comments denied ever having submitted those comments.
For the five anti-net-neutrality templates with the most total comments sent to the FCC, at least 80% of those who replied to our email denied submitting the comment. For two templates, everyone who responded denied having sent the comment. Conversely, for the top five pro-net-neutrality templates, between 89–100% of responses affirmed submitting the comment.
Based on our analysis, across all templates and unique responses, over 70% of pro-net-neutrality respondents confirmed that they submitted comments. Under 10% of anti-net-neutrality respondents confirmed that they submitted comments.¹
In total, we identified 8.6 million anti-net-neutrality comments from templates and 12.3 million pro-net-neutrality comments from templates. We eliminated comments from obviously fake and duplicate email addresses, and then scaled the remaining comments by the percentage of pro- or anti-net-neutrality survey respondents who confirmed sending them. Finally, we analyzed the 650,000 non-template comments attributed to unique email addresses to determine which were pro- or anti-net-neutrality. We found that 90% were pro-net-neutrality, 2% were anti-net-neutrality, and the remaining comments were null or blank. (Incidentally, our result independently confirms Jeff Kao's result, reported here.) In total, we conclude that the FCC received just over 600,000 legitimate anti-net-neutrality comments and 1.3 million legitimate pro-net-neutrality comments.
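The scaling step reduces to a simple formula: deduplicated template comments times the survey confirmation rate, plus verified non-template comments. The numbers below are illustrative placeholders, not our actual intermediate figures:

```python
def estimate_legitimate(dedup_survivors, confirm_rate, non_template):
    """Estimate legitimate comments for one sentiment (pro or anti):
    scale the template comments surviving email deduplication by the
    survey confirmation rate, then add verified non-template comments."""
    return round(dedup_survivors * confirm_rate + non_template)

# Illustrative placeholder inputs (NOT our actual intermediates):
est = estimate_legitimate(
    dedup_survivors=400_000,  # template comments after removing fake/dup emails
    confirm_rate=0.70,        # share of survey respondents confirming
    non_template=50_000,      # verified unique non-template comments
)
# est == 330000 for these placeholder inputs
```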
It should be noted that our email survey could not reach the 2.3 million commenters who submitted to the FCC without email addresses. It is quite likely that some of these comments are legitimate, since the FCC did not require an email address to submit a comment. But the effort required to reach out to these people personally was beyond the scope of this analysis.
However, the vast majority of these comments, about 2.1 million, were pro-net-neutrality. This means our estimate that 66% of commenters favor net neutrality is low. If even half of all comments without email addresses were verified as legitimate, our estimate would jump to 77% of commenters favoring net neutrality.
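The sensitivity check above is straightforward arithmetic. Using the rounded published totals (1.3 million pro, 600,000 anti), the base share comes out near 68%; the article's 66% reflects the exact, unrounded counts. Adding half of the 2.3 million no-email comments (about 2.1 million of which were pro) recovers the 77% figure:

```python
def pro_share(pro, anti):
    """Percentage of legitimate comments that are pro-net-neutrality."""
    return round(100 * pro / (pro + anti))

# Rounded survey-verified totals:
pro, anti = 1_300_000, 600_000
base = pro_share(pro, anti)  # ~68% from rounded totals (66% from exact counts)

# Assume half of the 2.3M no-email comments (2.1M pro, 0.2M anti) are real:
adj = pro_share(pro + 2_100_000 / 2, anti + 200_000 / 2)
# adj == 77
```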
- The FCC received just over 600,000 legitimate anti-net-neutrality comments and 1.3 million legitimate pro-net-neutrality comments (34% vs 66%).
- The vast majority of comments submitted to the FCC regarding net neutrality were from bots. We estimate 91% of all anti-net-neutrality submissions were from bots, compared to 79% of pro-net-neutrality submissions.
- After eliminating comments attributed to fake and duplicate email addresses, over 70% of pro-net-neutrality respondents said they submitted comments. Under 10% of anti-net-neutrality respondents said they submitted comments.
The source code for our work is found at https://github.com/RagtagOpen/fccforensics and is based on the work at https://github.com/csinchok/fcc-comment-analysis. If you want more information about our analysis or methods, please contact FCC.Comments.Research@gmail.com.
For additional analysis, please see Jeff Kao’s excellent work at https://hackernoon.com/more-than-a-million-pro-repeal-net-neutrality-comments-were-likely-faked-e9f0e3ed36a6.
¹Of course, as in any survey, we did not have a 100% response rate. About 29% of the emails we sent bounced (at nearly identical rates for pro- and anti-net-neutrality comments), and about 46% of emails went unopened (again at nearly identical rates for both groups). Twenty-two percent of emails were opened by the respondent, as reported by Mandrill's email tracking technology.
In order to better estimate the overall sentiment of human commenters, we filtered out comments attributed to fake email addresses, as well as the duplicate comments attributed to email addresses that were seen more than once in the full dataset.
After removing fake email addresses and duplicates, we did see a bias in response rate between pro- and anti-net-neutrality commenters. Pro-net-neutrality commenters were far more likely to respond than anti-net-neutrality commenters. Because pro-net-neutrality commenters were also the majority of affirmative responses, it’s unclear whether the bias in our study lies between pro/anti commenters or yes/no responders.