Analyzing 20 Million FCC Net Neutrality Comments

Matt Miller
Sep 22, 2017 · 5 min read

The FCC closed its request for comments on Docket 17–108 “Restoring Internet Freedom” a few weeks ago. This proposal is basically the current administration’s attempt to end Title 2 protection of the internet as a common carrier utility and will likely lead to diminished net neutrality. With over 20 million comments submitted, I was curious whether there were any patterns in such a large dataset, particularly whether support for the repeal could be seen geographically (commenters can include their physical address). For example, are pro-Title 2 repeal comments coming from traditionally Republican areas of the country?

Image for post
The vast majority of comments came from a form or bot submission. An organic comment here is one whose text is repeated fewer than 100 times in the corpus.

To start with, these comments are a mess. The FCC allows bulk and API comment submissions, and that is what they got. The vast majority of the comments came from a bot or a form submission. I’m basing this on the comment text itself. Only a little over 1 million comments were textually unique, meaning written out by a real, live, thinking human being. The rest are just the same comment submitted with a different name and address attached to it. In fact, over 50% of the total comments have text that is repeated over a million times.
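As a rough illustration of the duplicate counting described above, here is a minimal Python sketch. It assumes the comments are already loaded as a list of strings. The function names, the whitespace and case normalization, and the sample data are all hypothetical; only the threshold of 100 comes from the chart caption. It is not the code used for this post.

```python
from collections import Counter

# From the chart caption: "organic" means the text is repeated fewer than 100 times.
ORGANIC_THRESHOLD = 100


def normalize(text):
    # Assumption: collapse whitespace and ignore case so trivially
    # different copies of the same form letter are counted together.
    return " ".join(text.split()).lower()


def label_comments(comments, threshold=ORGANIC_THRESHOLD):
    """Return (counts, labels): how often each normalized text occurs,
    and an 'organic' or 'form_or_bot' label for every input comment."""
    keys = [normalize(c) for c in comments]
    counts = Counter(keys)
    labels = ["organic" if counts[k] < threshold else "form_or_bot" for k in keys]
    return counts, labels


# Made-up sample: one form letter sent 150 times plus two one-off comments.
sample = ["Please keep Title II."] * 150 + [
    "I run a small shop.",
    "As a librarian, I object.",
]
counts, labels = label_comments(sample)
unique_once = sum(1 for n in counts.values() if n == 1)
label_totals = Counter(labels)
print(unique_once, label_totals)
```

On the real corpus, `unique_once` would correspond to the "textually unique" figure, and `label_totals` to the organic versus form/bot split in the chart.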

Image for post
Almost all of the comments submitted were comments whose text was duplicated thousands to millions of times.

I really did not want to get into the validation game; trying to figure out if a comment is pro-repeal or against is difficult enough. So I just took the data as is, with one exception. There were over 7.5 million comments with the exact same text: “I am in favor of strong net neutrality under Title II of the Telecommunications Act.” They all had fake email addresses (‘jourrapide.com’, ‘einrot.com’, ‘armyspy.com’, ‘fleckens.hu’, ‘cuvox.de’, ‘rhyta.com’, ‘dayrep.com’, ‘gustr.com’, ‘superrito.com’, ‘teleworm.us’) and fake physical addresses, all with non-existent street addresses. They were all submitted in huge batches at the exact same time over a couple of weeks. They were painfully fake, so I removed them, which also removed my confidence that I would get anything interesting from this data.
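A filter for that one batch could look something like the sketch below. The comment text and the domain list are taken from the paragraph above. The record layout (a dict with `text` and `email` keys), the helper names, and the sample records are assumptions made for illustration. Requiring both the text and the domain to match is also a choice made here, not necessarily how the removal was actually done.

```python
FAKE_TEXT = (
    "I am in favor of strong net neutrality under Title II "
    "of the Telecommunications Act."
)
FAKE_DOMAINS = {
    "jourrapide.com", "einrot.com", "armyspy.com", "fleckens.hu", "cuvox.de",
    "rhyta.com", "dayrep.com", "gustr.com", "superrito.com", "teleworm.us",
}


def email_domain(email):
    # Everything after the last "@", lowercased; empty string if there is no "@".
    if "@" not in email:
        return ""
    return email.rpartition("@")[2].strip().lower()


def is_fake(record):
    # Assumption: each record is a dict with hypothetical "text" and "email" keys.
    return (
        record["text"].strip() == FAKE_TEXT
        and email_domain(record["email"]) in FAKE_DOMAINS
    )


# Made-up records for illustration only.
records = [
    {"text": FAKE_TEXT, "email": "someone@einrot.com"},
    {"text": FAKE_TEXT, "email": "person@example.org"},
    {"text": "Net neutrality matters to my classroom.", "email": "teacher@example.org"},
]
kept = [r for r in records if not is_fake(r)]
print(len(records) - len(kept), "removed,", len(kept), "kept")
```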

I needed to classify the comments into pro-repeal and anti-repeal. This would have been difficult, except that there were literally no pro-repeal comments that were not from a bot or form submission. Out of the million unique comments (occurring only once in the entire corpus), I could not find a single pro-Title 2 repeal comment that looked like it was written by a person. The pro-repeal folks had their form/bot game on point, submitting millions of comments that said the same thing but had a real person’s credentials attached. I compiled some examples of these comments. My favorite, submitted only one hundred thousand times by different people:

The anti-repeal folks also had some bot/form submissions, but far fewer, and the difference is that these were not the only type of comments submitted. Reading some of the 1 million unique anti-repeal comments written by teachers, small business owners, librarians, students and others was the best aspect of this project.

Fully acknowledging that this is a problematic dataset, I returned to my original question of geography. Out of all the data, I was able to pull out about 7.5M pro-repeal and 4M anti-repeal comments that had a valid US zip code. Orange is pro-repeal, blue is anti-repeal:

Image for post
Density of pro-repeal (orange) and anti-repeal (blue) comments by zip code

The more comments that originated from a zip code, the darker it is. You’ll notice that around metropolitan areas, for example New York, there is more anti-repeal blue filling in. Otherwise the maps are fairly similar: they are densest around population centers.
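The per-zip-code tallies behind maps like these can be sketched as follows. This is only an illustration: the post does not say how a "valid US zip code" was determined, so the pattern check below (five digits, optionally followed by a four-digit extension) is an assumption, and it does not verify that a zip code actually exists. The function names, the `(zip, stance)` input format, and the sample rows are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumption: "valid" means it looks like a 5-digit ZIP, optionally ZIP+4.
ZIP_RE = re.compile(r"^\d{5}(?:-\d{4})?$")


def zip5(raw):
    """Return the 5-digit zip code, or None if the value does not look like one."""
    raw = (raw or "").strip()
    return raw[:5] if ZIP_RE.match(raw) else None


def counts_by_zip(rows):
    """rows: iterable of (zip_code, stance) pairs, stance being 'pro' or 'anti'.
    Returns {zip5: Counter({'pro': n, 'anti': m})}, skipping invalid zip codes."""
    totals = defaultdict(Counter)
    for raw_zip, stance in rows:
        z = zip5(raw_zip)
        if z is not None:
            totals[z][stance] += 1
    return totals


# Made-up rows for illustration only.
rows = [
    ("10001", "anti"),
    ("10001-1234", "anti"),
    ("10001", "pro"),
    ("abcde", "pro"),
    ("", "anti"),
    ("94110", "anti"),
]
totals = counts_by_zip(rows)
for z in sorted(totals):
    print(z, dict(totals[z]))
```

The darkness of each zip code on a map would then be driven by `totals[z]["pro"]` or `totals[z]["anti"]`, depending on which map is being drawn.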

I also looked at the state level, for comments that had valid state codes:

Image for post

Likewise, I looked at email providers to see if there might be anything interesting there:

Image for post

For both, nothing really intriguing jumps out: a lot of people use Gmail and live in California.

As I mentioned earlier, the best comments are those unique one-offs written by real people. There are some written by librarians, and some that got pretty creative with emojis. You can download all of these unique, anonymized anti-repeal comments here (70 MB). And you can download all the classified comments here (100 MB).

It seems likely that, regardless of these public comments, Title 2 protection will be removed by this administration. But it is not over yet; hopefully net neutrality will prevail and we can keep Chet happy:

Image for post
