What’s up with all of those identical comments on the FCC net neutrality docket?

I don’t want to bury the good stuff, so click here for all 128,946 identical comments in a GitHub repository.

[5/16] I guess this is a series now. See part 2 for more data: https://medium.com/@nhf/the-fcc-is-still-getting-a-lot-of-automated-net-neutrality-comments-c7dff56d2563

——————

Ars Technica says:

The FCC this week has received hundreds of thousands of new comments on its proposal to dismantle net neutrality rules, and more than 128,000 of them are identical comments calling for the reversal of the Obama administration’s “power grab.” It seems likely that the influx of anti-net neutrality identical comments is coming from a bot, but the FCC hasn’t addressed the matter publicly yet.

The FCC’s electronic comment filing system (ECFS) has been going up and down like a rollercoaster ever since John Oliver encouraged viewers to file comments in support of net neutrality. However, if it is up, you can verify this statement for yourself:

Yes, those are all the same.

The words themselves may have come from a conservative think tank, but the group denies filing comments under the names of others. At this point, I was itching to find out more. Where did all the fake identities come from? Some of them have email addresses; all of them have postal addresses. Was the phrasing identical in all of the docket filings? Could there be another explanation?

Luckily, the ECFS has an API (thanks, Obama!) that I could use to search the docket and grab all filings that matched the boilerplate. Here’s the code for doing it. (You’ll need your own API key, which is free.)
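A minimal sketch of that kind of paginated search (not the exact script used for this post) might look like the following. The endpoint URL, the query parameter names (`api_key`, `proceedings.name`, `q`, `limit`, `offset`), and the `filings` response key are all assumptions, so check them against the ECFS API documentation. The docket number is left as an argument because it isn't stated here.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the ECFS API documentation.
ECFS_URL = "https://publicapi.fcc.gov/ecfs/filings"


def build_url(api_key, docket, phrase, limit, offset):
    """Build the search URL for one page. Parameter names are assumptions."""
    params = {
        "api_key": api_key,
        "proceedings.name": docket,
        "q": phrase,
        "limit": limit,
        "offset": offset,
    }
    return ECFS_URL + "?" + urllib.parse.urlencode(params)


def http_get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_all(api_key, docket, phrase, limit=100, get_json=http_get_json):
    """Yield every matching filing, requesting one page at a time."""
    offset = 0
    while True:
        page = get_json(build_url(api_key, docket, phrase, limit, offset))
        filings = page.get("filings", [])  # assumed response key
        if not filings:
            break
        yield from filings
        offset += len(filings)
```

The `get_json` argument exists so the paging logic can be tried against a stub before pointing it at the real (and frequently down) service.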


Let’s answer some questions

Are all the filings identical?

Pretty much. There are 30 unique filings that contain the boilerplate within a larger text (most of them were, not surprisingly, complaining about spamming of the boilerplate). The rest are substantively the same. If you want to get technical, there are 128,830 with the exact boilerplate and 120 with the boilerplate plus added newlines.
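One way to arrive at a breakdown like this is to compare each comment against the boilerplate before and after collapsing whitespace. This is only a sketch: the function names and category labels are made up for illustration, and `boilerplate` stands for whatever reference text you are matching.

```python
import re
from collections import Counter


def normalize(text):
    """Collapse every run of whitespace (including newlines) to one space."""
    return re.sub(r"\s+", " ", text).strip()


def classify(comments, boilerplate):
    """Count comments by how they relate to the boilerplate text."""
    target = normalize(boilerplate)
    counts = Counter()
    for text in comments:
        if text == boilerplate:
            counts["exact"] += 1  # byte-for-byte identical
        elif normalize(text) == target:
            counts["whitespace_variant"] += 1  # e.g. added newlines
        elif target in normalize(text):
            counts["embedded"] += 1  # boilerplate inside a longer comment
        else:
            counts["other"] += 1
    return counts
```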

Was the bot dumb enough to file thousands of comments in alphabetical order by first name?

Yes.

A substantial chunk were filed alphabetically by first name on May 9th within milliseconds of each other. They even sorted out uncapitalized names from capitalized names!

This chunk accounts for roughly 53,000 filings (about 40% of the total).
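If you want to look for this pattern yourself, a rough approach is to put the filer names in filing-time order and find the longest stretch that never goes backwards alphabetically. The sketch below assumes you have already sorted the names by timestamp. It compares strings by code point, which places every capitalized name before every uncapitalized one. That matches the split described above, though I can't say whether that is how the bot's list was actually sorted.

```python
def longest_sorted_run(names):
    """Return (start, length) of the longest non-decreasing run in names.

    Comparison is by code point, so "Zoe" sorts before "aaron".
    """
    best_start, best_len = 0, 0
    start = 0
    for i in range(len(names)):
        # A name that sorts before its predecessor starts a new run.
        if i > 0 and names[i] < names[i - 1]:
            start = i
        length = i - start + 1
        if length > best_len:
            best_start, best_len = start, length
    return best_start, best_len
```

A run of a few dozen names could happen by chance in a big dataset; a run of tens of thousands cannot.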

Were there multiple “waves” of filings?

Yes. I can eyeball about 8, but I’m not sure how distinct they are. If you look at the CSV file on GitHub, you can see that the “alphabetical by first name” filings stretch from 2017-05-09 5:32 PM up to about 4:00 AM the next day. A very non-scientific histogram over time gives you a sense.

Sorry about the lack of axes. My software is horrible.
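For a version with labels, here is a small sketch that buckets filings by hour and prints a text histogram. It assumes the timestamps are ISO 8601 strings such as `2017-05-09T17:32:10`, all in the same time zone (or all without one). The column format in the actual CSV may differ, so adjust the parsing to match.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count filings per hour from ISO 8601 timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        counts[hour] += 1
    return counts


def print_histogram(counts, width=50):
    """Print one labelled bar per hour, scaled to the busiest hour."""
    if not counts:
        return
    peak = max(counts.values())
    for hour in sorted(counts):
        bar = "#" * max(1, round(counts[hour] * width / peak))
        print(f"{hour:%Y-%m-%d %H:00}  {bar} {counts[hour]}")
```

Hours with no filings are simply skipped, so gaps between waves show up as jumps in the hour labels rather than as empty rows.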

Are these real people?

Most are, almost certainly. However, it’s also pretty certain that most of them didn’t send these comments (ZDNet corroborated this by contacting a few of the people). The question becomes: where did the data come from? The short answer is “I don’t know”, but here are a few thoughts.

  • Commercial or public marketing lists are possible: All the addresses seem to be real. A quick Google of some names and addresses seems to agree. For example, someone with a listed university address seems to go to that university.
  • Black market lists of identities: Checking a small sample of emails from the filings on https://haveibeenpwned.com/ shows about 90% are on a list of some sort. Maybe this is just a high base rate, though.
  • Preprocessed, bought list: Almost certainly. All of the addresses are nice and clean, all the email addresses are uppercased, and the filer names all seem to be formatted properly (with a few notable exceptions that could be data entry errors). In addition, all street addresses seem to be standardized according to USPS convention (Road -> Rd, etc.).
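The formatting observations in that last point are easy to quantify. The sketch below counts how many emails are fully uppercased and how many street addresses still contain a spelled-out suffix that USPS convention would abbreviate. The `email` and `address` field names are hypothetical, and the suffix list is a small illustrative sample, not the full USPS table.

```python
import re

# A few spelled-out street suffixes that USPS convention abbreviates
# (Road -> Rd, etc.). Illustrative only, not a complete list.
LONG_SUFFIXES = re.compile(
    r"\b(road|street|avenue|drive|lane|boulevard)\b", re.IGNORECASE
)


def hygiene_report(filings):
    """Summarize formatting regularity across a list of filing dicts.

    Each filing may have "email" and "address" keys (hypothetical names).
    """
    emails = [f["email"] for f in filings if f.get("email")]
    addresses = [f["address"] for f in filings if f.get("address")]
    upper = sum(1 for e in emails if e == e.upper())
    spelled_out = sum(1 for a in addresses if LONG_SUFFIXES.search(a))
    return {
        "emails": len(emails),
        "emails_all_uppercase": upper,
        "addresses": len(addresses),
        "addresses_with_spelled_out_suffix": spelled_out,
    }
```

People typing their own details are inconsistent, so a near-100% uppercase rate and a near-zero spelled-out rate would both point to a list that was cleaned by software. Expect a few false positives from street names that happen to contain one of these words.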

How do I make my own FCC spamming bot?

It seems to be pretty easy, seeing as there’s an API call for that. A simple Python script or curl command could send off your own thousands of filings. [updated: 5/20] There are no publicly stated authentication or rate limiting provisions whatsoever besides the API key requirement. (Tweets from the FCC’s CIO indicate there may be internally-determined limits which are not disclosed to developers.)

Go forth and create your own bot! It’s that easy.

What should we make of this?

A net neutrality opponent, or someone working for them, got their hands on a list of people. They took advantage of the FCC’s no-friction comment filing API to blast well over a hundred thousand comments over the last few days. Almost all of them were automated and used false identities. They were sloppy about it, but it worked anyway.

I’d be interested in tracking down who did the bot spamming and where they sourced their identities from, but unfortunately I have no idea where to start.

Now what?

Go file your own comment or get informed about the debate. Play with the data if you want.

Bonus content!

Location of all the ZIP codes of supposed filers.