The problem with Verified on Twitter

Last night, I got verified on Twitter. I have to give them props, because I didn’t expect them to give verified status to an activist that’s known for being very critical of their service. A Twitter rep has, in the past, told one of my friends that activists typically don’t get verified, so I submitted my request for verification just to see what would happen. I was also very curious about the tools available to verified users. I’ve talked to other verified users previously, but all the information I had came from other people and not my own experiences. Now, less than 18 hours later, I’ve got some thoughts to share.

Verified is in a weird space. It’s not just celebrities, but instead operates under some strange idea of who Twitter considers noteworthy. When I applied, I included a lot of URLs from the news, one from a tabloid that has caused massive amounts of abuse towards many people, as well as my Wikipedia page & a semi-viral Medium post where I roasted the bejeezus out of Twitter.

A lot of people that might be more well known than I am have been turned down after applying for verification. People that haven’t been in the media seem to be shit out of luck.

People have been talking about how verified status is a way to avoid abuse. This is unequivocally not true. While I think it’s an incredibly bad idea to ask users operating under a pseudonym to assume their real identity to avoid abuse, here I’m going to focus on tools versus a philosophical argument about anonymity.

While Twitter does add two new options for verified users, it removes one of the best tools for filtering abuse: the ability to filter your notifications by only showing people that you follow.

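This: [screenshot: notification filter options on a standard account]

Becomes this: [screenshot: notification filter options on a verified account]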

While I do think the verified interface is a bit cleaner, the removal of that option is very unfortunate. Instead, I’m given the ability to filter my notifications and view only other verified users. This is not incredibly useful for me, as I do a lot of outreach and tend to interact with non-verified users more.

Next, we’ve got Quality Filter. Where to even start with this.

To the best of my understanding, Quality Filter is for a very specific kind of abuse. It will not catch everything, and many people won’t see any improvement at all. To explain why this is the case, we need to talk about what abuse actually looks like. Again.

Not all abuse looks the same. Even mob harassment has distinctions. This becomes clear when you start examining metadata to create rulesets to filter abuse. (To be clear, this is not a thing that Twitter offers. This is a custom piece of code that I wrote that I run on my own network that cannot be publicly released due to Twitter’s Developer Rules.)

Abuse filters are a lot like anti-spam. They look for patterns in data. When I’m creating rules for filtering abuse in my own software, I look at a combination of things like account creation date, whether the profile pic is still the default, who the person interacts with, whether that person interacts with people I’ve got blocked, who they follow, how many tweets they’ve sent, how many of their tweets are retweets versus original content, etc. It’s a huge list, and it creates a risk score. Any one or two or three of these things isn’t enough to get you caught by my anti-abuse filters, but a combination of many means I won’t have to see your tweets.
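To make the idea of a risk score concrete, here is a minimal sketch in Python. It is not the code described above, which isn't public. The `Account` fields, the weights, and `HIDE_THRESHOLD` are all made-up assumptions, chosen only so that no three signals alone are enough to hide an account.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Account:
    # Hypothetical metadata fields; a real client would fill these from API data.
    created_at: datetime
    default_profile_image: bool
    tweet_count: int
    retweet_count: int
    interacts_with_blocked: bool


# Illustrative threshold: the weights below sum to 9, and the three
# heaviest signals only reach 7, so at least four signals must fire.
HIDE_THRESHOLD = 8


def risk_score(account: Account, now: datetime) -> int:
    """Add up weak signals. Every weight here is an assumption."""
    score = 0
    if (now - account.created_at).days < 30:  # very new account
        score += 2
    if account.default_profile_image:  # still an egg
        score += 2
    if account.tweet_count < 50:  # barely used
        score += 1
    if account.tweet_count and account.retweet_count / account.tweet_count > 0.8:
        score += 1  # almost nothing but retweets
    if account.interacts_with_blocked:  # talks to people I have blocked
        score += 3
    return score


def should_hide(account: Account, now: datetime) -> bool:
    """Hide only when several signals stack up past the threshold."""
    return risk_score(account, now) >= HIDE_THRESHOLD
```

The design point is that each signal is weak on its own. Plenty of legitimate people have a default profile pic or a new account, so the filter only acts when the signals pile up.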

As I was building out this system, many things became clear.

While some mob harassment shares very distinct characteristics, this is generally limited to abuse that exists within communities on Twitter.

For example, GamerGate is one example of a community that had several influential members who would pick targets, at which point the rest of GamerGate would pile on. This is a simple pattern that was very easy to distinguish, and creating rules to filter it was simple. While ggautoblocker was very effective at filtering this type of mob harassment, if we examine this from a “creating rules for an anti-abuse filter” sort of perspective, we can determine the network to filter like this:

  • Examine incoming notifications over the past N hours
  • If notification is a reply to a tweet that is not my own, take note of the username that started that thread.
  • If many people are responding to that thread and I have not responded, this thread has a high probability of being part of mob harassment.
  • If any notifications are not replies to an existing tweet but those accounts are following or interacting with similar accounts — or, even more notable, the accounts mentioned in the previous rule — this person has a high probability of being part of mob harassment.
  • Look for commonalities in account metadata, such as account creation date. Were many of these accounts created at the same time? Do they have profile pics? Apply standard scoring mechanisms with a modifier for the above rules.
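The first three rules above can be sketched roughly as follows. This is an illustration under assumptions, not the actual filter: the `Notification` shape, the `min_senders` cutoff, and the function name are hypothetical, and the caller is assumed to have already limited the notifications to the past N hours.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class Notification:
    # Hypothetical shape; thread fields are None when the tweet is not a reply.
    sender: str
    thread_id: Optional[str]
    thread_starter: Optional[str]


def flag_pile_on_threads(
    notifications: Iterable[Notification],
    me: str,
    threads_i_replied_to: Set[str],
    min_senders: int = 5,  # assumed cutoff for "many people"
) -> Dict[str, str]:
    """Return {thread_id: thread_starter} for threads that look like pile-ons."""
    senders_by_thread: Dict[str, Set[str]] = {}
    starters: Dict[str, str] = {}
    for n in notifications:
        # Only replies to threads started by someone else are of interest.
        if n.thread_id is None or n.thread_starter is None or n.thread_starter == me:
            continue
        senders_by_thread.setdefault(n.thread_id, set()).add(n.sender)
        starters[n.thread_id] = n.thread_starter
    # Many distinct senders plus no reply from me suggests a pile-on.
    return {
        tid: starters[tid]
        for tid, senders in senders_by_thread.items()
        if len(senders) >= min_senders and tid not in threads_i_replied_to
    }
```

The usernames it returns are the thread starters noted in the second rule. A fuller version would feed them into the network and metadata checks from the last two rules as a score modifier.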

That’s all fairly obvious and easy to track, but, as previously stated, it’s a very specific kind of harassment. Most harassment is not that easy to define due to the lack of information made available to third parties.

A major part of harassment originates off-site. For example, there can be threads on reddit where people link a tweet. Many news websites include a link to the Twitter of the journalist. Ask any political journalist how their mentions have looked lately. It’s not so easy to find patterns in data when the source is off-site without access to referrer logs.

There are also people that search for certain words or troll hashtags. These people aren’t necessarily part of a common network. Natural language processing doesn’t work well when you’re doing it 140 characters at a time in a place where terms are appropriated and sarcasm is common, so that’s out.

In short, after 2 years of working on this, I can tell you definitively that it’s really fucking hard to use code to determine if a tweet is harassment or not.

Quality Filter does not — and cannot — address all of these concerns. There is no public data as to how Quality Filter works, and for that matter, there probably shouldn’t be. If you give someone a list of the ways you filter content, they’ll just find ways around it. Obscurity is good in this particular instance.

I can hazard a guess as to how a lot of Quality Filter works, but I need more time with it to have conclusions good enough to write about with conviction. I suspect that it filters more on easy metadata — egg accounts, if a user is blocked by many other users, and basic NLP — perhaps with exceptions if there’s a commonality in networks.

However, I’m wary of using Quality Filter and have not enabled it for one simple reason. There is no way to view filtered tweets.

This may not seem like a big deal, but think about how email works. We’ve got a spam folder. It’s there for a reason: false positives exist and are even common. Anti-spam is an industry that has existed for a very long time, and they still don’t always get it right. When we’re talking about abuse, the ability to review filtered material is even more important, because if you’re under fire, you need to take the time to review those tweets to see if any require action, such as involving the police.
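As a rough sketch of the spam-folder idea, a filter can set suspect items aside instead of dropping them, so they can still be reviewed and false positives can be released. The class and method names here are hypothetical, and nothing in it reflects how Twitter's filters actually work.

```python
from typing import Callable, List


class QuarantineFilter:
    """Route suspect items to a reviewable folder instead of discarding them."""

    def __init__(self, is_suspect: Callable[[str], bool]) -> None:
        self.is_suspect = is_suspect  # any rule or score check supplied by the caller
        self.inbox: List[str] = []
        self.quarantine: List[str] = []

    def receive(self, item: str) -> None:
        # Nothing is deleted; suspect items simply land in a different list.
        if self.is_suspect(item):
            self.quarantine.append(item)
        else:
            self.inbox.append(item)

    def release(self, item: str) -> None:
        # A human reviewed the item and decided it was a false positive.
        self.quarantine.remove(item)
        self.inbox.append(item)
```

Because the quarantine is kept, someone under fire can still go through it and pull out anything that needs action.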

Furthermore, Twitter has an existing filter that is applied to all accounts, and you don’t have the option to disable it. However, it only appears to affect web, as I often see tweets in Twitter’s iPhone client that don’t show up in my notifications in Twitter’s web client. There are many false positives. This is likely why the alt-right keeps blathering on about being shadowbanned.

If Twitter has such a high false positive rate with the existing filters that are automatically applied to all accounts, I’m hesitant to enable Quality Filter without having a way to review tweets. While I do tend to get a substantial amount of mob abuse thrown my way that would likely be caught by Quality Filter, frankly, I trust my code more than I trust Twitter.

For those people that are asking why Quality Filter isn’t available to everyone, I think the reasons above lay it out pretty well. I suspect it wouldn’t catch the majority of abuse that most people receive. Pushing a tool like this to the general public as a solution for abuse and having it not be that effective would be a terrible idea. “Here’s a solution! Sorry it doesn’t work for you,” is just going to turn more people off of using Twitter. It’s bad business.

