#4: What Twitter Can Learn from Overwatch

Twitter is a social network that people love to fight on. Overwatch is a fighting game that people like to be social on. Both products have historically struggled with abuse, but earlier this year Overwatch launched a suite of anti-harassment features that I think Twitter could learn a lot from. I used to work at Twitter, where I helped design some of their anti-harassment features and processes, so I tried out the new Overwatch features not just as a fan of the game, but as a student of these kinds of anti-harassment patterns. The Overwatch team did a really great job.

Twitter’s Core Design Pattern Is Why It Struggles With Abuse

To understand why Twitter struggles with abuse, we need to explain some of the decisions that make up the core of the Twitter experience. First, Twitter is public by default, which means anyone can say anything to anyone. Want to open a new account and send a death threat to someone? Knock yourself out, nothing’s stopping you! If your tweet is against policy (which death threats are), and someone reports it, the tweet will be taken down and your account might be suspended. It might take a while, though. Reporting is slow.

But if it’s not reported, or it’s a policy grey area, the tweet stays up. And even if an account is blocked or suspended, there’s nothing stopping people from setting up alternate accounts. And so they do. A lot. @obamaisalizardperson might get blocked, but then @obamaisalizardperson1 appears. Then @obamaisalizardperson2. It never ends. The end result is that Twitter has an extremely leaky system and no way to decisively stop someone from being a pest on the platform. Anyone can get on, say anything, cause a lot of damage before the reporting system catches them (if it does at all), then return and do it again. Now imagine what happens when they pull together 1,000 of their buddies to attack someone at once, a tactic known as “dogpiling.”

It gets pretty dark in a hurry. And that’s before you factor in grudges and enemies. It’s one thing to be harassed by a random weirdo who immediately gets banned from the system. It’s an entirely different issue when enormous armies mobilise against each other and fight all day, every day, each side believing they’re making the world better with every attack on a person from the other side. Twitter has a well-deserved reputation for toxic debate.

The Core Design Pattern in Online Gaming

Online gaming also has a toxic reputation, but it does benefit from being more impromptu. Let’s say you’re playing a military-style game with five other people. You’re the medic, and you don’t heal your teammates quickly enough, so one of them starts yelling at you through their microphone. It’s not fun to be yelled at, but when the game is over, you never need to play with them again. And even if you block them, they’re not going to set up an entirely new PlayStation or Xbox account just to get around the block and keep harassing you.

And even if they’re really mad at you, and decide to rally 1,000 of their friends against you, there’s no easy way for all 1,000 attackers to disrupt your gameplay the way they can on Twitter. Twitter is an open mailbox that others can choose to stuff with angry mail. Online gaming uses a matchmaking pattern, where random people are collected together to play a game, so “stuffing the mailbox” à la Twitter isn’t possible in the same way.

Some Common Problems in Online Gaming

I’m not claiming gaming culture is squeaky clean, of course. People screaming into their microphone at each other, lodging fake reports, sending mean, disrespectful, racist, sexist, edgelord texts, and purposely losing games to “trigger” others are all very common. But while Twitter and Overwatch both have harassment, online gaming has more of an ability to sort the good from the bad and assign accountability. Which is what makes Overwatch’s features exciting. To understand why, you need to understand how Overwatch is structured.

Understanding Overwatch Abuse

Overwatch is all about teamwork. Imagine six superheroes flying around, each with their own superpowers and strengths. Many of the strengths work with each other. For example, one hero can pull enemies towards each other like magnets. Another can cause a giant explosion. Put them together and you can pull enemies into a cluster and deal a lot of damage at once.

But this requires working together as a team, which is really hard. Imagine you have six people using microphones all yelling at each other. “Need help over here!” “Dude, who is our healer? We’re getting destroyed over here!” “Hey everyone, we need to drop back.” “I’m diving in, who’s with me!” “I died! Damn it, where is our healer?!” “Guys I can heal you if you wait for me to reach you without jumping in.” It can be chaotic.

And that’s before people start really getting mad at each other, purposely losing the game to get back at a teammate they’re mad at, and so forth. The other day I handed my controller to my son for a minute so I could do a quick chore in another room. When I came back, he was in tears because someone had cruelly yelled at him for messing something up. When I got back on the microphone, I yelled at the guy, he yelled back at me, and it was toxic for everyone. I reported him for verbal harassment. He probably reported me as well, because tit-for-tat reporting happens all the time.

Not that fights are super common. In my experience it’s maybe 5–10% of games. About 80% of the time, people aren’t saying much: no fights, but also no great teamwork. But sometimes you get into a really fantastic game. You’ve got a leader on the team who’s really good at herding cats and organising the group into a single plan (“Ok everyone, we’re going to attack on the right side. Can we make sure to have a healer? Perfect, thanks. And is anyone good with Orisa or Rein? We’ll need a shield.”), and everyone else is good at playing their roles. Those games are wonderful. So the Overwatch team decided to launch something called “endorsements,” a way to reward good behaviour.

Overwatch Endorsements

Ever since this feature launched, you’ve had the ability to nominate other players for one of three endorsements, and they can nominate you back. Here are the endorsements:

* Shot Caller (Leadership)
* Good Teammate 
* Sportsmanship

The more endorsements you get, the higher your endorsement level. And people can see a little colour-coded infographic that shows what kind of player you are. For example, “Shot Caller” is hard to get, since most people aren’t great at executing a strategy across an entire team over a microphone. So when you see someone with a lot of orange, you know they’re going to be a natural leader.

This is the kind of design that’s obvious in hindsight. Every game should have something like this! The results are predictable and wonderful: this system matches like-minded people together. People that value sportsmanship are paired with others who do as well. Jerks and trolls get grouped together because no one wants to play with them. Brilliant. When the feature went live, I immediately saw an improvement in the quality of the interactions I was having. It was magical. It’s an example of how great design can help fight abuse and make experiences better for everyone.
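Blizzard hasn’t published how its matchmaker actually weighs endorsement levels, so take this as a picture rather than the real algorithm. Here’s a minimal sketch in Python, with hypothetical players and a simple sort-and-bucket rule standing in for whatever Blizzard actually does:

```python
from itertools import groupby

# Hypothetical players and endorsement levels; Blizzard's real matchmaker
# is unpublished, so this only illustrates the grouping effect described above.
players = [
    ("ana_main", 4), ("rage_quitter", 1), ("shot_caller", 5),
    ("new_player", 2), ("team_mom", 4), ("edgelord99", 1),
]

# Sort by endorsement level, then bucket similar levels together so that
# sportsmanlike players queue with each other, and trolls queue with trolls.
players.sort(key=lambda p: p[1])
for level, group in groupby(players, key=lambda p: p[1]):
    print(f"level {level} lobby: {[name for name, _ in group]}")
```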

It got me thinking: what if Twitter had something like this? Are the design patterns similar enough to make something like this work, or is it an apples to oranges comparison that makes no sense? Let’s dive in and find out.

If Twitter Had Endorsements

The first mismatch is that Overwatch is oriented around single games, whereas Twitter is a 24/7 chatfest. So what would the bounding box be for endorsements? You can’t base it around a game, or a fight, or a favourite person on Twitter; otherwise you’d just keep endorsing your own friends, which adds no value to the system, or to others, or to you. And you can’t make it open-ended. If you could endorse every single tweet you see, the economics would be out of whack. You need scarcity to make it matter.

So what if you could endorse a single tweet a day? Interesting thought. Let’s think through the implications of a move like that and see where we end up.
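As a minimal sketch of that scarcity rule (the store and its API are hypothetical, just enough to show the one-a-day budget):

```python
from datetime import date

# Hypothetical in-memory store: account id -> date of its last endorsement.
last_endorsed: dict[str, date] = {}

def try_endorse(account_id: str, today: date) -> bool:
    """Allow at most one endorsement per account per day."""
    if last_endorsed.get(account_id) == today:
        return False  # daily budget already spent; scarcity makes it matter
    last_endorsed[account_id] = today
    return True

assert try_endorse("amy", date(2018, 7, 1)) is True
assert try_endorse("amy", date(2018, 7, 1)) is False  # second endorsement refused
assert try_endorse("amy", date(2018, 7, 2)) is True   # new day, new budget
```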

Protecting Against Sockpuppets

One of the joys, and fatal flaws, of Twitter is that you can make as many accounts as you want. So in a system where each account gets a single endorsement, and the endorsement is considered valuable, it’s not hard to see what will happen: people will make new accounts to get more endorsements. Fake accounts like these are common, and often called “sockpuppets.” How might we protect against that? Simple: you have to distinguish between new accounts and old accounts, and treat them differently. “One size fits all” is one of the biggest blunders in software design. Context matters, and it matters twice as much with issues like trust and harassment.

The easiest thing to do is make the feature available only to accounts that have been around long enough. For example, you could award endorsements to any account older than a week, or a month, or a year. But I’d have two concerns with that. First, it’s easy to game. People would simply build the sockpuppet version of sleeper cells: register a bunch of accounts, let them sit dormant, then unleash them all once they hit the time threshold. You’d slow the sockpuppet armies down, but they’d easily find ways around it.

But the second concern is the real issue: an account’s age is not a good indicator of quality. Quality should lean on a lot of factors, and the more the better: how often you post in an average day, how often you log on, how often you successfully report people versus how often people report you (and whether those reports are upheld or merely retaliatory), whether you’re using the same devices over time, whether you’ve verified your email and/or phone number, how often you tweet at others, and when you do, how often that behaviour is borderline or against policy. The list of factors can and should be extremely deep, because the more factors the model considers, the more nuance and context it can have.

So let’s say you put together a model based on both time and quality. You say anyone who’s been on the site for over a year and is “in good standing” gets to endorse others. That’s a high enough bar to make endorsements actually mean something, protect against fake accounts, and drive good behaviour, all at the same time.
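Here’s a back-of-the-envelope sketch of how those two gates might compose. Every factor, weight, and threshold is invented for illustration; a real model would be far deeper:

```python
from dataclasses import dataclass

@dataclass
class Account:
    age_days: int
    posts_per_day: float
    upheld_reports_against: int    # reports of this account that were upheld
    retaliatory_reports_filed: int
    verified_contact: bool         # email and/or phone confirmed

def standing_score(a: Account) -> float:
    """Toy quality score; a real model would weigh far more factors."""
    score = 1.0
    score -= 0.2 * a.upheld_reports_against
    score -= 0.1 * a.retaliatory_reports_filed
    score += 0.1 if a.verified_contact else 0.0
    score += 0.05 if 1 <= a.posts_per_day <= 50 else -0.1  # bots post oddly
    return score

def can_endorse(a: Account) -> bool:
    # Both gates must pass: old enough AND in good standing.
    return a.age_days >= 365 and standing_score(a) >= 0.8

print(can_endorse(Account(400, 5.0, 0, 0, True)))   # True
print(can_endorse(Account(400, 5.0, 3, 2, False)))  # False: poor standing
print(can_endorse(Account(30, 5.0, 0, 0, True)))    # False: sleeper-cell age
```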

The Naive Wing of the Techno-Libertarian Bullshit Party

Talk like this concerns people. Technologists often default to the moral equivalent of an MVP: “let’s just put it out there and hope for the best.” Talking about driving good behaviour gives people the heebie-jeebies because of unintended consequences. I think those concerns are valid, but that doesn’t mean Twitter’s opposite tack was good for the world.

Early Twitter’s guidance was pretty loose. It infamously described itself as “the free speech wing of the free speech party,” and believed that the good stuff would always rise to the top. History has disproven this approach many times, but we seem to keep re-learning it in technology products.

When you have no laws, you have warlords. When you ask everyone to work it out amongst themselves, you’re essentially saying “I’m sure the people with the guns will do a good job looking out for everyone.” It’s a disaster in lawless territories, and it was a disaster on Twitter.

Sorting by faves and trending tweets doesn’t show the “best,” it just shows the “popular,” which is an easy metric to manipulate. But if Twitter were to move away from “popular” and towards “healthy” or “high quality,” it would open up an entirely new way of ranking and rewarding content on the platform. (I should note that Twitter has, in fact, gone down the route of “health” in the past few years, a move I think is overdue and very smart.)

Educating Users About Tweet Health

But what does “healthy” even mean? It’s a giant topic, and there are a lot of ways to mess it up. But I’m a believer in reasonable defaults giving subtle cues to everyone in the system. When you report a tweet, Twitter asks how the tweet broke policy and provides a multiple-choice selection. This form doesn’t just act as a sorting mechanism, it also acts as an educational tool. It says “these are the things we don’t want on our system. Help us find and remove them.”

I could see the same thing on the endorsements side. It’d be a positive spin on the same idea. Let’s say I go to endorse a tweet and it asks me why the tweet deserves an endorsement, with these options:

* Entertaining
* Thoughtful
* Helpful
* Informative
* Good sportsmanship (disagreeing without being disagreeable)

I’m not claiming these back-of-the-envelope ideas are the best possible options, but they demonstrate how default options can impact the whole system for the better. I’d see those options and gain a little bit of motivation to be a better sport the next time I’m arguing with someone. Or to try to be more helpful. Today, the only motivation I have is to speak as loudly as possible to my own echo chamber so I can get likes, retweets, and followers. Broadening the incentives could help people communicate in broader and more beneficial ways.

A Richer Profile Page

Most people will never endorse anyone, and most people won’t get many endorsements. But, again, that scarcity can be powerful. If only some people are using endorsements, the scarcity makes them stand out. And everyone likes things that help them stand out on Twitter.

Today, you check someone’s profile and you see how many people follow them. Or you can see what they’ve favourited. It’s not particularly interesting. I don’t learn a lot. The profile page feels pretty flat. It could show a lot more.

What if someone’s profile showed their tweets that have garnered the most favourites? Ok, now we’re getting somewhere. What if it also showed tweets that others have endorsed? Same idea, but even more powerful. What if each endorsement was categorised? Person X might be 90% “Entertaining” and 10% “Informative.” That’s a very different profile to person Y at 50% “Thoughtful” and 50% “Good sportsmanship.” Person X is more of an entertainer. Person Y is more of a keyboard warrior. (A polite one, by the looks of it.) These are the kinds of signals that give much more value, not just to people browsing profiles, but to the whole platform.
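Computing that breakdown is simple. A sketch, assuming each endorsement arrives as a category label:

```python
from collections import Counter

# Hypothetical endorsement logs: one category label per endorsement received.
person_x = ["Entertaining"] * 9 + ["Informative"]
person_y = ["Thoughtful"] * 5 + ["Good sportsmanship"] * 5

def breakdown(endorsements: list[str]) -> dict[str, str]:
    """Summarise a profile's endorsements as category percentages."""
    counts = Counter(endorsements)
    total = sum(counts.values())
    return {cat: f"{100 * n // total}%" for cat, n in counts.most_common()}

print(breakdown(person_x))  # {'Entertaining': '90%', 'Informative': '10%'}
print(breakdown(person_y))  # {'Thoughtful': '50%', 'Good sportsmanship': '50%'}
```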

What if the profile combined those ideas with a simple pie chart showing how often someone tweets versus retweets, how often they link to other sources, and what their favourite sources are? It would give people a sense of the “ingredients list” for each account they’re choosing to absorb into their timeline, just like with food. Some accounts are healthier than others, so Twitter could make an effort to surface the healthier options.

The profile could show heaps of other things: how often you post historically, over the last 30 days, and over the last week; the number of times you’ve been successfully reported, and what action was taken; the most common words, hashtags, and media sources you link to; the people you frequently @mention. (And all of this would be optional; see the section on private profiles below.)
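Here’s a sketch of that “ingredients list,” assuming each timeline entry records whether it was an original tweet or a retweet, and which domain it linked to (all data hypothetical):

```python
from collections import Counter

# Hypothetical timeline entries: (kind, linked domain or None).
timeline = [
    ("tweet", None), ("retweet", "news.example"), ("tweet", "nytimes.com"),
    ("retweet", None), ("tweet", "nytimes.com"), ("retweet", "nytimes.com"),
]

kinds = Counter(kind for kind, _ in timeline)
sources = Counter(domain for _, domain in timeline if domain)
total = len(timeline)

print(f"original tweets: {kinds['tweet'] / total:.0%}")    # 50%
print(f"retweets:        {kinds['retweet'] / total:.0%}")  # 50%
print("favourite sources:", sources.most_common(2))
```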

Look For the Helpers; Start By Defining the Role

You could also tie profiles to the incentives Twitter wants to see more of, with visual representations. For example, if a person uses Twitter every day, their profile could show a little badge that proudly advertises that fact, much like the endorsement badges. Once you have a system like this, you can add all kinds of things to incentivise all sorts of behaviour, like a badge for people who have verified their email or phone number. You could even have a badge for people who take the time to be strong community-watch figures. Make it prestigious to help Twitter with its health problem and people will sign up for it. And as more people sign up to help, the community gets stronger.

Using Private Profiles As Another Helpful Signal

But what about the privacy implications? What about people who don’t want to share these details? They should have the right to privacy, of course. By default, profiles could stay pretty similar to the way they are today, but you could take action to share more with people. Overwatch did this too, in fact.

In the past, while you were waiting for an Overwatch game to start, everyone could see which heroes you played most frequently, your win/loss rate, and a variety of other data points. Blizzard later made this data private, which opened up a new angle to the game: what do you show, and to whom? Someone who shares their profile comes across as more trustworthy. Twitter could do the same, perhaps with an algorithm tweak that rewards people who are more of an open book, under the assumption that sockpuppets aren’t as motivated to share those kinds of details.

Per Capita Scoring

Raw numbers only say so much. If someone with 25 million followers gets a million likes, that’s one like per 25 followers. That’s a lot less impressive than someone with a hundred followers scoring ten thousand likes: 100 likes per follower. The same maths would apply to endorsements: celebrities are going to pull down a lot of endorsements no matter what. But if there’s a score that shows the quality of a profile, a boring celebrity should score lower than a really entertaining non-celebrity. Per capita scoring would allow that, and reward more valuable content than Twitter does today.
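The maths is just engagement divided by audience size. A quick sketch using the numbers above:

```python
def per_capita(engagements: int, followers: int) -> float:
    """Engagement per follower: rewards resonance over raw audience size."""
    return engagements / followers

celebrity = per_capita(1_000_000, 25_000_000)  # 0.04 likes per follower
newcomer = per_capita(10_000, 100)             # 100 likes per follower

print(f"celebrity: {celebrity}, newcomer: {newcomer:.0f}")
# The newcomer resonates 2,500x harder despite the tiny audience.
```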

Improved Search

Wouldn’t it be interesting if you could search Twitter for new accounts to follow without the results being so directly influenced by follower counts? For example, Twitter knows what kinds of articles I tend to click. It also knows which accounts tend to link to that sort of content. It’d be great to be recommended accounts that match the sorts of things I’ve engaged with in the past, even if they only have 14 followers! With endorsements as a signal, and a move past popularity as the only metric, search could get much better at this.
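As a sketch of that kind of affinity matching, here’s a simple set-overlap score between the domains I click and the domains an account links to; the accounts, domains, and scoring rule are all hypothetical:

```python
# Recommend accounts by content affinity, ignoring follower counts entirely.
my_clicked_domains = {"arstechnica.com", "aeon.co", "nature.com"}

# Candidate accounts: domains they link to, plus follower count (unused for ranking).
candidates = {
    "tiny_science_blog": ({"nature.com", "aeon.co"}, 14),
    "celebrity_account": ({"gossip.example"}, 25_000_000),
}

def affinity(linked: set[str]) -> float:
    """Jaccard overlap between my clicked domains and an account's links."""
    return len(my_clicked_domains & linked) / len(my_clicked_domains | linked)

ranked = sorted(candidates.items(), key=lambda kv: affinity(kv[1][0]), reverse=True)
for name, (links, followers) in ranked:
    print(f"{name} ({followers} followers): affinity {affinity(links):.2f}")
```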

Hoaxes and Disinformation

By adding new signals, it’d be easier to discover and curtail disinformation. What if we went really aggressive with it? What if a retweet never reached you unless both the originating and amplifying accounts had public profiles? Let me say that again in simpler terms: what if retweeting was a privilege, not a right? What if your tweets travelled further based on how healthy your account is?

Reasonable people could disagree about which features should be available in which scenarios, but the overall idea that you’d need to earn the right to harness powerful features is an interesting one. We’re used to making software that gives every user the same power all at once. What if we acknowledged that tweeting is powerful, and potentially disastrous in the wrong hands? What if Twitter’s designs put that fact front and centre?

A change like this would mean that long term users of the site, with a balanced blend of content, who are highly engaged with the service, get rewarded because their retweets can travel further and faster. On the other end of the spectrum, imagine a fake account posting fake information. Say 100 people retweet it (whether as part of a coordinated disinformation campaign or not). And let’s say those 100 people have a potential reach of 50,000 people.

With this new system, let’s say the fake tweeter isn’t marked as “trusted.” That would mean none of the 100 accounts’ retweets travels very far. The amplifying accounts would still be able to retweet, but anyone following them would have to actively turn on “untrusted retweets” to see the results, and that setting would be off by default. The information would still travel … but not with anywhere near the spread it has today.
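A minimal sketch of that delivery gate, with a boolean “trusted” flag standing in for whatever the health model actually decides:

```python
from dataclasses import dataclass

@dataclass
class Account:
    name: str
    trusted: bool                          # earned via the health model above
    show_untrusted_retweets: bool = False  # off by default

def deliver_retweet(original_author: Account, follower: Account) -> bool:
    """A retweet reaches a follower only if the source is trusted,
    or the follower has actively opted in to untrusted retweets."""
    return original_author.trusted or follower.show_untrusted_retweets

fake_news_bot = Account("fake_news_bot", trusted=False)
typical_user = Account("typical_user", trusted=True)
opted_in = Account("curious", trusted=True, show_untrusted_retweets=True)

print(deliver_retweet(fake_news_bot, typical_user))  # False: reach throttled
print(deliver_retweet(fake_news_bot, opted_in))      # True: they asked for it
```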

Not All Content Is Equally Important

There’s another assumption baked into the Twitter system: that all content is equally important, all retweets are equally important, and if you follow someone, you always want to see everything they’ve retweeted. You don’t. There’s also an assumption that retweets are universally awesome. They’re not. The design community embraced responsive design on the web, adapting what’s shown to the context it’s shown in; it’d be good for social companies to factor in context the same way when showing content.

Right now Twitter doesn’t let you turn off all retweets. That’s a shame, and it would be nice to add that feature, but I wouldn’t want to stop there. Maybe there could be a way to say what sorts of retweets I want to see. Maybe there’s a quality slider that only shows me retweeted content that’s seen as high quality. Maybe retweets shouldn’t drop straight into my timeline, but instead make up a separate timeline, one that’s closer to a news magazine. Maybe retweets could power my News/Moments experience.
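Here’s a sketch of that routing, assuming each post carries a quality score from the health model described earlier and that the slider is a simple threshold:

```python
# Hypothetical posts: (author, is_retweet, quality score from the health model).
posts = [
    ("alice", False, 0.9), ("bob", True, 0.2),
    ("carol", True, 0.8), ("dave", False, 0.4),
]

QUALITY_SLIDER = 0.6  # user-adjustable threshold for retweeted content

# Originals stay in the main timeline; retweets route to a curated digest.
main_timeline = [author for author, is_rt, _ in posts if not is_rt]
retweet_digest = [author for author, is_rt, q in posts if is_rt and q >= QUALITY_SLIDER]

print("timeline:", main_timeline)   # ['alice', 'dave']
print("digest:  ", retweet_digest)  # ['carol']; bob's low-quality RT dropped
```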

So where would these changes leave us? The ability to let people earn their audience. High quality engagement results in higher reach; low quality engagement means less reach. Increased transparency on both sides allows for even more signal around what makes an account good. Lots more visibility into how the algorithm works. A much harder time spamming people with low quality content under the old, stale assumption that “lots of activity must mean it’s good.” A more solid system, one that’s harder to game. A system that incentivises the things people in society want more of: truth, facts, fairness, context.

The Fear of Becoming “Arbiters of Truth”

A Silicon Valley mantra is that tech companies don’t want to become “arbiters of truth.” It’s said so frequently, and with such conviction, that it’s treated as settled fact, like saying the sky is blue. And I’m not coming at this issue naively: I worked on teams where this was a big part of our mission, both on the News team at Twitter and on the team fighting harassment and abuse. So this isn’t a throwaway shower thought. It’s something I spent years thinking about. I think companies can help with truth; they just have to think about it a little differently.

Let’s start by talking about nuance. Some news topics have more of it than others. Those people protesting outside the presidential palace: are they freedom fighters or terrorists? Donald Trump is president: is that good or bad? Even something like evolution or man-made climate change is publicly contested, despite overwhelming scientific evidence. So I can understand someone saying they don’t want a robot algorithmically deciding whether complex and nuanced topics are “true” or “false.” I totally get that. We can set that idea aside. That’s what tech means when it resists being an “arbiter of truth.”

But there’s an opportunity to be a facilitator of truth. It would help, people want it, and it could be done without putting a finger on the scale.

Let’s get back to the core human need behind fact checking: people want context. They want to understand issues. They want to know when something is plainly and provably true. If I stand up and say “America is celebrating its 500th anniversary,” that’s provably untrue. And there are a bunch of people who would appreciate knowing that context in real time instead of having to guess, do their own research, or just blindly accept a lie. So how might Twitter serve those people in those scenarios?

First, we have to understand that the sheer number of tweets means that it’s not possible to have a human review every single one. Second, we have to understand that people typically want things quickly. A Twitter that’s 48 hours behind because human editors are asked to fact check everything is not what most people signed up for, at least not as Twitter currently explains itself. So Twitter doesn’t have the manpower, resources, or speed to do fact checking in real time.

And even if Twitter could miraculously do it, no one is asking them to. And even if people did ask, how could they confirm that they trust the source? No one is going to trust a shadowy black box in the background that’s possibly censoring great content. So how might we handle this? Here’s an idea: let people opt in to fact-checking partners. Allow users to pair tweets with fact-checking services. That’s not deciding what truth is; that’s giving people tools when they request them.

Let’s say you trust Politifact.org. Politifact already investigates the claims of powerful politicians and media figures in near real time, and it already has a rating system running from “fact” to “lie,” with shades of grey in between. Wouldn’t it be interesting if scrolling through your timeline wasn’t just a firehose of content, but a firehose of content with context provided by Politifact? What if, when you saw a tweet that’s an obvious lie, Politifact was right there to provide context? That could be interesting.
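A sketch of what that opt-in layer might look like. The genuinely hard part, matching a live tweet to a claim a checker has already rated, is waved away with a dictionary lookup here; the ratings themselves are from PolitiFact’s published scale:

```python
# Hypothetical opt-in fact-checking layer. The claim ids and the lookup are
# invented; only the rating labels come from PolitiFact's real scale.
ratings = {
    "claim:500th-anniversary": "Pants on Fire",
    "claim:climate-consensus": "True",
}

def annotate(tweet_text: str, claim_id: str | None = None) -> str:
    """Append the checker's verdict when the tweet matches a known claim."""
    if claim_id in ratings:
        return f"{tweet_text}\n  [PolitiFact: {ratings[claim_id]}]"
    return tweet_text

print(annotate("America is celebrating its 500th anniversary!",
               "claim:500th-anniversary"))
```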

Purposeful Time Delays

But how would Twitter pull off this magic trick? If Trump posts something untrue at 4:01pm, and you check Twitter at 4:02pm, there are a few approaches to consider:

1. Show the untruth at 4:02pm (today’s approach)
2. Hope that Politifact can provide context within 1 minute (impossible)
3. Put the untruth into a holding pattern until it can be confirmed (sketched below)
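Here’s a minimal sketch of that third approach, assuming a maximum hold of three hours before an unchecked tweet is released anyway:

```python
from datetime import datetime, timedelta

# Hypothetical holding pattern: a flagged tweet is withheld until context
# arrives from a fact checker, or until a maximum delay expires.
MAX_HOLD = timedelta(hours=3)

def visible(posted_at: datetime, now: datetime, context_ready: bool) -> bool:
    """Release the tweet once context arrives, or once the hold expires."""
    return context_ready or (now - posted_at) >= MAX_HOLD

posted = datetime(2018, 7, 1, 16, 1)
print(visible(posted, datetime(2018, 7, 1, 16, 2), context_ready=False))   # False: held
print(visible(posted, datetime(2018, 7, 1, 17, 0), context_ready=True))    # True: context arrived
print(visible(posted, datetime(2018, 7, 1, 19, 30), context_ready=False))  # True: hold expired
```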

But let’s challenge our assumptions here. Twitter is a real-time service as of today. But consider: would it really be so bad to have some tweets on a time delay? Would you die if you had to wait 3 hours to see a tweet with context, when the alternative is absorbing lies in real time?

What if Twitter could prove that the delay gives you much better information? What if there was an easy way to see the raw data, so you know you’re not missing anything, while knowing the context will be high quality? That could be interesting. That could be valuable. And it would short-circuit the disinformation warriors, because it would be much harder to get disinformation in front of people without it being challenged.

Who Decides the Truth?

Imagine you didn’t just follow Politifact. What if you could also opt into other fact-checking sources? What if anyone could be a fact checker? What if there was an entirely different role that Twitter customers could fulfil? What if you were allowed to tag tweets with metadata, words like “opinion” and “inaccurate” and “analysis,” which in aggregate would provide context? But what about troll armies? Wouldn’t they invade and try to shift public opinion? They’d try. But what if tagging metadata, like other advanced features, wasn’t awarded right away, but instead took training and a high health rating?

What if tags were only seen by trusted accounts? If all of my trusted friends tag something as inaccurate, I’ll see that. Another group of people might tag the same thing as accurate, and they’ll see that. It could be helpful to know that 96% of my trusted accounts call something accurate. That could help Twitter decide what to present to me, and it would provide more data points overall.
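A sketch of that per-viewer aggregation, with hypothetical taggers and a hypothetical trusted set:

```python
from collections import Counter

# Hypothetical tags on one tweet: (tagger, tag). I only see tags from
# accounts in my trusted set; someone else's trusted set may differ.
tags = [
    ("ana", "inaccurate"), ("ben", "inaccurate"), ("cal", "opinion"),
    ("troll1", "accurate"), ("troll2", "accurate"),
]
my_trusted = {"ana", "ben", "cal"}

visible_tags = Counter(tag for who, tag in tags if who in my_trusted)
total = sum(visible_tags.values())
for tag, n in visible_tags.most_common():
    print(f"{tag}: {n}/{total} of my trusted accounts ({100 * n // total}%)")
```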

And what if there was an audit process for anyone tagging data? What if openly lying was an offence that cost you your tagging rights? For example:

Tweet: Climate change is real, warn scientists
Metadata: inaccurate
Reported for: inaccurate tagging

Wouldn’t this result in silos and information bubbles? Sort of, but it would be better than what we have today. Today, I just follow the people who say the things I want to hear more of. But that’s not just because people hate seeing other viewpoints. It’s also about trust. When everything shot at you by a high-speed cannon is untrustworthy, you find ways to sort through the noise. Twitter could help with that sorting. It just has to build the features that enable the behaviours it wants to see.

Summary

People play to the rules of the game as you define them. When they’re motivated to lie as frequently and as entertainingly as possible, that’s what they’ll do, and that’s where Twitter is today. When they’re motivated to be great team players, as on Overwatch, that’s what they’ll do. Twitter has a great opportunity to add new incentives that lead to new behaviours. We’ll never fix abuse completely, of course. But the key is to keep trying, without resting on old patterns and old assumptions. Here’s hoping Twitter gets inspired by some of the great progress the Overwatch team has made.