On Twitter’s “Health” Metric

And Thoughts on ‘Fixing’ Twitter

Tim Pastoor
30 min read · Apr 16, 2018

Introduction

I’m writing this because Twitter published a Request For Proposal (RFP) in which they ask for proposals for a “health” metric. In my book, admitting you have a hard time solving a really hard problem, opening up to the wider community, and showing you’re listening to your users means taking ownership of your problems, so +1 for that. As far as I can tell, this makes Twitter the first major social media platform to take ownership of this problem. It’s the first step down a long and mostly unpaved road, and a lot of pioneering will have to be done to ever reach the finish, a potential solution, or at least a better model, but I can only encourage it.

In the past, I’ve written a few fairly critical articles on why I believe people shouldn’t verify their Twitter accounts, even urged people to unfollow all the verified Twitter accounts for the sake of awareness, and summarized some of my thinking on how to ‘fix’ dystopian reputation systems, but enough complaining for now.

Now that Twitter recognizes these and other issues, and seems willing to work towards solutions together with the community, I’m happy to provide them (and anyone else interested in the topic) with more constructive feedback.

Based on Twitter’s needs, I will suggest a few ideas that will hopefully help them improve their verification process, make timeline results more relevant to their users, and get rid of spam, trolls, sockpuppets, etc., as far as possible, in the most user-friendly, cost-efficient, and risk-efficient manner for both Twitter and its users.

I will refrain from trying to establish a “health metric” in this article, simply because it lies beyond the scope of a one-man show on a near-zero dollar budget. Nonetheless, what I can do is elaborate on what I believe is the root cause of the aforementioned issues, and point out a potential solution that humans have used since we lived in tribes …

All in all, D-Day was a one-pager. So let’s hash out the giant red line first, before we go into too much detail. I assume Twitter is mainly looking for a solution, and not so much for the cheapest bidder or a hundred-page document with what-could-be-a-solution. That’s why, instead of submitting a proposal through the official channels, I hope this article helps address some of the discussed issues, whether on Twitter or on any other platform that faces similar issues. If you have any questions, comments, and/or suggestions after reading this, feel free to r̵e̵a̵c̵h̵ ̵o̵u̵t̵ ̵t̵o̵ ̵m̵e̵ ̵o̵n̵ ̵T̵w̵i̵t̵t̵e̵r̵ reach out to me on Mastodon.

Exhibit A — General Montgomery’s one-pager for D-Day. My favorite parts: 1) “etc., etc.”, and 2) “SIMPLICITY”.

Disclaimer: For those of you who don’t know me, I’m totally biased on this subject since I have a passion for peer-to-peer identity & reputation systems, and I believe (religion, not science) that they will eventually make centralized reputation systems obsolete. Working on making that happen only gets you so far, within a certain amount of time, so I figured sharing some ideas to help improve a widely implemented system might be (more) helpful, short-term.

The Root Cause

In order to understand where we are, and where we’d like to go, we first need to understand how we’ve arrived here. Allow me to summarize trust, all of human history, and how I believe it applies to Twitter and all other centralized reputation systems … in just a few paragraphs.

When humans lived in tribes, trust was always direct, from one person to the next. Each person within the tribe knew every other person directly, and if the tribe as a whole stopped trusting you, you would be ostracized from the tribe, to go live with the tigers in the jungle.

Within the tribe you would deliver your messages directly, along the shortest possible line between huts, or geodesic (the shortest line on a sphere from point A to B). For example, I would walk up to your hut in the straightest line possible, say “BWAAAAHHHH”, and walk back to my hut, again, in the straightest line possible. Message delivered, job done. And in the most cost- and risk-efficient manner possible.

Then came civilization; life in cities. There is an interesting variety of theories on how civilization came about, but for the sake of argument, let’s assume it was agriculture that allowed us to bunch up in groups considerably larger than tribes; cities.

When we lived in tribes, every message or good was delivered peer-to-peer. This is why trust was peer-to-peer as well, because there were no third parties. When we started living in cities, for the first time we had an incentive to introduce middlemen. It’s simply more cost-efficient to have one person run through town to deliver messages or goods than having to deliver every message or good yourself.

By introducing middlemen we introduced the risk of having to trust someone else to get something done for us. In this day and age that means we’re not only trusting people with our money, or to provide us with food or tools, but also with our personal data. But before we get to that, allow me to explain something by quoting Hettinga first (raise your hand if you saw this coming).

“Okay. Now let’s look at the future, shall we? Oddly enough, the ‘future’ starts with the grant of telephone monopoly to AT&T in the 1920’s in exchange for universal telephone service. When AT&T figured out that a majority of people would have to be telephone operators for that to happen, it started to automate switching, from electricomechanical, to electronic (the transistor was invented at Bell Labs, remember), to, finally, semiconducting microprocessors. Which, Huber noted, brought us Moore’s Law, and, finally, that mother of all geodesic networks, the internet.”

Bob Hettinga, a.k.a., RAH (1998)

AT&T was granted a monopoly on the US telephone network on the condition that they would connect everyone who wished to be connected. Some quick napkin math probably showed that this was infeasible with human operators, which in turn created the incentive for AT&T to invest in Bell Labs and further develop the transistor. For the first time in human history information was switched not only electronically, but automatically, and somewhat like how we did it when we lived in tribes: without necessarily having a middleman in between. Thanks to the transistor we don’t have switchboard operators anymore, let alone a majority of the population behind switchboards. It even created more jobs … but that’s a topic for another article.

Mark Zuckerberg told a US senator the other day that Facebook would be employing 20,000 content moderators at the end of 2018. I think it’s fairly safe to assume each of these people costs at least $50,000 per year, which means they are a ONE-BILLION-DOLLAR-per-year-incentive for Facebook to automate them out of their offices.
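
Just to make that napkin math explicit, here it is as a trivial sketch (the 20,000 headcount comes from the testimony; the $50,000 per moderator is my own assumption):

```python
# Napkin math: yearly cost of human content moderation (assumed numbers).
moderators = 20_000          # headcount Facebook stated for the end of 2018
cost_per_moderator = 50_000  # assumed fully loaded cost per person, per year (USD)

yearly_incentive = moderators * cost_per_moderator
print(f"${yearly_incentive:,} per year")  # -> $1,000,000,000 per year
```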

See where I’m going with the AT&T analogy? They realized they had to automate the switching of information in order to make / keep their business model sustainable long-term. The problem nowadays isn’t switching information, though. That’s long been solved. The issue at hand is switching ‘trust’, and nobody seems to understand what we’re doing wrong, even though we’ve filtered our ‘search results’ through our already trusted sources all throughout human history, and only recently (over the past two decades) have we started trusting random strangers on the Internet. Somehow, many people seem to agree that algorithms and AI are the answer here, but I respectfully disagree. Our human brains still determine more accurately what’s relevant to us at any given time than any computer system, and I am skeptical that this will flip during my lifetime.

Reputation systems calculate trust scores from all sources, including untrusted ones, to try to figure out what should be relevant to us, and what shouldn’t be. From my totally biased point of view, this is the root cause of having trolls, sockpuppets, misinformation, and fake news in your personal networks. If that doesn’t immediately make sense, imagine this …

You’re looking at one million ratings on an online shopping platform, from accounts you don’t know or trust, that all say a certain seller is good and trustworthy. Now we add a single rating from one of your best friends that says this seller is a scammer and shouldn’t be trusted. Are you now more likely to trust the one million ratings from untrusted sources that say it’s a good seller, or the single rating from your best friend who says the seller is a scammer? My guess is you will (subjectively) value that single rating over the one million ratings from people you don’t know. No computer could have known this without the context that’s stored in your brain. Hence …

The subjective value of a trusted source is always greater than the subjective value of an untrusted source.
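
To make that maxim a little more concrete, here’s a minimal sketch of how such subjective weighting could look in code. The weights are completely made up; the point is only that a single trusted rating can outweigh an arbitrarily large anonymous crowd:

```python
# A toy model of "subjective value": ratings from sources you already trust
# are weighted far more heavily than ratings from strangers.
TRUSTED_WEIGHT = 1.0
UNTRUSTED_WEIGHT = 1e-7  # assumption: an anonymous rating barely counts

def subjective_score(ratings):
    """ratings: iterable of (score in [-1, 1], is_trusted) pairs."""
    total, weight = 0.0, 0.0
    for score, is_trusted in ratings:
        w = TRUSTED_WEIGHT if is_trusted else UNTRUSTED_WEIGHT
        total += score * w
        weight += w
    return total / weight if weight else 0.0

# One million positive ratings from strangers vs. one warning from a best friend.
ratings = [(1.0, False)] * 1_000_000 + [(-1.0, True)]
print(subjective_score(ratings))  # ~ -0.82: the single trusted rating tips the scale
```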

This is where it gets interesting. We have routed information through people we already trusted throughout all of human history, until not too long ago. When the Internet first became accessible to the public, it was nothing more than a few inter-connected computers. You could literally visit all the websites in the world within a few minutes, depending on your dial-up speed, that is.

At first, Yellow Pages-like websites curated and indexed as many websites as they could. Not long thereafter, search engines made them obsolete and became popular, since search is more efficient than curation. Even though Google PageRank is probably the best-known reputation system around, platforms like Twitter, Facebook, Reddit, Amazon, Uber, AirBnB, and most (if not all) of the other popular platforms use reputation systems to organize information as well. This makes sense when you think about it, since nobody wants a Yellow Pages for the Internet, a message board with everyone in the world on it, or any other kind of haystack to find their needle in. We need efficient, user-friendly solutions that save time thanks to automation.

The weaknesses of these systems become apparent now that they’ve scaled up and have become global platforms, with hundreds of millions, or even billions of users. Think of those 20,000 moderators, trolls, sockpuppets, misinformation, etc.

If we could go back to Start and try again by filtering from trusted parties only instead, and perhaps their trusted parties (the degrees of separation principle), we could be close to a solution. Still, throwing Twitter as a platform in the bin and starting over isn’t really an option here, but some sort of solution is still required, since Twitter suffers from the same issues as other platforms that are essentially reputation systems.
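
For the sake of illustration, here’s a rough sketch of what “filtering from trusted parties, and perhaps their trusted parties” could look like, assuming nothing more than a simple who-follows-whom graph (the function names and the two-degree cutoff are mine, not Twitter’s):

```python
from collections import deque

def within_degrees(follow_graph, me, max_degree=2):
    """Breadth-first search over a who-follows-whom graph, returning every
    account within max_degree hops of `me` (1 = accounts I follow, 2 = theirs)."""
    seen = {me: 0}
    queue = deque([me])
    while queue:
        account = queue.popleft()
        if seen[account] == max_degree:
            continue
        for followed in follow_graph.get(account, ()):
            if followed not in seen:
                seen[followed] = seen[account] + 1
                queue.append(followed)
    del seen[me]
    return seen  # account -> degree of separation

def filter_timeline(tweets, trusted):
    """Keep only tweets authored by accounts inside the trust radius."""
    return [t for t in tweets if t["author"] in trusted]

# Toy follow-graph: alice follows bob and carol, bob follows dave, and so on.
graph = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": ["eve"], "dave": ["frank"]}
print(within_degrees(graph, "alice"))  # {'bob': 1, 'carol': 1, 'dave': 2, 'eve': 2}
```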

This sums up my very brief assessment of the situation. Please keep the example of AT&T in mind, since it will return when we discuss moderation.

Twitter’s Assessment

Periscope Livestream

In addition to the published RFP, Twitter’s CEO Jack Dorsey held a 45-minute Periscope livestream in which he and his colleagues elaborated on issues like abuse, spam, misinformation, trolls, etc., and why they believe a “health” metric could help. I took the liberty of making some notes while watching the video. I will share and dissect these first, and give my own assessment of the issues at hand, before moving on to what I believe might help.

Opening Up

First off, Jack mentioned that Twitter wants to open up to the community, that they recognize the aforementioned problems, and that they’re listening. It could be that the marketing department or shareholders were involved here, but quite frankly, it makes sense to me that the founder wants the best for his company. Add to that the fact that not that many people in the world are working on solving these issues, let alone running solutions as code in the wild … on a platform used by a few hundred million people on a daily basis.

“What we are really looking for is the help of the community, and help of academia, and all these people who have been thinking about this for many years. We’re looking for diversity of new approaches that we can take to solve this problem.”

David Gasca

I may very well be wrong, but I can’t remember the last time Twitter opened up to the community like this, and as stated in the introduction, they’re probably the first major social media company that is taking ownership of these issues. I can only encourage this type of behavior.

Verified

“Verification, as many of you know, is something we believe is very broken on our platform, and something we need to fix.”

— Jack Dorsey

I’m not crying you’re crying

As stated in the introduction, I’ve written about Twitter’s Verified program in the past. These articles were mainly intended to point out the flaws of verifying identities via photocopies of documents like passports and driver’s licenses, and why it’s not the best idea to send such copies over the Internet. And to be fair, it’s not just Twitter that does this, so they’re probably not the only ones looking for a suitable alternative.

Interestingly enough, in the Periscope livestream Twitter shared how they’re looking at ways to improve the process. Hopefully this will happen before they open up the Verified program to all users. The following makes me hopeful:

“To do it in a way that is scalable, [so] we’re not in the way and people can verify more facts about themselves, and we don’t have to be the judge or imply any bias on our part.”

— Jack Dorsey

However, the main reason they give for changing the verification process is that people tend to perceive verified accounts (the so-called ‘blue ticks’) as trustworthy, and as labeled ‘good’ by Brent … I mean Twitter. Yet when Twitter verifies an account, they simply check if the name on the account matches the name of a person or organization … I guess?

To be honest, I’m confused too.

This doggo clearly has a passport.

To sum up the assessment on Verified: it could use some improvements.

Health Metric

“How can we measure “health” on the platform in such a way that is public and accountable, and larger than simply worrying about does this tweet have the most engagement and using that as our optimizing function?”

— David Gasca

Things like the ones David mentioned, such as engagement, network density, and cohesion, or even the number of swear words in a tweet, are all relatively easy to measure. For example, you can make a list of swear words and check the words in a tweet against that list. But how would you measure something like whether Alice ‘trusts’ Bob, or believes him to be a nice person, a good barber, or a bad president? Measuring how often Alice retweets Bob’s tweets, or interacts with them in some other way, doesn’t seem to cut it.
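
The first kind really is that easy; a minimal sketch, with an obviously placeholder word list:

```python
# A "tactical" indicator: count blocklisted words in a tweet. Trivially measurable.
SWEAR_WORDS = {"darn", "heck", "dagnabbit"}  # placeholder list

def swear_count(tweet_text):
    return sum(1 for word in tweet_text.lower().split()
               if word.strip(".,!?") in SWEAR_WORDS)

print(swear_count("Heck, that barber did a darn fine job!"))  # -> 2
```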

For example, the fact that I preemptively blocked Trump and Hillary after they announced they were running for the US presidency a few years ago doesn’t say anything about how I’d rate them when it comes to statesmanship. In this particular case I simply chose to filter the topic out of my bubble before it ever trickled down my timeline, simply because I don’t care. Some say that’s ignorance, but I like to think of it as picking my battles wisely.

As you may understand, all these examples (nice person, good barber, bad president) are subjective logic, so it’s nearly impossible to get an algorithm (consisting of purely objective logic; code) to predict anywhere near-perfectly what you will deem relevant or not at any given time. You might feel like eating pizza and hotdogs today, but an avocado salad tomorrow. You probably know best, and that’s why you should decide who you want to listen to, and when. Perhaps AI and algorithms can help us improve our experience, but they shouldn’t be the driving factor behind filtering relevancy for us. The user has to do this, as far as possible, in the most cost- and risk-efficient manner.

Tactical vs. Holistic Indicators

David made a distinction between ‘tactical’ and ‘holistic’ indicators. Examples of tactical ones would be: spam reports, abuse reports, the level of toxicity of conversations, etc. In other words, things that are (somewhat) measurable.

Holistic indicators, however, are fuzzier. These fuzzy indicators might mean a whole lot more to the user, but are harder to measure from Twitter’s standpoint. To get back to the nice-person or good-barber example: my definition of a nice person or a good barber might differ from yours, so how do we (objectively) measure this? As far as I can see, we can’t. At least, not in a way that scales beyond Dunbar’s number, unless we invoke (parts of) the web-of-trust principle and degrees of separation.

This doesn’t mean that the indicators themselves aren’t useful. What I think this means is that we’ll need a different approach to how we use these indicators. If we apply them from the user’s perspective, instead of from Twitter’s perspective, do results become more relevant to the users?

Positive Indicators

“We want more positive indicators as well. Some of these are around the health of the community. How can we measure whether Twitter, and the changes that we’re making and that the changes that we could make will strengthen the community and make it better in not only a DAU(?) focusing way, in other measures that will probably serve as surveys, internal or external, and also through other indicators that we build through this RFP process as well.”

— David Gasca

In order to understand what health means in regard to Twitter, we not only have to look at the parts that could function better, but also examine the parts that already work well. Are there ways to measure them? Internal and external surveys sound like a good start, since they would request feedback directly from the users. Most users probably already share their opinions via tweets, but a more organized way to gather this feedback would be preferable. Plus, it would make not only the health metric, but the overall user experience, more measurable.

Bias & Censorship

“Are you all progressive liberals and how does that bias you?”

Jack Dorsey, reading a tweet

Bias is probably the fuzziest indicator around. Again, you might never go to the barber I prefer, and vice versa. This is where I believe many reputation systems eventually fail. Word-of-mouth has worked well for humanity over the millennia, and even if centralized reputation systems on the Internet prevent us from having to manually curate all of the Internet ourselves, they do tend to decide for us what is relevant and what isn’t.

The problem, from the user’s perspective, is that the trust score is made up from untrusted sources. So, when a platform filters out something it considers ‘sensitive’, ‘offensive’, ‘NSFW’, ‘politically incorrect’, etc., in accordance with that trust score and to provide the user with a better experience, it is technically censorship. So, no matter the intentions, debates about censorship are likely to arise in these cases.

“I know we are very much accused of censorship and bias, and I think that’s a really important thing for us to constantly assess ourselves by, and our work. And I think it’s important as a part of this ‘health’ initiative to really understand bias, and not just how the individuals making decisions might have implicit bias when they’re making those decisions, but how the engineers and the teams working on our algorithms may be biased as well, and whether they’re fair, quite frankly.”

Vijaya Gadde

I’m guilty of accusing Twitter of censorship as well. The irony is that I understand this is how it works, and still opted in to their service. I’m not sure if their other users are aware of this too, so sometimes I go on Twitter for a little rant about centralized reputation systems. And to be fair, again, it’s not just Twitter that I believe to be guilty of said charges. As far as I can see, it is inherent to centralized reputation systems in general.

From my totally biased point of view, I believe the issue will forever stay around (to some degree) as long as engineers decide what is relevant for their users, instead of having the users themselves decide. Twitter’s point of view, as expressed by Vijaya, made me confident they’re aware of this.

“It’s important to acknowledge everyone has bias, and we are going to be very focused making sure that the products and decisions that we make are going to remove as much of that as possible.”

Vijaya Gadde

So, it’s not that Twitter isn’t prone to censorship, or that they don’t censor things, but at least they seem to be looking for possible solutions, instead of only hiding behind good intentions.

Policy vs. View

“Behavior of violating policies vs. is this a view we agree or disagree with.”

Vijaya Gadde

This is an important distinction. There’s a major difference between a person interpreting a message in a certain way because of who they are and what they believe to be true in this world (subjective logic), and someone saying something that’s illegal according to the laws of one or more countries, probably with some jurisprudence to go along with it.

The latter is fairly easy to measure. Threatening someone’s life, say, is pretty much illegal everywhere I’ve been. However, I say things all the time that might offend people, but who am I to predict what will offend whom, let alone prevent it? From my point of view, it says the most about the receiver, and I don’t see how any engineer or code could ever solve this.

Nobody explains why being offended says everything about the receiver better than Steve.

Platform Improvements

In many situations over the past decade, when a social media platform “improved” something I was quite disappointed by it, because it negatively impacted my experience as a user. Most of the time that was because I found the new results displayed to me less relevant than before. This is where part of my cynicism towards these systems comes from, and why I want more control, as a user, over what is shown to me.

To give you one example: Somewhere over the past year Twitter changed the timeline so it now also shows tweets that have been liked by the people you follow. Most users will probably experience this as something positive, since it presents them with information they might find relevant as well, instead of only seeing retweets from those they follow. Personally, and this might very well just be me, I liked the old way better where only tweets from accounts you followed, or tweets retweeted by them, were displayed. Reasons:

#1: If I now unfollow someone, their tweets will still appear on my timeline, even if they’re not retweeted by someone I personally follow. The only way I can think of to prevent them from re-appearing is muting each and every account that pops up with tweets I don’t care for. In other words, the old situation was more cost-efficient for me, since it didn’t cause this problem.

#2: I now see tweets from people I specifically chose not to follow, even when they’re not retweeted by someone I follow. Again, the old situation was more cost-efficient and user-friendly, in my opinion.

#3: Since it says “this or that person has liked this tweet”, Twitter shares tweets I’ve liked with those who follow me, even if I don’t hit the retweet button. This gives me less control over what I share with my network, which is the complete opposite of what I’d like to see as a user. From a privacy perspective, one could even argue that the old way of presenting things was more risk-efficient.

Overall, from my totally biased point of view …

“Just leave the old one alone! Older is better.”

Satoshi Nakamoto

Hopefully this helps you understand why an improvement doesn’t always feel like an improvement, and why intentions don’t really matter here. Even if Twitter implemented this change to improve the overall user experience, and I’ll give them the benefit of the doubt, what seems relevant for me (according to them) might seem very irrelevant from my point of view. The more restrictions they impose on what I get to see, or the more content they present which they deem relevant for me, the more it may feel like there is bias and/or censorship on their end, from my point of view.

See how I keep coming back to subjectivity, points of view, and bias?

Sometimes we’re wrong though, and change our bias, which the system should also be able to adapt to, like in this case:

“280 was a fantastic change.”

— Jack Dorsey, reading a tweet

I admit. I was wrong. 280 is nice. I still don’t see how an algorithm or AI is able to predict this though. #BiasMayChange

Building Trust

“We want to build trust, in every community that we serve around the world.”

— Jack Dorsey

The word “trust” is probably the fuzziest indicator of them all. Again, it’s extremely hard to predict what someone’s (subjective) view will be on any given topic, through objective logic (read: code). And what do we even mean when we say “trust”?

What I think the question should be:

What does “trust” even mean to Twitter, and its users?

First, there’s trust between the user and Twitter, or a lack thereof. Second, there’s trust among users, or a lack thereof.

If users didn’t trust Twitter (anymore), communities wouldn’t form on the platform, or would dissipate over time. The same could be said about trust among users. Because if no one listens to anyone ..? ¯\_(ツ)_/¯

What trust means to users would need its own article, and I’ll leave it at this for now, because the Lord knows I’m trying to keep this one short. Hopefully I was able to get my point across: trust, or the lack thereof, among users on Twitter means that they interact with, and listen to, each other, or not. I wouldn’t know how to formulate this more concisely. #SIMPLICITY

Accountability By Third Parties

“Part of open measurement is to show we’re living up to that ideal, and to hold ourselves accountable to that.”

— Jack Dorsey

There are a few groups holding Twitter accountable already, such as:

  1. Users
  2. Shareholders
  3. Governments
  4. etc., etc. (to quote General Montgomery)

Would invoking yet another party help with measuring the health metric?

Since everything seems to revolve around the users here, I’m curious whether they could be the judge. After all, users already judge the platform by either using it or not. The number of interactions among them might be able to help here. I assume Twitter already has this data readily available, so could we use it as a starting point and iterate from there?

If the health metric were provided for and by the users, you’d take care of two issues in one fell swoop. Call me Dutch, but that’s cost-efficiency right there.

Concluding the Assessment

“Plans are worthless, but planning is everything.”

— Dwight D. Eisenhower

To briefly conclude the assessment, and swiftly move on to something more constructive: the big red line of issues has now been drawn out, by both myself and Twitter (or, my interpretation of it). However, the fact that Twitter is a platform with several hundred million daily users means it has many good aspects as well. Let’s start with what already works well, and solve the issues discussed above from there. Easier said than done, and talk is cheap (and code is expensive), but you have to start somewhere. And it doesn’t really make sense to throw the baby out with the bathwater, so iterations it is. However, Verified will need to … improve.

“You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.”

— R. Buckminster Fuller

Constructive Feedback

Requirements

Now that we have an idea of what we don’t want to see, and what we would like to see, let’s sum up the requirements for potential solutions.

#1: Bias & Point of View

Everyone seems to agree that we’re all guilty of having bias, and that this is something we have to keep in mind at all times. As stated multiple times throughout this article, I believe it’s important to keep the user’s point of view in mind while designing solutions, instead of trying to decide what’s most relevant to the user from Twitter’s perspective.

The reason social networks have worked for humans in meatspace for thousands of years is that we don’t easily add untrusted (id)entities to the trust networks in our heads. Again, the subjective value of trusted sources (to the user) is always greater than the subjective value of untrusted sources.

Word-of-mouth is a relatively effective mechanism to route “trust” with, and humans have used this technique, along with the concept of “reputation”, since we lived in tribes. All this requires is to have people decide for themselves who they’d want to listen to. This is what I believe to also be the power of Twitter. I know more people who’ve found a job through Twitter than through LinkedIn, to give you an idea. Could this be because an endorsement in the form of a retweet, like, or comment, from a trusted account, is (subjectively) more valuable than a written statement on LinkedIn from someone that you do not know/trust?

#2: Censorship

I will define censorship on Twitter as any case where Twitter filters out information (could be a tweet or a user account) because they believe it to be irrelevant to one or more users, no matter what the underlying intentions are. As stated before, even improvements to the platform can easily be perceived as censorship.

This is why the least possible amount of moderation by Twitter sounds like a good starting point. The less Twitter needs to moderate what is presented to users, the more cost-efficient it will be for Twitter, and the more risk-efficient it will be for the users. Eventually, this approach should be more cost-efficient for the user too, if it takes them less time and fewer interactions to find what they came for, or to share content with their intended audience.

This doesn’t mean pursuing this principle will fully solve censorship, since Twitter is still a centralized reputation system, and those are inherently susceptible to it. Nonetheless, that doesn’t mean we can’t try to improve the current situation by working with what we have, and iterating from there.

#3: Building Trust

In order to establish (more) trust between Twitter and its users, not only should Twitter keep in mind that their engineers are biased, but perhaps more importantly, that their users are biased. Any intervention in the interactions between any two users can be perceived as a form of censorship. In order to prevent censorship, or the perception thereof, the recommended approach would be to provide the users with tools that help them filter in accordance with their own bias. Could this lead to even smaller bubbles? Perhaps. But since people have routed trust like this for millennia, not investigating this option might become more costly than looking into it further.

If users are given the tools to filter by what they find relevant (their bias), the presented results become more relevant to them as well. In turn, this should lead to a better user experience, and eventually more trust among users. Increased trust among users, along with a better user experience, should result in more interactions between them, and ultimately more trust in Twitter.

“We want to build trust, in every community that we serve around the world.”
- Jack

Besides this, and with the GDPR around the corner, communicating (more) clearly what data Twitter collects and how it is used would also be desirable. I’ve never even read the terms of service myself, and even though I think I have an idea of what might be in there, I guess the average user doesn’t have a clue what they’re sharing and what it’s being used for. With the amount of data leaks and breaches we’re currently experiencing, providing some transparency here wouldn’t hurt. To take this even further, algorithmic transparency could help when it comes to helping people understand how Twitter works, and to receiving more (measurable) feedback from users on how they believe it should work. I understand that Twitter doesn’t want to spill any secret sauce, but maybe a middle ground can be found somewhere, without bumping into too many fallacies.

#4: Indicators

A clear distinction should be made between objectively measurable metrics, and the fuzzy (read: subjective) ones. The former can be measured by Twitter relatively easily, in most cases. The latter can only be filled in by the user, since their bias will determine what they believe to be true.

If we can reduce subjective logic to objective logic, that is preferable. Still, in many cases we can’t, and this is where we can only ask for (in)direct feedback from the user, or measure certain interactions between users.

#5: Reputation

Having (to some degree) trusted sources share content with one another through word-of-mouth seems more efficient than expecting a third party to decide for its users who is ‘trustworthy’, ‘reliable’, or a ‘good’ actor in any other way. For this reason, solutions should primarily build on the fact that users have already established trust among each other. By increasing quality here, trust between users and Twitter could increase too.

The problem is that we can’t measure exactly how many trolls or bullies there are on the platform, since the definition of “troll” differs from user to user. What might be a troll to me, might be a good friend of yours that you listen to for advice quite often. Who is right in this case? I think we should let everyone decide for themselves.

Personally, I find “reputation” to be a better indicator than “health”, because reputation can almost always be summed up in a comment together with a rating on a certain scale, e.g., “Alice is a good barber. 8/10”. How this is interpreted depends on the receiver, again. If Twitter added up all the ratings given to Alice and divided them by the number of ratings, they’d end up with a general trust score from all sources, and most of these sources are untrusted to most other users. To ensure that whatever score is presented to the user is relevant to them, it needs to be calculated from their point of view. This could be one or two ratings from their first-degree connections, several more from their second-degree connections, even more from the third degree, etc. In the end, a 6th-degree connection is generally still more trustworthy than a source that isn’t connected to the user’s network at all.

A “web of trust” with (to some degree) trusted connections, on multiple degrees
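
As a sketch of the difference between a global average and a point-of-view score: weight each rating by the rater’s degree of separation from the viewer, and give sources outside the viewer’s network close to no weight at all. The decay factor and the exact weights are arbitrary assumptions on my part:

```python
def personal_score(ratings, degrees, decay=0.5, unknown_weight=0.01):
    """ratings: {rater: score on a 0-10 scale}
    degrees: {rater: degree of separation from the viewer}, 1 = followed directly.
    Closer raters weigh more; raters outside the viewer's network barely count."""
    total, weight = 0.0, 0.0
    for rater, score in ratings.items():
        w = decay ** (degrees[rater] - 1) if rater in degrees else unknown_weight
        total += score * w
        weight += w
    return total / weight if weight else None

ratings = {"friend": 3, "friend_of_friend": 6, "stranger_1": 10, "stranger_2": 10}
degrees = {"friend": 1, "friend_of_friend": 2}
print(round(personal_score(ratings, degrees), 1))  # -> 4.1, not the "global" 7.25
```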

Perhaps even more importantly, from my totally biased point of view, “reputation” says more about trust between two parties than “health” does. Plus, Twitter is essentially a reputation system, so measuring “reputation” might be easier than measuring “health”. At least, it would stick closer to home, so to speak.

How users regard Twitter’s reputation depends on how user-friendly and cost and risk-efficient their interactions on the platform are. When efficiencies increase, and the word spreads, Twitter’s reputation will increase with it, from the users’ perspectives.

In other words, reputation says something about how good you believe another person or entity to be. In order to measure the health of the platform overall, perhaps we should try to find a way to measure reputation among users, and their perception of how healthy the platform is as a whole.

Verified

By And For Users

Verified is kaput. There seems to be global consensus on that. So, now what?

First off, what is the goal Verified tries to achieve? I’m not sure if there’s a perfect answer to this question, but as far as I can tell its main goals would be to prevent identity fraud and establish trust among users.

As quoted above, Jack stated Twitter doesn’t want to be in the way, and let people verify more facts about themselves, so Twitter doesn’t have to be the judge or imply any bias on their part. The following idea comes to mind …

Imagine being able to add labels to accounts you follow. A label could say “colleague”, “friend”, “troll”, or pretty much anything you want it to say. If I now tag a few accounts as “friend” or “colleague”, it becomes relatively easy for my other friends or colleagues to find each other, by using me as an introduction point for their own networks. This would require barely any personally identifiable information to be processed, and these labels wouldn’t necessarily have to be public. After following and labeling a few accounts, Twitter could recommend other contacts and tell you they are friends of friends, or colleagues. This doesn’t mean these connections are automatically trustworthy, but Twitter would be out of the way, users could narrow down their search results in accordance with their existing social network, and if someone wants to trust their local barber for that specific “IS_OVER_18” label, why not?
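
To illustrate, here’s a rough sketch of what such user-defined labels and friend-of-friend suggestions could look like under the hood. It assumes nothing about Twitter’s actual data model; the structures and function names are hypothetical:

```python
from collections import defaultdict

# Private, per-user labels: labels["alice"]["bob"] == {"friend", "colleague"}
labels = defaultdict(lambda: defaultdict(set))

def add_label(owner, target, label):
    labels[owner][target].add(label)

def friend_of_friend_suggestions(me, label="friend"):
    """Accounts my 'friends' have labeled as friends, which I haven't labeled yet."""
    my_friends = {acct for acct, tags in labels[me].items() if label in tags}
    suggested = set()
    for friend in my_friends:
        suggested |= {acct for acct, tags in labels[friend].items() if label in tags}
    return suggested - set(labels[me]) - {me}

add_label("alice", "bob", "friend")
add_label("bob", "carol", "friend")
add_label("bob", "dave", "colleague")
print(friend_of_friend_suggestions("alice"))  # {'carol'}: introduced via bob
```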

Is this a bulletproof system to prevent identity fraud? Surely not, but it should grow stronger with every node added to the network, when networks filter by degrees of separation. The more trusted nodes establish reputation among each other, the harder it becomes, and the more work it takes, for an outsider to pull off confidence tricks. And if something like that did happen, one could simply add a “scammer” label to the account, to warn anyone who chooses to listen (to the person adding the label) that this account belongs to a scammer or con artist. Does this automatically mean they are truly a scammer? No, but again, the stronger the network, the less likely it is that misinformation can spread easily. And if someone deliberately marks well-behaved actors as scammers, the chance that others notice and remove/down-vote/mute/unfollow them (along with all their connections) gives labelers a reputation incentive to be good actors as well.

The same strategy could be applied to labeling “trolls”. Say I label 10 accounts as trolls, and then choose to share my troll list with my personal network, or a specific subset I’ve labeled with some tag (friends, friends of friends, colleagues, etc.). Other people could then opt in to my ‘troll list’. They could either filter out everyone I deem a troll, or do the complete opposite and only display the tweets of these so-called trolls. This way, filtering out whatever you consider trolls, or displaying only those selected accounts for whatever purpose, becomes easier to automate. Whatever you find relevant at that time is displayed, and you don’t have to be bothered with what you’re not looking for, while everyone can co-exist in the same digital space. Everyone could respond to a single thread, and the thread could then be filtered through your social network.
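
Sticking with the same hypothetical setup, a shared ‘troll list’ someone opts in to is then just a set of account names, and the filter can run in either direction:

```python
def apply_list(tweets, shared_list, mode="hide"):
    """Apply an opted-in label list to a timeline.
    mode='hide' drops tweets by listed accounts; mode='only' shows nothing else."""
    if mode == "hide":
        return [t for t in tweets if t["author"] not in shared_list]
    if mode == "only":
        return [t for t in tweets if t["author"] in shared_list]
    raise ValueError("mode must be 'hide' or 'only'")

my_troll_list = {"troll_1", "troll_2"}  # a list I curated and chose to share
timeline = [{"author": "friend", "text": "hi"},
            {"author": "troll_1", "text": "BWAAAAHHHH"}]

print(apply_list(timeline, my_troll_list))               # only the friend's tweet
print(apply_list(timeline, my_troll_list, mode="only"))  # only the 'troll' tweets
```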

Filtering search results through your personal network, and the networks of your already (to some degree) trusted connections, should also provide users with a safer online environment. I haven’t done the math on this, but looking at my own personal experience, hate speech and targeted harassment seem less likely to come from your personal network or its connections. I can’t remember the last time I had to deny a Russian misinformant entrance to my house when I threw a party, or voluntarily spent an evening with people I consider trolls. The same logic works in reverse: would you recommend a random stranger you just met on the street to someone you trust, and who trusts you, over someone you’ve established a long-time relationship with, and who you know fairly well? It’s practically consumer protection by and for consumers, in a way that should be globally scalable today, on any existing network with a decent number of active users.

By approaching the problem that Verified intends to solve through the already existing components of the user’s network, personal networks could and should grow stronger. As a result, I expect more users to have interactions with each other, since the platform itself will become more relevant from their perspective.

By allowing users to moderate their timelines better, and filter for relevancy through their existing networks (word-of-mouth), Twitter wouldn’t have to intervene, or only as little as possible (there are always legal aspects involved as long as Twitter processes the data). In turn, this would require fewer moderators, since users self-moderate their timelines and personal networks.

Talking about moderation …

Moderation

Moderators are expensive, form a risk to the user (a lack of ability to communicate), and aren’t in the right spot to decide what’s most relevant for the user. A healthy platform shouldn’t need moderators, because the user would own and control his or her account and associated data. #DecentralizeIdentity

Remember that example of the 20,000 content moderators Facebook will employ by the end of 2018? In 2017, they ‘only’ employed some 7,500 moderators, which means the headcount is growing to roughly 267% of what it was. I don’t have a PhD in math or economics, but that doesn’t sound sustainable or like a viable business model to me, long-term. Something is terribly wrong here, and needs to be fixed if we want to keep using reputation systems on a global scale. They’re practically critical infrastructure at this point, and there’s currently a $1Bn incentive sitting at Facebook alone to fix this issue, and I’m just cherry-picking Facebook because this data was easily retrievable.

Generally speaking, content moderation is a topic that isn’t easily discussed in Silicon Valley. Or at least, that’s the impression I get. My gut feeling is that everyone else’s gut feeling is that we’re really talking about censorship when we say “fake news” or “content moderator”, and as a result nobody dares to call it what it is, because everyone’s in on it, and quite frankly, what’s the credible alternative? Putting CEOs on show tribuna… Never mind.

“Already last quarter, we made changes to show fewer viral videos to make sure people’s time is well spent.”

— Mark Zuckerberg

My point is that the chance that someone in the Valley does some quick napkin math and figures out this isn’t sustainable increases by the day, and mark my words: AI will not solve this (in the foreseeable future). Perhaps Twitter has arrived at this point already, or some other major platform that leans heavily on reputation systems, but someone has to figure this out for themselves at some point, and realize how much money might be burned each year on something we had already solved when we still lived in tribes.

This is why I started my assessment of the root cause with the story of AT&T, tribes, and the routing of information. Major platforms are becoming the new AT&T and will need to figure out a solution in order to keep their business models viable in the long run. Communities on the Internet work best (read: most cost- and risk-efficiently) when they connect as peer-to-peer as possible. And instead of routing information, we need a solution to route trust. As I’ve explained before, this practically comes down to routing word-of-mouth among trusted parties, and reputation.

Speaking of pioneering: solving the discussed issues could provide a direct solution to the burden of hiring content moderators. And let’s be honest, content moderators don’t get to see the nicest side of humanity, so maybe we should just do it for them?

Conclusion

The topics discussed in this article are complex, and I only touched on them very briefly. It does make me happy that the topic is attracting serious attention at this point, but unfortunately things always seem to have to go wrong first.

Hopefully the part of history where AT&T had a problem that incentivized them to invest in a possible solution that turned out to change the world forever (transistor → computers → Internet → you reading this) repeats itself. This time, in the form of a major Internet platform that realizes we’re not routing trust efficiently among users, and that the solution might be closer at hand than we may have realized. From where I’m sitting, not investing in a possible solution is already costing these companies billions of dollars every year.

Of course, I could be completely wrong because I’m totally biased, and if anyone could prove me wrong (I did make quite a few assumptions in this article …) it’s a major social media platform.

I’m not sure how to define “health” as a metric, but I have shared several thoughts on what I believe is related to such a metric, and on what I believe might help solve the associated issues. It seems to mainly depend on the users’ view of Twitter and its users (reputation), and on how relevant the content on the platform seems from their perspective.

Last but not least, this is how I would define “health”, in regard to Twitter:

Making the presented information as relevant as possible to the user, and solving abuse, spam, ‘fake’ news, trolls, bullies, sockpuppets, etc., as far as possible.

“That — that’s it.”

— Forrest Gump

Hopefully this article inspired anyone with any kind of interest in the topic in any way. If you feel like proving me wrong, or have any other questions or comments, feel free to reach out to me on Twitter.


Tim Pastoor

Rants about Bitcoin, P2P Identity & Reputation, and Intermediaries