Work in Progress: Diagramming Platform Moderation Dilemmas

Tim Hwang
Oct 1, 2018


Recently, I’ve been looking for a way to diagram some aspects of the dilemmas that platforms face in making decisions around moderation.

This is meant partly as a deliberately simplified aid to thinking through the various, complicated, and conflicting pressures in the space, and partly as a means of clarifying some of the ongoing debates on the issue.

One model that’s been helpful for me is the one above: a bell curve that charts user engagement against the degree of content moderation. The rough assumption here is that extreme, heavy-handed moderation constrains user expression and drives down people’s willingness to use an online social platform. Conversely, a near-total absence of moderation allows garbage to fill an online social space, making it harder and less pleasant to use, which similarly drives down engagement. There is, then, some theoretical optimum of moderation between these two extremes that maximizes user “engagement”. We’ll call this an Engagement-Moderation Curve (or E-M Curve).
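
To make this concrete, here’s a tiny Python sketch of a single E-M Curve. The Gaussian shape, the 0-to-1 moderation scale, and all of the parameter values are assumptions I’m making purely for illustration; the only thing that matters is that engagement peaks somewhere between the two extremes.

```python
# A toy E-M Curve: engagement as a function of moderation level, modeled here
# (as an assumption, not anything inherent to the idea) as a Gaussian.
import numpy as np

def em_curve(m, peak=1.0, m_opt=0.5, sigma=0.15):
    """Engagement at moderation level m, where 0 = no moderation, 1 = maximal."""
    return peak * np.exp(-((m - m_opt) ** 2) / (2 * sigma ** 2))

moderation = np.linspace(0, 1, 101)
engagement = em_curve(moderation)
crest = moderation[np.argmax(engagement)]          # the theoretical optimum
print(f"Engagement peaks at a moderation level of ~{crest:.2f}")
```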

I think this is helpful because you can start to diagram out certain types of arguments that you hear in the space and delineate the differences. For example, we have a large number of conflicts over where a platform is along this curve:

We might argue that:

  • “We need more content moderation because it’ll create a safer space which ultimately drives up user engagement” (the “business case” argument for more moderation). That’s the dot before the crest of the bell curve.
  • “We need more content moderation because — while it’ll drive down engagement — we should weigh user safety more than pure profit” (the “social good” argument for more moderation). That’s the dot after the crest of the bell curve.

Desirables and Undesirables

You can quickly start nuancing this diagram. Say that different groups of users engage differently at different levels of moderation. So, a classic situation is the one diagrammed below:

In this situation, you have two cohorts. One cohort is a group of “undesirable” users. These users can be undesirable for a variety of reasons: they’re white supremacists, for example. Another cohort is a group of “desirable” users — for the purposes of example and simplicity, let’s say these are not white supremacists.

In situations where lower levels of moderation actually drive user engagement among undesirable users (P0), the challenge is to get platforms to up-shift their moderation to a more aggressive policy (P1) to replace these undesirable users with desirable users.

What’s challenging is that, if the E-M Curves are divergent enough, there’s a chasm where a slow ratcheting up of the moderation policy will for a time depress overall engagement until you get to the other side.

It’s possible for moderation policies to simply leap upwards, but our tools for doing so are imprecise. The wider the gap between the two curves, the more risk a platform faces of landing in the chasm. This makes platforms hesitant, even though ending up with similar levels of engagement (and fewer undesirable users) is where they’d want to be.
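
Here’s a rough numerical look at the chasm, again assuming Gaussian curves (with invented numbers) whose optima sit far apart:

```python
# Two cohorts with divergent E-M Curves: undesirable users peak at a lax
# policy (P0), desirable users at a stricter one (P1). All numbers invented.
import numpy as np

def em_curve(m, peak, m_opt, sigma):
    return peak * np.exp(-((m - m_opt) ** 2) / (2 * sigma ** 2))

def total_engagement(m):
    undesirable = em_curve(m, peak=1.0, m_opt=0.15, sigma=0.08)   # P0 ~ 0.15
    desirable = em_curve(m, peak=1.0, m_opt=0.75, sigma=0.08)     # P1 ~ 0.75
    return undesirable + desirable

ratchet = np.linspace(0.15, 0.75, 200)   # slowly turning the policy dial up
print(f"Total engagement at P0:           ~{total_engagement(0.15):.2f}")
print(f"Total engagement at P1:           ~{total_engagement(0.75):.2f}")
print(f"Lowest point while ratcheting up: ~{total_engagement(ratchet).min():.2f}")
```

Total engagement is about the same at either end, but it craters in the middle, which is the dip a slow ratchet has to cross.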

Now, the nice thing about this diagramming is that you can distinguish this case from another scenario where platforms are hesitant to moderate more aggressively:

In this situation, the E-M Curves of the desirable and undesirable users are very close together. There’s almost no chasm of engagement as you move along the x-axis.

The challenge here isn’t simply a matter of ensuring that your moderation policies are sufficiently aggressive. Indeed, increasing moderation across the board is likely to drive down engagement from both populations, undesirable and desirable alike.

Of course, in these situations, the answer is simple: “don’t treat white supremacists the same!”

We need a policy that effectively treats the two populations differently. P(d) would set the optimal level of moderation for desirable users, and we’d push a policy of P(u) that tries to shrink undesirable user engagement as much as possible. Essentially, the outcome here is that similar E-M Curves across populations force companies to create more discriminating policies.
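
Here’s what that looks like in the same toy setup, with the two curves sitting almost on top of each other; the specific values of P(d), P(u), and the curve parameters are assumptions for illustration only.

```python
# A "discriminating" policy when the two E-M Curves nearly overlap: hold
# desirable users at their optimum P(d), push the undesirable cohort to a far
# more aggressive P(u). Shapes and numbers are invented for illustration.
import numpy as np

def em_curve(m, peak, m_opt, sigma):
    return peak * np.exp(-((m - m_opt) ** 2) / (2 * sigma ** 2))

m_opt_desirable, m_opt_undesirable = 0.50, 0.45   # nearly identical optima
p_d = m_opt_desirable                             # keep desirable users at their peak
p_u = 0.95                                        # push undesirables far past theirs

e_desirable = em_curve(p_d, peak=1.0, m_opt=m_opt_desirable, sigma=0.15)
e_undesirable = em_curve(p_u, peak=1.0, m_opt=m_opt_undesirable, sigma=0.15)
print(f"Desirable engagement under P(d):   ~{e_desirable:.2f}")    # stays high
print(f"Undesirable engagement under P(u): ~{e_undesirable:.2f}")  # driven toward zero
```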

Public Opinion and More Complexity

Now, these charts let us nuance the model one step further. In practice, companies are constrained by public opinion in how far they can push these discriminating policies. There are lots of reasons for this. For one, platforms, for better or for worse, are concerned about accusations of political bias. They’re also limited by company culture and norms about how far they can go in treating different groups differently.

We might say that these platforms have an acceptable window of divergence in treating different populations on the platform differently. Charting this visually:

In other words, even if a platform wanted to suppress undesirable user engagement as much as possible, it can only set P(u) so far away from P(d). That’s one way of articulating the public opinion battle around deplatforming: is it possible to expand this acceptable window such that platforms will accept and implement more discriminating policies?
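
One way to see the window’s effect in the toy model: hold desirable users at P(d) and ask how much undesirable engagement can be suppressed as the window widens. The window sizes and curve parameters below are, again, assumptions for illustration.

```python
# The "acceptable window": P(u) can sit at most `window` away from P(d).
# Undesirable users are assumed to share roughly the same optimum as P(d).
import numpy as np

def em_curve(m, peak, m_opt, sigma):
    return peak * np.exp(-((m - m_opt) ** 2) / (2 * sigma ** 2))

p_d = 0.50        # desirable-user optimum (and, roughly, the undesirables' too)
sigma_u = 0.15    # how forgiving the undesirable curve is

for window in (0.1, 0.2, 0.4):
    p_u = p_d + window   # the most aggressive policy the window allows
    e_u = em_curve(p_u, peak=1.0, m_opt=p_d, sigma=sigma_u)
    print(f"window={window:.1f} -> P(u)={p_u:.2f}, undesirable engagement ~{e_u:.2f}")
```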

That chart is deceptive, though. It suggests that marginal increases in the size of the window (i.e., convincing the public that it is sometimes OK to totally ban Alex Jones for life) will have a big effect on the user engagement of undesirable populations. But the precise shape of the E-M Curves influences the outcome:

So far, we’ve been playing around with mostly identical curves. But what if the curve for undesirable users is far fatter than we expect? In that case, even a few successes in expanding the acceptable window will not produce the conditions under which platforms take sufficient action against undesirable users. You’d need to expand the window considerably further before the level of moderation would be sufficient.
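
Rerunning the same window sketch with a fatter undesirable curve (the sigma values are made up purely for contrast) shows why marginal wins stop moving the needle:

```python
# Same acceptable-window exercise, comparing a thin undesirable E-M Curve with
# a fat one. With the fat curve, small expansions of the window barely dent
# undesirable engagement. All values invented for contrast.
import numpy as np

def em_curve(m, peak, m_opt, sigma):
    return peak * np.exp(-((m - m_opt) ** 2) / (2 * sigma ** 2))

p_d = 0.50
for sigma_u, label in ((0.15, "thin curve"), (0.40, "fat curve")):
    for window in (0.1, 0.2, 0.4):
        e_u = em_curve(p_d + window, peak=1.0, m_opt=p_d, sigma=sigma_u)
        print(f"{label}: window={window:.1f} -> undesirable engagement ~{e_u:.2f}")
```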

The outcome in these cases is interesting to my mind. There are situations where advocating for a more aggressive, uniform moderation policy that depresses the engagement of all users is an easier lift than trying to create pressures that force the platform’s hand in taking action against specific harmful populations.
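
A toy comparison of the two advocacy routes, under the same invented assumptions as above (fairly fat curves and a narrow acceptable window):

```python
# (a) a uniform, aggressive policy applied to everyone, versus
# (b) a discriminating policy boxed in by a narrow acceptable window.
# All parameters are invented for illustration.
import numpy as np

def em_curve(m, peak, m_opt, sigma):
    return peak * np.exp(-((m - m_opt) ** 2) / (2 * sigma ** 2))

p_d, window, sigma = 0.50, 0.10, 0.25   # shared optimum, narrow window, fat curves

# (a) one aggressive policy for both cohorts
uniform = 0.85
e_d_uniform = em_curve(uniform, 1.0, p_d, sigma)
e_u_uniform = em_curve(uniform, 1.0, p_d, sigma)

# (b) a discriminating policy constrained by the window
e_d_window = em_curve(p_d, 1.0, p_d, sigma)
e_u_window = em_curve(p_d + window, 1.0, p_d, sigma)

print(f"Uniform policy:        desirable ~{e_d_uniform:.2f}, undesirable ~{e_u_uniform:.2f}")
print(f"Discriminating policy: desirable ~{e_d_window:.2f}, undesirable ~{e_u_window:.2f}")
```

Under these made-up numbers, the boxed-in discriminating policy barely touches undesirable engagement, while the blunt uniform policy actually suppresses it, at the cost of everyone else’s engagement. That’s the trade-off behind the “easier lift” above.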

Anyways, writing this rough draft up to get feedback as I keep noodling on this some more. Thoughts / ideas are welcome!
