A Billion Grains of Rice

Reflections on Social Network Content Blocking & Censorship

A few days ago there was a tremendous kerfuffle regarding the Kim Phuc “Napalm Girl” photo, and Facebook “censorship”. Over on Twitter, Dan Hon posted a not-terrible tweetstorm, raging against “Silicon Valley” regarding the matter. Some extracts include:

https://twitter.com/hondanhon/status/774307803104104448

“it’s difficult to create a distinction between allowing a photograph of a nude child in one instance and not others” — NO SHIT SHERLOCK

…and…

Facebook: some things are difficult and worth doing (Internet access to 1bn people). But others are TOO HARD we’ll just give up.

…and, edited for clarity…

This is why you get a bad rap, Silicon-Valley-In-General. You (legitimately) excitedly talk about solving (some) hard problems while *literally giving up* and saying it’s too hard for other ones that are equally important. You can’t have it both ways.

…and…

You know what’s also difficult? Scaling a comms platform to over a billion daily active users! BUT YOU MANAGED TO DO THAT, DIDN’T YOU.

…and, again edited for clarity…

Did you know, Tech Industry, that the rest of society figured out how to address these gray areas? That law is designed to be malleable? That we have mechanisms for review, independent things like “courts” that help us figure out what’s porn and what’s not? And that sometimes dealing with those Difficult Problems that result from people using the stuff we made by solving other Difficult Problems requires thinking about those problems in a different way, and scaling and building structures in different ways? But noo we’re engineers and software engineers at that and we like neat things that fit in boxes and if we can’t fit them in boxes, TOO HARD

I’m a former Facebook engineer and while working there I learned that people, myself included, can have real problems addressing “scale”. One of the most useful self-exercises I learned was to “visualise 1 billion”. Why? Facebook claims more than 1.7 billion monthly active users, which is a big number with lots of zeros after it — 1,700,000,000 — but when we put the “billion” label on it the immensity and cost can feel “neutralised”.

So let’s start by rounding it down to 1 Billion; for purposes of visualisation, what does a billion grains of rice look like, roughly? You can read the maths and citations in this post’s first part, but here’s the punchline:

  • Assume: 64 grains of rice = 1 gram
  • 1 billion grains weight = 15,625kg, 34,447lb, 15.63 tonnes, 17.22 US tons
  • Assume: bulk volume: 1.22l/kg
  • 1 billion grains volume = 19 cubic meters
  • Assume: conical mound, 45-degree sloping sides
  • 1 billion grains mound = 2.63m tall (8.63ft), diameter 5.26m (17.26ft)
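
The figures in the list above can be re-derived with a few lines of arithmetic. This is a minimal sketch that uses only the stated assumptions (64 grains per gram, 1.22 litres per kilogram, a cone with 45-degree sides, so its height equals its base radius); the function and constant names are illustrative, not taken from the original post.

```python
import math

GRAINS = 1_000_000_000
GRAINS_PER_GRAM = 64      # assumption from the list above
LITRES_PER_KG = 1.22      # assumption from the list above
LB_PER_KG = 2.20462       # standard conversion factor
FT_PER_M = 3.28084        # standard conversion factor


def rice_mound(grains=GRAINS):
    """Return mass, volume and cone dimensions for a pile of rice grains."""
    mass_kg = grains / GRAINS_PER_GRAM / 1000
    volume_m3 = mass_kg * LITRES_PER_KG / 1000
    # 45-degree sides mean height == radius, so V = pi * h**3 / 3.
    height_m = (3 * volume_m3 / math.pi) ** (1 / 3)
    return {
        "mass_kg": mass_kg,
        "mass_lb": mass_kg * LB_PER_KG,
        "mass_us_tons": mass_kg * LB_PER_KG / 2000,
        "volume_m3": volume_m3,
        "height_m": height_m,
        "height_ft": height_m * FT_PER_M,
        "diameter_m": 2 * height_m,
        "diameter_ft": 2 * height_m * FT_PER_M,
    }


if __name__ == "__main__":
    for name, value in rice_mound().items():
        print(f"{name}: {value:,.2f}")
```

Changing `GRAINS_PER_GRAM` or `LITRES_PER_KG` shows how little the picture changes: the mound's height grows only with the cube root of its volume.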

There’s a huge mound of rice, much taller than you, and it weighs a lot.

A Billion People, A Billion Postings…

So if we have 1.7 billion users, and some of them log in once per month and post nothing, whilst others are permanently connected and post several things per day, how many items would they post on a daily basis?

I don’t know the specifics, but clearly the number will also be in the order of 1 billion items posted per day. How can I justify this? Because frankly a factor of 10x or 30x, up or down, will not make much difference to the following figures and argument, so let’s go with our figure of 1 billion again.

Say that we have a billion grains of rice in a 15.6 tonne, 2.6m-high mound, and amongst them are some rice-like mouse droppings. Not many mouse-droppings but we obviously want to expunge all the poop. So we hire a team of rice-inspectors to check individual grains for poopiness, and they have to complete it in 24 hours because another billion grains arrive tomorrow.

How many rice-inspectors will we need, and how much will they cost?

Again, see the math in the previous article; here’s the punchline:

  • Assume perfect focus, no bathroom or food breaks, 8 hour shifts
  • Assume inspection of 1 grain = 6 seconds; 10 inspections per minute
  • One inspector = (8 * 60 * 10) = 4800 inspections per day
  • 1 billion postings / 4800 = 208,333 inspector-shifts per day 
    (i.e. 208k inspectors/employees)
  • Assume salary $10/hour
  • Daily salary cost = 208,333 * 8h * $10 = $16,666,640;
    Annualised = $6.1 billion, ignoring buildings, management, insurance, other costs, etc.
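
The same back-of-envelope estimate, written out as a small sketch. Every constant is one of the assumptions in the list above (6 seconds per item, 8-hour shifts, $10/hour, 1 billion items per day); the names are illustrative. It keeps the fractional inspector, so the daily figure comes out a few dollars higher than the rounded one in the list.

```python
SECONDS_PER_ITEM = 6           # assumption: one inspection takes 6 seconds
SHIFT_HOURS = 8                # assumption: 8-hour shifts, no breaks
HOURLY_WAGE_USD = 10           # assumption: $10/hour
ITEMS_PER_DAY = 1_000_000_000  # assumption: 1 billion postings per day


def review_cost(items_per_day=ITEMS_PER_DAY):
    """Return (items per shift, shifts per day, daily cost, annual cost)."""
    items_per_shift = SHIFT_HOURS * 3600 // SECONDS_PER_ITEM
    shifts_per_day = items_per_day / items_per_shift
    daily_cost = shifts_per_day * SHIFT_HOURS * HOURLY_WAGE_USD
    return items_per_shift, shifts_per_day, daily_cost, daily_cost * 365


if __name__ == "__main__":
    per_shift, shifts, daily, annual = review_cost()
    print(f"items per inspector-shift: {per_shift:,}")
    print(f"inspector-shifts per day:  {shifts:,.0f}")
    print(f"daily salary cost:         ${daily:,.0f}")
    print(f"annual salary cost:        ${annual / 1e9:.1f} billion")
```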

So we would be paying $6.1 billion per year for 208,000 overworked employees to comb over our daily mound of rice looking for mouse droppings. This sounds absurd, but — returning to the substance of Dan’s tweetstorm — this is the “order of magnitude cost” we’d be discussing to implement human review of every piece of content posted in a day.

If somehow we were to drop the human requirement by 30x — and remember that we’re modelling this with perfect execution, 6-seconds per item, without HR, management, infrastructure & other costs, let alone bathroom breaks or “living wage” — if somehow we got a 30x efficiency boost, then we’d still be dealing with 7000 reviewers at $560k/day — $204mn annualised basic salary cost, ignoring the other costs.
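
To see how the bill moves with that efficiency factor, here is a rough sketch under the same assumptions as before (6 seconds per item, 8-hour shifts, $10/hour). The function name and the list of factors are hypothetical, chosen only for illustration. The unrounded 30x result is about 6,900 reviewers, which the paragraph above rounds up to 7000.

```python
def reviewers_needed(items_per_day, reduction_factor,
                     seconds_per_item=6, shift_hours=8, hourly_wage=10.0):
    """Return (reviewers, daily cost, annual cost) after an efficiency gain."""
    items_to_review = items_per_day / reduction_factor
    items_per_shift = shift_hours * 3600 / seconds_per_item
    reviewers = items_to_review / items_per_shift
    daily_cost = reviewers * shift_hours * hourly_wage
    return reviewers, daily_cost, daily_cost * 365


if __name__ == "__main__":
    for factor in (1, 10, 30, 100):
        reviewers, daily, annual = reviewers_needed(1_000_000_000, factor)
        print(f"{factor:>4}x: {reviewers:>9,.0f} reviewers, "
              f"${daily:>12,.0f}/day, ${annual / 1e6:>8,.0f}mn/year")
```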

Why Do This Math?

So I posted on my friend Owen’s timeline, regarding Dan’s tweetstorm:

Do you remember the scene in “War Games” about “Taking the Humans out of the Loop” for the WOPR computer? It’s hard for me to shake the impression that folk calling for Facebook to address “hard problems” with more computation are a) the first to complain when Facebook Trending boosts a fake-yet-trending story, and b) the moral mirror image of the politicians and spooks who demand that the Valley “Nerd Harder” in pursuit of cryptographic “Golden Key” backdoors which will do what _they_ want, too. A pox on both houses.

This yielded responses:

Owen: Who said anything about addressing the hard problems with “more computation”? Who said anything about “taking the humans out of the loop”?
Dan: Thanks Owen, that’s definitely my position — I’m not calling for or saying that the difficult/hard problems can or must be only solved by more computation.

…but there’s the rub; perhaps Dan (and Owen?) did not realise that they were calling for more automation… but they were calling for more automation. No sane platform company is going to swallow a $200mn…$6bn salary cost to review every piece of content which is posted on it. There will be (as Dan says) different strategies, e.g. the following entirely hypothetical examples:

  • Strategy #1: “permit everything and manually review anything which gets reported”: scalable and largely self-policing, but errors will happen (“is that a forbidden ‘porn-boob’ or a permitted ‘breastfeeding-boob’? and what do we do about lactation/breastfeeding-porn?”), and when errors happen, the braying media will never let us hear the end of it.
  • Strategy #2: “throw machine-learning at image recognition, selectively-block or send for review anything which gets flagged as undesirable” — a scalable and low-profile approach, with manageable costs because we can dial the matching up & down; however we’re then building an enormous mechanised brain which governments will demand to leverage to enable censorship of images they don’t like, e.g. of tanks running over street protestors. 
    Also, when errors happen, the braying media will never let us hear the end of it.
  • Strategy #3: “let people flag content which is somehow ‘really bad’ and hide it automatically” — this would lead to organised pressure groups blocking the free speech of people that they don’t like. It would be like Wikipedia edit wars, but group-on-group and with far lower barrier to entry. And when errors happen, the braying media will never let us hear the end of it.
  • Other strategies, including hybrids or “sampling”, exist…

None of these are perfect. None of them will catch all the poop. You’ll notice I’ve taken a cheap shot joke (“the braying media”) with all of these examples, and that’s because it’s an invariant; it doesn’t matter WHAT form of content review and blocking is implemented, eventually someone will make a mistake and — because it touches a billion people — Boom! — there will be headlines aplenty about at least one of:

  1. “overbearing censorship” if they don’t like that it was taken down
  2. “inappropriate censorship” if they don’t like (apparently) why it was taken down
  3. “insensitive censorship” if they don’t like the context surrounding why it was taken down
  4. “inadequate censorship” if they don’t like that it stayed up

Frankly, arguments about Facebook “censorship” are as boring as mouse droppings, and I speak as a dyed-in-the-wool free speech activist. My perspective: of course the Kim Phuc picture is a massive historical icon and of course it deserves to be “whitelisted” / not treated in the same way that any other image of a 9-year-old girl running full-frontal-naked down a street would be.

However, as a global platform, I believe that Facebook can take whatever approach it prefers towards dealing with the photograph, or any other content. Given that (even in spite of a righteous Norwegian premier) there are some countries where possession of that image will certainly get you into trouble, it is wise and appropriate for Facebook to be circumspect about addressing the matter; not least to help protect the people who use the platform in those countries.

Dan wrote: “…law is designed to be malleable?” Perhaps in America, and somewhat in the EU, but Facebook is a global platform and in some places in the world the law is absolute fascist shit.

Dan wrote: “…we have mechanisms for review, independent things like ‘courts’” — again, these are appeals to parochial review.

Dan wrote: “You know what’s also difficult? Scaling a comms platform to over a billion daily active users! BUT YOU MANAGED TO DO THAT, DIDN’T YOU.”

Yes, Facebook did that, and I helped them.

But I also understand that it’s easier to bag 15 tonnes of rice per day and distribute it to those who want it, than it is to inspect every grain for poop.

If we worry about censorship then we should be worried about Strategy #2, above. If we blindly call for more, better & smarter blocking, if we call unknowingly for social media companies to develop internal mechanised Total Information Awareness such that they become practically perfect at censorship, distinguishing content which should be blocked in one country from content which should not be blocked in another, almost never making the temporary mistakes about which we complain, like the above, then the consequences for global liberty will be terrifying.

If we worry about censorship, we should be careful what we ask for.

Originally posted on my timeline. Image source.