The Pointless Blockade

Bad Policy Doesn’t Scale: Indonesian Internet Censorship, in Numbers

To be fair, it took a future Sith Dark Lord to break this one.


Last week, when President Jokowi of the Republic of Indonesia visited Silicon Valley to meet its tech elite, a small panel inside the Ministry of Telecommunication & Information decided to add Yahoo’s Tumblr to the latest update of its internet censorship blacklist.

The reason: pornographic content.

The impact: an immediate uproar from thousands of Indonesian internet users who frequent the 500-million-blog network, one of Silicon Valley’s success stories. Just as one hand shakes, the other stabs.

The ministry retracted its decision a few hours later, but the damage had already been done years before. As of today, every ISP complies with a mandatory blacklist that includes large content aggregators and publishing platforms such as Reddit and Vimeo. Relying on a domain-based blacklist that is woefully rigid in the face of ever-changing user-generated content, the ministry insists that NSFW filters are not enough and constantly pressures Facebook, Twitter, and other internet giants to censor themselves if they want to keep operating in Indonesia, a country of more than 250 million people eager to be connected.

Try explaining that to policymakers.

If you have to ask, my opinion is that such censorship is neither justified nor realistic. Leaving the “moral judgement” requirement aside, even the best machine learning algorithm still has its shortcomings when it comes to making the call. The internet is not bounded by dotted lines on a map, and even China’s Great Firewall can’t block every nude or silence every protest. For now, content moderation is a multi-million dollar business. A year ago, it was estimated that tech giants outsourced this task to more than 100,000 content blockers. The ministry’s multi-stakeholder panel, made up of government officials and civil society representatives, numbers fewer than 50. The government has no choice but to carry on. Cast the net, wide.

Thankfully, since I’m based in the US, I’m not breaking any law by following the leads on these irresistible forbidden temptations. While the debate rages over the policy and over how they cast the net, I decided to take a look at what it has managed to catch so far.

A sample of the domains listed in the blacklist.

746,907 blocked and counting

TLD treemaps. Because I don’t like pie charts.

The list, Trust+Positif, is a simple flat text file, one domain per line. Using a couple of lines of R, the tldextract library, and a simple function to clean up the resulting data frame, I managed to separate each entry into its subdomain, domain, and TLD for further exploration. Some key takeaways:

1. Small Fishes
For internet veterans, a cursory look at the list is enough to see that a huge number of the entries are simply click farms. Sure, all the tube sites are blocked, which already slashes 99% of the goodies (who the hell still uses “3gp” as a keyword?), but why bother putting these on the list? Some of the domains are insanely long (a 65-character subdomain plus 65 more for the domain), to the point that it’s hard to imagine a 13-year-old boy in rural Indonesia ever managing to discover the listed URL at all.

Let’s put it this way: there are 17,915,154,163,055,217,451,376,497,752,818,481,395,241,950,788,599,926,259,708,266,594,121,825,105,595,841,586,023,890,944 possible combinations of letters and numbers that you can use for your website address. Before Unicode.
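The assumptions behind that figure aren’t spelled out, so the sketch below is not expected to reproduce it. It only shows one way to do this kind of counting in Python, under assumptions of my own: a single DNS label of 1 to 63 characters, drawn from a-z, 0-9, and the hyphen, with no hyphen at either end. It ignores finer rules (reserved prefixes such as "xn--", registry policies), and the name count_labels is made up for illustration.

```python
# Assumptions (not from the original analysis): one DNS label,
# case-insensitive, letters/digits/hyphen, no leading or trailing hyphen.
ALNUM = 36          # a-z and 0-9
ALNUM_HYPHEN = 37   # the same set plus "-"
MAX_LABEL = 63      # longest single label allowed in DNS


def count_labels(max_len: int = MAX_LABEL) -> int:
    """Count the distinct labels of length 1..max_len under the rules above."""
    total = 0
    for n in range(1, max_len + 1):
        if n == 1:
            # A one-character label must be a letter or digit.
            total += ALNUM
        else:
            # First and last characters: letter or digit.
            # The n - 2 characters in between may also be a hyphen.
            total += ALNUM * ALNUM_HYPHEN ** (n - 2) * ALNUM
    return total


# Python integers are arbitrary precision, so the full count prints exactly.
print(f"{count_labels():,}")
```

Whatever the exact rules, the point stands: the space of possible names is astronomically larger than any list a panel can maintain by hand.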

What 66.154.20.x used to be.

2. Nameless Fishes
Almost 100,000 of the entries aren’t even fully qualified domain names; they’re IP addresses. Forget about the 13-year-old: I wouldn’t be able to find them myself unless I purposefully searched for them or clicked through from somewhere. I even ran a script against those addresses and found that the sites were long gone. What if someday I rented a VPS and got assigned one of those IPs, without ever knowing that it’s blocked in a country of a quarter billion?

Oh and hey, don’t forget about that IPv6 thing. That’s 2¹²⁸ more IP addresses that you might have to add to that list, give or take a few trillion trillion trillion or so.
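For anyone who wants to poke at the list themselves, here is a rough Python sketch of the two steps described so far: telling bare IP addresses apart from hostnames, and splitting a hostname into subdomain, domain, and TLD. It is not the R script used for this post. The sample entries are invented placeholders (reserved documentation names and addresses), and naive_split is a deliberately crude stand-in that treats only the last label as the TLD. A real run would want a Public Suffix List aware tool such as tldextract, because the naive version gets names like example.co.uk wrong.

```python
import ipaddress
from collections import Counter


def classify(entry: str) -> str:
    """Return 'ipv4', 'ipv6', or 'hostname' for one line of the list."""
    try:
        ip = ipaddress.ip_address(entry.strip())
    except ValueError:
        # Anything that does not parse as an IP is treated as a hostname.
        return "hostname"
    return "ipv4" if ip.version == 4 else "ipv6"


def naive_split(hostname: str) -> tuple[str, str, str]:
    """Split into (subdomain, domain, tld), treating the last label as the TLD.

    Simplification: multi-label suffixes such as "co.uk" are NOT handled.
    """
    labels = hostname.strip().lower().rstrip(".").split(".")
    tld = labels[-1]
    domain = labels[-2] if len(labels) >= 2 else ""
    subdomain = ".".join(labels[:-2])
    return subdomain, domain, tld


# Hypothetical sample lines, standing in for the real flat text file.
sample = ["www.example.com", "192.0.2.10", "a.b.example.org", "2001:db8::1"]

print(Counter(classify(line) for line in sample))
for line in sample:
    if classify(line) == "hostname":
        print(naive_split(line))

# The size of the whole IPv6 address space, for scale.
print(ipaddress.ip_network("::/0").num_addresses == 2**128)
```

Counting the "ipv4" bucket over the real file is the kind of check that would surface the IP-address entries mentioned above.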

3. Official-looking Fishes
A Google search for “nude beach at Gibraltar” turns up a few spots on the Spanish side, but Her Majesty’s Government of Gibraltar is somehow blocked under the “porn” category, the only one receiving this honor.

Our future territorial dispute, some 7,700 miles away.

4. Educated (and Queer) Fishes
UC Berkeley, University of Washington, University of Essex, University of Rochester, University of Michigan, Stanford University, Indiana University, Ohio State University, Purdue University, Colorado State University, UC Davis, University of North Carolina, and Cornell University are among those with one or more pages blocked. Most of the blocked content comes from LGBTQ student groups or gender studies.

and EXTREME high performance distributed computing.

and online libraries that might give you inappropriate illustrations from med school textbooks.

and a student union at University of Cambridge named after a rather famous historical figure.

All of these fascinating specimens, of course, are conveniently marked as porn.

An Appeal for Sensibility

Sure, yes, of course you can e-mail the ministry and ask nicely to remove your sites from the ever-growing list. You can do so at….

…oh wait, you can’t. As per its website, you are highly encouraged to report sites in violation through a simple web form, but I’m afraid you really have to be a big player (preferably local) to amass enough support from “netizens” to get yourself removed. It takes only one report to put you in, yet thousands to get you out.

More like, mostly 404.

Whatever you think you’re doing, dear Indonesia, it’s just not working. I know you can’t backtrack; your political enemies would feast upon the smell of your blood.

What I’m earnestly asking for here is sensibility. Open up your process. Let people know how you decide what’s in and what’s not. Let (sane) people choose what’s in and what’s out, instead of some questionable authority in a weekly meeting.

Or better yet, just let go. Thank you for blocking thousands of scams that robbed millions from our clueless parents and grandparents. Thank you for blocking radical Islamic pages that divided our societies. Thank you for the inclusive process that brings truly caring Internet freedom fighters to your table.

But on the sheer game of numbers,
you might know already that you should’ve just let go.

Now, before you ignore what I just wrote, would you please block all those local e-commerce sites and forums promoting sexual services, illegal software, Ponzi schemes, and thoughtcrime? No?

I rest my case.