Internet Fragmentation: Beyond “free” and “closed”

Nick Merrill
Published in CLTC Bulletin, Nov 26, 2019

Nick Merrill directs the Daylight Lab at the UC Berkeley Center for Long-Term Cybersecurity and writes a weekly newsletter about politics and the Internet.

According to TechCrunch, the “splinternet” is already here.

According to Fortune, it’s growing.

Governments, firms, and citizens are again debating to what extent the Internet is, or should be, a global infrastructure for communication and commerce. In these debates, the Internet is often described as increasingly fragmented, "splintering" along national borders.

This debate’s extremes, a global Internet on one hand and a nationalized “splinternet” on the other, are cartoons of what has always been a mixed reality. The digital world has never floated freely, untethered from political and geographical boundaries, nor has it ever been fully under the political control of particular countries.

Nevertheless, the question of how the Internet differs materially across national boundaries is a pressing one. Understanding the Internet’s character in different countries is key to policymakers and advocacy groups looking to understand the effects of laws and norms, and to decision makers in industry looking to build and deploy products on a global scale.

However, these debates currently rest on anecdotes and impressionistic assessments, not standardized metrics. The metrics that do exist, such as Freedom House’s Freedom on the Net index or the New America Foundation’s Digital Deciders, place one political value, freedom, at the center of their indices, implicitly assuming that a less fragmented Internet is a more “free” Internet. Current moves by Russia and China call into question the assumption that a global Internet architecture will necessarily impose Western conceptions of freedom. A global Internet could just as easily become globally censored.

Measuring the Internet — beyond “free” and “closed”

We aim to bring value-free measurement to this debate about Internet fragmentation. Rather than building normative judgments into our metrics, we aim to measure the underlying architecture of the Internet. We seek to enable normative discussions about what values to maximize, and descriptive discussions about how best to navigate a (potentially) fragmented world.

To achieve this, we built a dataset to measure Internet fragmentation through proxy measures on four different layers of the Internet “stack.” For this initial step, we collected proxy measures for the year 2019 (though we intend to collect data year after year, as well as historical data). With our dataset, we were able to produce “profiles” of different countries’ Internets.

You can find all of our code, data and analysis on GitHub.

We started our project thinking that we would measure the velocity and magnitude of Internet fragmentation over time. However, during our research, we discovered that it’s more interesting to think about the shape of fragmentation.

While conventional wisdom holds that the Internet is bipolar, split between “free” and “closed” countries, we find that it is multipolar: “free” countries can have meaningfully different profiles from one another, as can “closed” countries. In addition, some “free” countries have surprisingly similar profiles to some “closed” countries.

Measuring Internet fragmentation

“Layer models” describe the Internet as a hierarchy of layers, each defined by its function. Each layer sits “on top” of the layer beneath it, with the more abstract layers at the top of the stack.

The TCP/IP layer model of the Internet, with the proxy we chose for each layer.

We’ll briefly cover each of these layers and the proxy we use to measure it. We collected one proxy measure of fragmentation for each layer of the stack except the link layer (discussed below), with each proxy spanning all of 2019.

Layer 5: Data laws

The “fifth layer” refers to human activity (e.g., laws and politics).

As a proxy for fragmentation on the human layer, we measured the degree to which countries have laws regulating the flow, storage or production of data. Our hypothesis is that such laws increase the friction in the flow of data between countries, making the Internet more nation-specific.

We found relevant laws in 66 countries. We coded these laws across 42 categories, ranging from data locality laws, to restrictions on cross-border data flows, to regulations on the use of encryption. For each country, we recorded whether it had a law in each category and summed the results, producing a score from 0 to 42.
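The tally described above amounts to counting distinct categories per country. The sketch below assumes the coded laws are available as hypothetical (country, category) pairs; the function name, country names, and category labels are placeholders, not the project's actual 42-category coding scheme.

```python
def law_scores(countries: list[str], laws: list[tuple[str, str]]) -> dict[str, int]:
    """Score each country by the number of distinct law categories it covers.

    `laws` holds one (country, category) pair per coded law. Several laws in
    the same category count once, so with 42 categories the score is 0-42.
    Countries with no coded laws score 0.
    """
    scores = {}
    for country in countries:
        categories = {category for c, category in laws if c == country}
        scores[country] = len(categories)
    return scores


# Hypothetical input: placeholder countries and category labels.
laws = [
    ("Country A", "data_locality"),
    ("Country A", "cross_border_flow"),
    ("Country A", "data_locality"),  # second law in an already-counted category
    ("Country B", "encryption"),
]
print(law_scores(["Country A", "Country B", "Country C"], laws))
# {'Country A': 2, 'Country B': 1, 'Country C': 0}
```

Because the score only records whether a category is present, it measures the breadth of a country's data regulation rather than its strictness.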

Layer 4: Website ranking locality

Application layer: standardizes communication over the transport layer to create specific-purpose functionality. Web browsing (HTTP/HTTPS) is a familiar example of application-layer software.

To measure application-layer fragmentation, we looked at the most popular websites in each country and determined how much these rankings differed from the global ranking. While factors such as language and culture certainly produce meaningful differences here, we assume these factors will remain stable over time; divergences in this metric over time should therefore reveal fragmentation stemming from, for example, censorship.

To compute this metric, we use the Alexa rankings to determine the 50 most popular websites (by traffic) in every country. We then use Levenshtein distance to compute the edit distance between each country’s list and the global list. The result expresses how much each country’s web browsing habits differ from the global rankings.
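Levenshtein distance is usually described for strings, but the same dynamic program works on any sequence. The sketch below treats each domain in a ranked list as a single token, so the distance counts how many insertions, deletions, or substitutions of whole sites turn one list into the other. The short lists are made up for illustration; the real lists have 50 entries from Alexa, and the post does not say whether the distances were normalized.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two ranked lists, with each domain as one token."""
    # prev[j] holds the distance between the first i-1 items of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr[j] = min(
                prev[j] + 1,         # delete item_a
                curr[j - 1] + 1,     # insert item_b
                prev[j - 1] + cost,  # substitute (free if the sites match)
            )
        prev = curr
    return prev[len(b)]


# Hypothetical top-4 lists; "example-local.com" stands in for a local site.
global_top = ["google.com", "youtube.com", "facebook.com", "wikipedia.org"]
country_top = ["google.com", "youtube.com", "example-local.com", "wikipedia.org"]
print(levenshtein(global_top, country_top))                 # 1
print(levenshtein(["a.com", "b.com"], ["b.com", "a.com"]))  # 2
```

Because the comparison is position-sensitive, a country whose top sites match the global list but in a different order still receives a nonzero distance; reordering is penalized along with replacement.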

Layer 3: Network interference events

Transport layer: abstracts the network and link layers to provide the appearance of persistent connections.

To measure transport-layer (layer 3) fragmentation, we use data collected by the Open Observatory of Network Interference (OONI). OONI relies on volunteers who install its probe software, which periodically runs tests to detect interference on the transport layer. Most of this interference is state-led (such as DNS manipulation or traffic filtering), but it may also include private-sector manipulation, such as throttling of streaming traffic.

We calculate this metric as the rate of anomalous network events: the number of anomalous events divided by the total number of observations.
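The rate is a simple per-country ratio. This sketch assumes OONI measurements have already been reduced to hypothetical (country, is_anomalous) pairs; it does not parse OONI's actual data format, and the function name is illustrative.

```python
from collections import Counter


def anomaly_rates(observations: list[tuple[str, bool]]) -> dict[str, float]:
    """Anomalous observations divided by all observations, per country."""
    total: Counter = Counter()
    anomalous: Counter = Counter()
    for country, is_anomalous in observations:
        total[country] += 1
        if is_anomalous:
            anomalous[country] += 1
    # Counter returns 0 for countries with no anomalies recorded.
    return {country: anomalous[country] / total[country] for country in total}


# Hypothetical observations: one anomaly in four tests for Country A.
observations = [
    ("Country A", True),
    ("Country A", False),
    ("Country A", False),
    ("Country A", False),
    ("Country B", False),
]
print(anomaly_rates(observations))  # {'Country A': 0.25, 'Country B': 0.0}
```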

Layer 2: IPv6 adoption

Network layer: transfers data between networks. The Internet Protocol (either IPv4 or IPv6) manages this transfer.

You may be familiar with the notion of an “IP address,” a locator that identifies a machine on the Internet. For decades, the Internet has relied on IPv4 addresses (e.g., 203.0.113.7). However, the proliferation of Internet-connected devices has put pressure on IPv4’s 32-bit address space, which holds only about 4.3 billion addresses, and the Internet is now running low on them. A newer standard, IPv6, is slowly being rolled out, but the rollout has been uneven across national borders.
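The arithmetic behind IPv4 exhaustion is easy to check with Python's standard `ipaddress` module: IPv4 addresses are 32 bits long and IPv6 addresses 128 bits. The two example addresses below come from the ranges reserved for documentation (203.0.113.0/24 and 2001:db8::/32). Not every one of the 2^32 IPv4 addresses is usable on the public Internet, since some ranges are reserved.

```python
import ipaddress

# An IPv4 address is 32 bits and an IPv6 address 128 bits, so the full
# address spaces differ enormously in size.
ipv4_space = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_space = ipaddress.ip_network("::/0").num_addresses
print(ipv4_space)              # 4294967296
print(ipv6_space == 2 ** 128)  # True

# The same helper parses both families; these are documentation addresses.
for text in ["203.0.113.7", "2001:db8::7"]:
    print(f"{text} -> IPv{ipaddress.ip_address(text).version}")
```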

As a proxy for network-layer (layer 2) fragmentation, we use Google’s per-country IPv6 adoption statistics, which Google collects from users of its services. We treat greater IPv6 adoption as lower fragmentation, since countries remaining on IPv4 will experience address exhaustion and possible outages. We decided not to control for wealth, as there is no reliable correlation between IPv6 adoption and either GDP or GDP per capita.

Layer 1: No proxy

Link layer: transfers data within a local network.

You might have noticed from the table above that we did not pick a proxy for the link layer (layer 1). In practice, as far as we are aware, the entire world uses the same link-layer standards, chiefly Ethernet. In the absence of compelling data about fragmentation on this layer, we excluded it from our model.

Visualizing our data

To understand our data, we produce a “radar” (or “spiderweb”) plot. The radar plot describes a given country’s spread across the four metrics we currently measure. By visually comparing these radar plots, we can establish rough “profiles” for countries.
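The post does not say how the four metrics, which sit on very different scales (a 0 to 42 law count, an edit distance, a rate, an adoption percentage), are placed on common radar axes. One plausible approach, assumed here purely for illustration, is to min-max scale each metric across countries and to invert IPv6 adoption so that, on every axis, a point farther from the center means more fragmentation. The axis names, functions, and example values are hypothetical.

```python
import math

# Axis order around the radar plot (an assumption for illustration).
AXES = ["data_laws", "site_distance", "anomaly_rate", "ipv6_adoption"]
# More IPv6 adoption means *less* fragmentation, so this axis is flipped.
INVERTED = {"ipv6_adoption"}


def normalize(profiles: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Min-max scale each metric to [0, 1] across countries (hypothetical scaling)."""
    out: dict[str, dict[str, float]] = {country: {} for country in profiles}
    for axis in AXES:
        values = [p[axis] for p in profiles.values()]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid dividing by zero when all countries tie
        for country, p in profiles.items():
            scaled = (p[axis] - lo) / span
            out[country][axis] = 1.0 - scaled if axis in INVERTED else scaled
    return out


def radar_vertices(scaled: dict[str, float]) -> list[tuple[float, float]]:
    """Place the axes at equal angles; each radius is the scaled value."""
    n = len(AXES)
    return [
        (scaled[axis] * math.cos(2 * math.pi * i / n),
         scaled[axis] * math.sin(2 * math.pi * i / n))
        for i, axis in enumerate(AXES)
    ]


# Hypothetical raw metrics for two countries.
profiles = {
    "Country A": {"data_laws": 10, "site_distance": 40, "anomaly_rate": 0.5, "ipv6_adoption": 50},
    "Country B": {"data_laws": 0, "site_distance": 20, "anomaly_rate": 0.25, "ipv6_adoption": 10},
}
scaled = normalize(profiles)
print(scaled["Country B"])
# {'data_laws': 0.0, 'site_distance': 0.0, 'anomaly_rate': 0.0, 'ipv6_adoption': 1.0}
```

Min-max scaling makes each axis relative to the countries in the dataset, so adding or removing a country can shift every other country's shape; a fixed reference scale would avoid that at the cost of choosing bounds by hand.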

The Internet profiles of Denmark, Sweden and Norway compared to one another.

Here’s an example comparing three Scandinavian countries. This chart shows that Norway has a higher rate of network interference than its neighbors, while Sweden has a higher level of content-layer locality. All three countries have roughly the same level of legal-layer locality.

What we found

In examining our data, we began with one piece of “conventional wisdom.” We list it not to suggest that you or anyone else believes it, but as an experimental baseline: a protractor against which we can compare the shape of our observations.

The conventional wisdom is that the Internet is largely bi-polar, split between “free” and “closed” Internets, and that countries’ profiles will be similar within each group. The “free” Internet is often perceived as US-led, while the “closed” Internet is seen as led by China.

From this conventional wisdom, we would expect Western countries to be largely similar to one another. We would expect China to have a different profile, but one similar to those of other countries with “closed” Internets, such as Saudi Arabia or Bahrain.

Starting from this conventional wisdom, here’s what we actually observed.

1. The Internet is not bi-polar

Bahrain and China both have what some describe as “closed” Internets, but their profiles are quite different from one another. Similarly, the United States and Germany both have “free” Internets, which are also quite different from one another.

Our conventional wisdom would predict a bi-polar Internet, with China on one side and the West on the other. In this model, it would be tempting to place, for example, Bahrain on the China “side” and Germany on the US “side.” Our metrics reveal a much more complex picture: all four countries have unique Internet profiles.

The Internet is multi-polar, with different Internet governance decisions producing diverse types of fragmentation.

2. “Similar” countries diverge in surprising ways

In the popular imagination, and according to some recent reports, China’s model of the Internet has set a precedent that other Belt & Road countries follow.

China compared to other so-called “Belt & Road” countries (Laos, Indonesia, Mongolia, Pakistan, Djibouti, Argentina, Sudan). China’s Internet is substantially different from the rest of this bloc.

Our data challenges that assumption. We selected countries in which China has made significant investments in local infrastructure: Laos, Indonesia, Mongolia, Pakistan, Djibouti, Argentina, and Sudan. China stands out from these other Belt & Road countries: it has more data locality laws, a high degree of website locality (perhaps as a result of censorship), and significantly higher network interference.

However, these Belt & Road countries have profiles similar in shape to China’s; they are simply smaller on three axes of fragmentation. Tracking changes in Belt & Road countries over time could help us verify, or refute, the claim that China’s model sets a precedent for countries receiving significant Chinese infrastructure investment.

3. “Different” countries may share unexpected similarities

Norway’s Internet compared to those of Saudi Arabia, Kuwait, Bahrain and the UAE.

Countries you would expect to be different can be surprisingly similar. If you had asked me what Norway has in common with Saudi Arabia, Kuwait, the UAE, and Bahrain, I’d have said “aside from oil, not much.” In reality, these countries have similar amounts of network interference, a similar degree of content-layer locality, and similar IPv6 penetration. The main difference our profiles capture is that these Gulf states tend not to have laws restricting the flow of data.

What now?

Our profiles allow us to view the shape of Internet fragmentation in a more nuanced way. Through this lens, Internet differences across national borders become more specific and actionable.

Our data could provide industry stakeholders with a roadmap for moving from one market to another. For example, if a technology company is thinking of expanding from the United States into Southeast Asia, it could use our data to identify the countries whose Internets most resemble that of the United States.
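Identifying countries with a similar Internet could be framed as a nearest-neighbor search over country profiles. The sketch below assumes each profile is a vector of the four metrics already scaled to a common range, and ranks candidates by Euclidean distance to a reference country. Euclidean distance is our choice, not one the post specifies, and the function name, labels, and numbers are hypothetical.

```python
import math


def rank_by_similarity(reference: str, profiles: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (country, distance) pairs sorted from most to least similar."""
    ref = profiles[reference]
    distances = [
        (country, math.dist(ref, vector))  # Euclidean distance between profiles
        for country, vector in profiles.items()
        if country != reference
    ]
    return sorted(distances, key=lambda pair: pair[1])


# Hypothetical scaled profiles:
# [data laws, site distance, interference rate, inverted IPv6 adoption]
profiles = {
    "Home market": [0.2, 0.3, 0.1, 0.4],
    "Candidate X": [0.3, 0.3, 0.1, 0.4],
    "Candidate Y": [0.9, 0.8, 0.7, 0.1],
}
for country, distance in rank_by_similarity("Home market", profiles):
    print(f"{country}: {distance:.2f}")
```

A single distance hides which axis drives a difference, so in practice one would likely inspect the per-axis gaps (or the radar plots themselves) for the top candidates.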

Finally, as we collect this data year after year, we will be able to measure the velocity and direction of fragmentation over time. Is the Internet becoming more fragmented overall? Each of our metrics shows one dimension of fragmentation. If we observe countries, in aggregate, “ballooning out” to fill the radar plot, we could begin to describe the degree to which the Internet is becoming more fragmented. (And if it is, how quickly? How decisively?)
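The post does not define how “ballooning out” would be summarized. One candidate, offered here as an assumption, is the area of each country’s radar polygon, with every axis scaled to [0, 1] and the axes equally spaced. Adjacent axes are 2π/n apart, so each neighboring pair of values r_i and r_{i+1} spans a triangle of area r_i · r_{i+1} · sin(2π/n) / 2, and the polygon’s area is the sum of those triangles. The function names and yearly values are hypothetical.

```python
import math


def radar_area(radii: list[float]) -> float:
    """Area of the polygon traced on a radar plot with equally spaced axes."""
    n = len(radii)
    if n < 3:
        raise ValueError("a radar polygon needs at least three axes")
    triangle_factor = math.sin(2 * math.pi / n) / 2
    # Sum the triangles between each axis and the next, wrapping around.
    return triangle_factor * sum(radii[i] * radii[(i + 1) % n] for i in range(n))


def fill_fraction(radii: list[float]) -> float:
    """Share of the full plot (every axis at 1.0) that a profile covers."""
    return radar_area(radii) / radar_area([1.0] * len(radii))


# Hypothetical four-axis profiles for one country in two successive years.
year_1 = [0.5, 0.5, 0.5, 0.5]
year_2 = [0.6, 0.5, 0.7, 0.5]
print(f"{fill_fraction(year_1):.3f} -> {fill_fraction(year_2):.3f}")
```

Because the formula multiplies neighboring axes, the area depends on the order in which the axes are arranged, so that ordering would need to stay fixed across years for comparisons to be meaningful.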

Future work: Clusters of interoperability

Measuring the shape of Internet fragmentation is the first step toward understanding clusters of interoperability. In future work, we hope to create point-to-point comparisons of countries, producing clusters of Internets that are interoperable with one another.

For example, on our content layer, we currently look at how similar each country’s most popular websites are to the global ranking.

  • But how similar is Argentina’s list of most popular websites compared to Chile’s?
  • How similar are the types of network interference in Argentina vs Chile?
  • Both countries may have data laws, but do they have the same or similar data laws?
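For the last of these questions, one simple way to compare two countries’ data laws, assumed here rather than taken from the post, is the Jaccard similarity of their sets of law categories: the categories they share divided by the categories either one has. The category labels below are placeholders, not the project’s coded data for Argentina or Chile.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Shared categories divided by all categories present in either set."""
    if not a and not b:
        return 1.0  # neither country has coded laws, so the sets are identical
    return len(a & b) / len(a | b)


# Hypothetical category sets for two countries.
country_a = {"data_locality", "cross_border_flow", "encryption"}
country_b = {"cross_border_flow", "encryption"}
print(jaccard(country_a, country_b))  # 0.6666666666666666
```

Jaccard similarity only checks whether two countries regulate the same categories; it cannot tell whether the laws within a shared category are similar in substance.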

With this tool, policymakers could create more targeted interventions to, for example, “join” a like-minded cluster or reduce transaction costs with an opposing cluster. Those in industry could use these clusters as a key strategic planning tool, allowing them to move products across Internets that, despite superficial differences, are actually interoperable.

Future work: The physical layer

The Internet is fundamentally a physical infrastructure, built out of heterogeneous edge devices, switches, cables, and radio waves. In the future, we hope to add this lower, physical layer to our model, analyzing (for example) cable-cutting incidents, heterogeneous devices or tools used by switching and other infrastructure operators, and perhaps even fragmentation among devices at the edges of the Internet (such as the penetration of personal computers, cellphones, and tablets). Our hypothesis is that the physical layer supplies a missing piece of our Internet fragmentation story.

Questions? Comments?

Contact Nick Merrill at ffff at berkeley dot edu.
