Using the internet to predict the future of the South China Sea dispute

Visualization of the composite Predata online volatility signal for the South China Sea dispute.

Richard Laurent

For years the world paid little attention to the periodic skirmishes and sovereignty disputes in the South China Sea, rightly reckoning the claimant countries would stay focused more on internal development than pursuing cross-border conflict over a collection of barely inhabited reefs and atolls, whatever their long-term economic and strategic worth. But rising prosperity across East and South-east Asia over the past decade, and the economic re-emergence of China in particular, have brought the dispute over who “owns” the South China Sea vividly to life, and this year tensions in the region look set to reach a new pitch of intensity. New leaders are in place in Taiwan and Vietnam, and their approaches to relations with China are yet to become clear. A ruling in the case brought by the Philippines against China at the Permanent Court of Arbitration in the Hague is likely some time over the next few months, with few anticipating an amenable Chinese reaction should the court rule, as expected, in the Philippines’ favor. And electoral volatility in the US, where the leading Republican contender for the presidency, a slasher beholden to nothing but his own hair, advocates a drastic reduction in the US’s stabilizing presence in East Asia, is feeding a new climate of regional strategic uncertainty.

In recent weeks reports have emerged of assertive moves by China to strengthen its foothold in both the Paracel and the Spratly island chains: On February 17 Taiwan and the US claimed China has deployed surface-to-air missiles on Woody Island in the Paracels, and a few days later the Center for Strategic and International Studies, a Washington-based think tank with a respected program of research on the South China Sea, published satellite imagery indicating China has begun constructing radar facilities throughout the Spratlys. In a piece accompanying the images, the CSIS wrote: “New radar facilities being developed in the Spratlys could significantly change the operational landscape in the South China Sea.” China’s capabilities in the region are now greatly enhanced, as the CSIS’s interactive map shows.

China’s claims — like those of Taiwan, incidentally — to vast swathes of the South China Sea are based on the famous Nine-Dash Line, a delineation unilaterally marked on maps by both the Republic of China and its mainland successor, the People’s Republic of China, throughout the 1930s and 1940s. The international law of the sea, embodied by both historical custom and the tenets of the 1982 UN Convention on the Law of the Sea (UNCLOS), would likely, if applied strictly, vastly curb China’s entitlement to the contested territories. But the inherent conflict between the Nine-Dash Line and UNCLOS has never been properly ventilated before a neutral, internationally recognized tribunal. The Permanent Court of Arbitration’s ruling in the case between the Philippines and China promises a first entryway into clarity on the issue, which may be part of the reason why China is now moving with such urgency to assert de facto control of the islands.

What will China do next? Heightened geopolitical uncertainty throughout the South China Sea means the need to design tools to help answer this question is becoming more and more acute. Conventional intelligence can offer insight into shifts in strategic thinking and operational priorities among the Chinese military elite, as well as early confirmation of military deployments once they are already under way. But it has its limitations: Defense officials in the US and elsewhere were blindsided by the news of China’s latest radar construction in the Spratlys, which suggests smarter and better early warning tools are needed. Predata has taken a different approach — and we have discovered, in the process, that several innocuous-seeming web pages offer historically reliable clues to figure out when China will make its next move in the disputed territories.

Our broad thesis is that the metadata around user activity on carefully selected pages related to the South China Sea dispute on various open websites (Wikipedia, YouTube, Twitter and Chinese-language social media, the comments sections of news sites) can be gathered and used to compute an online volatility signal for the dispute itself; we measure volatility by compounding the amount of traffic to a particular source (i.e. page views) with the volume of the conversation (i.e. the amount of engagement demonstrated by commenters). This volatility signal can function as an index of both real-world geopolitical volatility and popular/crowd sentiment, and can in turn be manipulated, via a number of advanced data science techniques, to offer predictive insight into future actions throughout the South China Sea by the Chinese military. It’s important to distinguish between metadata and the substance of what’s actually said; our signals measure changes in activity and behavior on the internet (gauged through metadata) rather than the substance of a given discussion. We don’t scrape the internet to zero in on tiny, discrete linguistic clues the Chinese military might drop, whether by accident or design, as to the specific timing and location of future activity in the South China Sea; we use patterns of digital activity to gauge volatility and help predict the future, which is something different and more complex. We discover meaning in open source trails of online activity in circumstances where, quite often, the people leaving the trails aren’t even aware of the activity’s signaling and predictive potential.
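The idea of compounding traffic with engagement can be sketched in a few lines of Python. This is an illustration only, not Predata's actual formula: the trailing-baseline normalization, the seven-day window, the multiplication step, and the names `relative_level` and `volatility_signal` are all assumptions made for the example.

```python
from statistics import mean


def relative_level(series, window=7):
    """Ratio of each day's value to the mean of the preceding `window` days.

    Assumption: a trailing mean is a reasonable baseline for "normal" activity.
    """
    levels = []
    for i, value in enumerate(series):
        history = series[max(0, i - window):i]
        baseline = mean(history) if history else value
        # With no usable baseline (first day, or a baseline of zero), treat the day as normal.
        levels.append(value / baseline if baseline > 0 else 1.0)
    return levels


def volatility_signal(views, engagement, window=7):
    """Compound relative traffic with relative engagement, day by day.

    Assumption: "compounding" is modeled here as a simple product.
    """
    if len(views) != len(engagement):
        raise ValueError("views and engagement must cover the same days")
    traffic = relative_level(views, window)
    talk = relative_level(engagement, window)
    return [t * e for t, e in zip(traffic, talk)]


# Hypothetical daily counts: a quiet week followed by one busy day.
views = [100] * 7 + [300]
comments = [10] * 7 + [20]
signal = volatility_signal(views, comments)
# The last day has 3x baseline views and 2x baseline comments, so signal[-1] is 6.0.
```

Note that the sketch uses only counts (metadata), never the text of a comment, which mirrors the distinction drawn above between activity and substance.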

Today we monitor, in real time, activity on almost 100 web pages related to the South China Sea dispute. We also maintain and regularly update a record of reported actions by the Chinese military in the South China Sea; today there are 41 events in that set, which dates back to 2011. (There is a separate, larger event set capturing actions by all nation-parties to the dispute; for the purposes of this piece we will restrict the discussion to the China-only set.) By carefully observing patterns of online activity and volatility against our South China Sea event set, we can better understand which web pages typically “spike,” via a significant increase in either views or engagement, both before and after major Chinese military moves in the region.
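One simple way to compare a page's activity against an event set is to flag days that stand well above the page's recent baseline and then check whether each recorded event was preceded by such a day. The sketch below assumes a trailing mean-plus-two-standard-deviations rule and a seven-day lookback; the thresholds, the function names, and the sample data are hypothetical and are not drawn from Predata's platform.

```python
from statistics import mean, pstdev


def find_spikes(signal, window=14, threshold=2.0):
    """Return day indices whose value exceeds trailing mean + threshold * stdev."""
    spikes = []
    for i in range(window, len(signal)):
        history = signal[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # Skip perfectly flat windows, where a standard-deviation rule is undefined.
        if sigma > 0 and signal[i] > mu + threshold * sigma:
            spikes.append(i)
    return spikes


def events_preceded_by_spike(spike_days, event_days, lookback=7):
    """Map each event day to True if a spike fell in the `lookback` days before it."""
    return {
        event: any(0 < event - spike <= lookback for spike in spike_days)
        for event in event_days
    }


# Hypothetical volatility signal for one page: steady activity with one jump on day 14.
page_signal = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 1.0, 0.9, 1.2,
               1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.1, 0.9, 1.0, 1.1]
# Hypothetical event set, expressed as day indices on the same calendar.
event_days = [18, 40]

spike_days = find_spikes(page_signal)
hits = events_preceded_by_spike(spike_days, event_days)
hit_rate = sum(hits.values()) / len(hits)
```

In this toy data the day-18 event follows the day-14 spike and counts as a hit, while the day-40 event does not, so the page's hit rate is one in two.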

Digging further, we have discovered that two sources offer especially helpful and accurate prediction levels for historical actions by China in the South China Sea: the YouTube video of a May 2015 panel discussion on Phoenix TV, a Chinese broadcaster, entitled “Will there be war in the South China Sea?”; and the Chinese-language Wikipedia page on the Nine-Dash Line. The language in the comments and edits driving the “spikes” in digital activity on these pages is not, generally speaking, trivial or tangential to the South China Sea dispute: It is purposeful, at times violent, and bristling with nationalistic and militaristic intent.

Historically, the prediction levels of each of these sources (generated by regressing our South China Sea event set against the time series of the Predata volatility signal for the page) have been elevated, often above 50%, in the week before recorded actions in the region by China. The YouTube video averages around 100 views a day, the Wikipedia page around 50. The success rate for predictions generated off these pages is not, of course, 100%; there are notable misses as well as hits. But historically, a spike in activity on these pages has preceded activity by the Chinese military in the South China Sea with an unusual degree of success. These two web pages are, it’s important to note, just two among many that Predata monitors; other pages relevant to the South China Sea exhibit similar predictive characteristics. Indeed, given the fluid nature of the conversation online and events in real life, the predictive dynamics of these pages are always in flux.
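The piece does not specify which regression is used, so the following is only a minimal sketch of how a "prediction level" could be produced: a one-variable logistic regression, fit by gradient descent, that estimates the probability of an event in the coming week from a page's volatility signal. The seven-day horizon, the learning rate, the function names, and the sample data are all assumptions.

```python
import math


def label_days(n_days, event_days, horizon=7):
    """Label day t with 1 if an event occurs in days t+1 .. t+horizon, else 0."""
    events = set(event_days)
    return [
        int(any((t + k) in events for k in range(1, horizon + 1)))
        for t in range(n_days)
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, lr=0.1, epochs=2000):
    """Fit p(event | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(w * xi + b) - yi
            grad_w += err * xi
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def prediction_level(signal_value, w, b):
    """Estimated probability of an event within the horizon, given today's signal."""
    return sigmoid(w * signal_value + b)


# Hypothetical 30-day signal with jumps on days 5 and 20, and events on days 8 and 23.
page_signal = [1.0] * 30
page_signal[5] = 4.0
page_signal[20] = 4.0
labels = label_days(len(page_signal), event_days=[8, 23])

w, b = fit_logistic(page_signal, labels)
quiet_level = prediction_level(1.0, w, b)
spike_level = prediction_level(4.0, w, b)
```

On this toy data the fitted level on a spike day comes out higher than on a quiet day, which is the qualitative behavior described above. With an event set of only 41 entries, any such fit should be read as a rough indicator rather than a calibrated probability.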

Why would it make sense for activity on selected web pages to precede action by the Chinese military? Our thesis is that collective activity online — the sum of what internet users are looking at and how they are engaging with it — can function as an index for both the wisdom of crowds and the preferences of a policy elite. Spikes in digital activity on South China Sea-related websites can reflect, we surmise, two different phenomena: an upsurge in the popular view of the likelihood of conflict in the South China Sea; and deliberate massaging, either by individual internet users acting of their own accord or as a matter of national policy, of the online discussion around the various parties’ entitlements to the disputed territories.

The Chinese government, we know, has a well-documented record of information control. This is primarily a matter of domestic policing, with restrictions on freedom of speech and unfettered discussion online within the sovereign territory of the People’s Republic. But increasingly China is eager to patrol the way its own foreign policy actions and preferences are framed and discussed, both inside and beyond the country’s borders. This matters to the policy elite in Beijing because it is important, within the context of China’s stated desire to rise “peacefully” and assume its place among the family of nations as a responsible citizen, for there to be a forceful argument made on the global internet that the country’s foreign policy actions are correct and morally unimpeachable.

China is a famously proud and spiky diplomatic actor, and its leaders are always quick to claim the moral high ground in any dispute. Foreign Minister Wang Yi’s comments to Secretary of State John Kerry last week, in which he implored the US to ignore Chinese radar construction throughout the South China Sea and focus instead on ship patrols and deployments by other countries, are emblematic. Wang’s statement on the Chinese foreign ministry website is a minor classic of the genre, mixing airy invocations of “ancient” history with chidingly patronizing attempts to shift the blame for tensions in the South China Sea onto other countries. This rhetorical strategy, it’s fair to assume, is not only deployed in the face-to-face realm of global diplomacy; it extends into the deepest corners of the internet.

Information control matters to Chinese policymakers most of all within the context of how Chinese citizens behind the Great Firewall perceive their country’s actions abroad: Beijing manipulates and stokes domestic nationalism to its own ends, but the government is terrified of the forces nationalism might unleash should it get out of hand, a plausible outcome in the event Chinese ambitions in the South China Sea go awry. To channel nationalism in the benign but politically useful direction desired, it is imperative for the Chinese government to control and massage the way Chinese military and foreign policy actions are discussed and presented on the Chinese-language internet. In other words, Chinese control of the South China Sea begins with Chinese control of the online discussion about the dispute in the South China Sea.

Activity on the two YouTube and Wikipedia pages mentioned above was not elevated ahead of the events of the past two weeks on Woody Island and throughout the Spratlys. But it’s important to distinguish between two different types of relevant event: 1) declared actions by China for which the government actively takes responsibility, which are externally observable and can be tied to a specific date; and 2) non-Chinese open source reports of construction activity by China, beginning at some indeterminate point in the past, which Beijing neither confirms nor denies. (Our data set, while large, is all open source; classified sources, with access to satellite photos, undoubtedly offer a more precise picture of when the Chinese begin and end actions like island reclamation or equipment installation.) Most of the actions in the historical South China Sea event set fall into the former category, while events over the past few weeks are better understood as part of the latter: The Chinese government has questioned the veracity of reports of the installation of surface-to-air missiles on Woody Island, for instance, falling back on the trusted old line that the story is a conspiracy cooked up by the western media. While few neutral experts corroborate Beijing’s skepticism, there’s naturally little dispute the actions in question began well before they were discovered and publicly reported this month.

The US has made it clear it will not acquiesce to China’s tightening grip over the South China Sea, despite the uncertainty wreaked by an operatic presidential campaign: Last week Admiral Harry Harris, head of the US Pacific Command, said the US would ignore any moves by China to declare an Air Defense Identification Zone over the South China Sea similar to the zone it declared over the East China Sea in late 2013. All signs indicate this decades-old dispute, once peripheral but now central to global security, is heading toward a reckoning — and fast.

The predictive power of different online sources shifts in response to new events. The historical success of the YouTube and Wikipedia pages discussed here is not guaranteed to replicate itself as the dispute unfolds, of course: What’s predictive today is not always what’s predictive tomorrow, however paradoxical that may sound. But the Predata platform can alert us to spikes or aberrations in volatility on countless web pages relevant to the South China Sea dispute at once. It can tell us when activity signals that have spiked before and after previous actions by the Chinese military are spiking again. And it can automatically recalculate these “best fit” signals in real time as new events occur and patterns of online user behavior shift — a powerful tool to have in the early warning arsenal as this most complex geopolitical dispute sails into uncharted, ever-choppier waters.

twitter: @predataofficial | www.predata.com