How SpatialOS simulations will help us uncover hidden vulnerabilities in the Internet
This January Telstra, Australia’s largest telecoms provider, suffered a countrywide outage. The outage affected 16.7 million subscribers on its 3G and 4G networks and prevented phone calls; other users reported complete loss of phone and data services for over three hours. The outage not only cost the company millions in compensation (just last week, the CEO announced a day of free data), it also placed businesses that rely on mobile phone communications at risk of losing hours of trade.
These kinds of incidents aren’t rare; errors happen often, frequently at scale. Sometimes they happen by mistake. In 2008, for example, an ISP (Internet Service Provider) in Pakistan famously cut off access to YouTube for a large portion of the world by accidental misconfiguration of a central server.
Accidents happen, but more concerning are deliberate attempts to hijack traffic, preventing access to communication channels and content. At the height of the Arab Spring, the Egyptian government closed 88% of the Egyptian internet, blocking more than 3,500 routes. This ensured that no website could be accessed and that its citizens could not communicate (through email or social media, among others). Meanwhile, the largest Distributed Denial of Service (DDoS) attack occurred against the non-profit anti-spam organisation Spamhaus, affecting millions of ordinary Internet users.
What does this tell us? That these kinds of events are happening more often, and getting more severe. Businesses, governments, and ordinary people are increasingly reliant on services they get over the Internet. Imagine the implications of these kinds of errors and attacks on the Internet’s infrastructure when our cities, cars and homes get connected to the Internet too. Truly understanding what the World Economic Forum once termed “the dark side of interconnectivity” is critical, but this is well beyond the reach of current analysis tools.
Simulating the entire backbone of the Internet on SpatialOS
A few weeks ago, a team of two came in from the British government to explore our technology. Their goal was to build a realistic simulation of the internet so that they could look at it in a new way, one that would actually show the internet’s “structure”: the vast number of connections between the computers and networks that together form the global network. With the internet under attack from a variety of sources, it’s critical they understand how to make it more resilient.
It was quite an ambitious project, given that they had never used our technology, SpatialOS, before, and given the huge scale and complexity of the system they wanted to simulate.
In their own words, “Having never developed an application on SpatialOS before, this was a tall order for a three day sprint. However, combining the flexibility of the platform with the experience and enthusiasm of Improbable’s engineers, resulted in a simulation that surpassed all initial expectations.”
The result was a 1:1 simulation of the backbone of the internet. We believe it is the largest simulation of its kind ever to have been created.
In precise terms, this is a fully dynamic and interactive simulation of all the Autonomous Systems (AS) on the internet, using a communication system closely modelled on the Border Gateway Protocol (BGP). BGP essentially determines how data can flow from one server to another. An AS usually belongs to an ISP such as BT or Comcast, but may also be owned by sites like Netflix or YouTube.
For example, when you try to load youtube.com, a request is sent to your ISP. Your ISP may not have a direct link to YouTube’s data centre, but thanks to its routing table, it will know how to get one step closer. The request is then passed along to another ISP and might go through several more locations before ending up at YouTube.
But what if one of those servers along the way is malfunctioning? Or being rerouted by someone? Perhaps it actually doesn’t know how to reach youtube.com? In that case, nobody using your ISP will be able to reach YouTube.
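The hop-by-hop lookup described above can be sketched in a few lines of Python. This is only an illustration, not code from the simulation: the AS names (`your-isp`, `transit-a`, `transit-b`) and the table contents are hypothetical, and real BGP routing tables are keyed by IP prefix rather than by site name.

```python
from typing import Dict, List, Optional

# Hypothetical routing tables: for each AS, destination -> next-hop AS.
ROUTING_TABLES: Dict[str, Dict[str, str]] = {
    "your-isp": {"youtube": "transit-a"},
    "transit-a": {"youtube": "transit-b"},
    "transit-b": {"youtube": "youtube"},
    "youtube": {},
}


def trace_route(
    tables: Dict[str, Dict[str, str]],
    source: str,
    destination: str,
    max_hops: int = 32,
) -> Optional[List[str]]:
    """Follow next-hop entries from source to destination.

    Returns the list of ASes visited, or None if some AS on the way
    has no route to the destination (or the hop limit is exceeded).
    """
    path = [source]
    current = source
    while current != destination:
        next_hop = tables.get(current, {}).get(destination)
        if next_hop is None or len(path) > max_hops:
            return None
        path.append(next_hop)
        current = next_hop
    return path


# With every table intact, the request gets one step closer at each hop.
healthy = trace_route(ROUTING_TABLES, "your-isp", "youtube")

# Simulate an AS along the way forgetting how to reach the destination.
broken_tables = {name: dict(table) for name, table in ROUTING_TABLES.items()}
del broken_tables["transit-b"]["youtube"]
broken = trace_route(broken_tables, "your-isp", "youtube")
```

In the second case the request dies at `transit-b`, so every customer of `your-isp` loses access even though their own ISP is working correctly.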
Because you can add and delete links in the network, you can ask "what if" questions to determine the impact of interventions. SpatialOS enables you to build large scale simulations which can be used to investigate these kinds of failures and find the routes most likely to be affected. What happens if a rogue AS starts advertising routes to everywhere but then dumps all traffic? Perhaps there are a few key routes we could protect that would mitigate the effects of such failures.
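A "what if" question about a deleted link can be sketched as a before-and-after reachability check. The topology below (`AS1` to `AS5`) is hypothetical, and the check only looks at raw connectivity; it ignores BGP policy and is not SpatialOS code.

```python
from collections import deque
from typing import Dict, FrozenSet, Set

Link = FrozenSet[str]


def link(a: str, b: str) -> Link:
    """An undirected link between two ASes."""
    return frozenset((a, b))


def reachable_from(links: Set[Link], start: str) -> Set[str]:
    """Breadth-first search over the links, returning every AS connected to start."""
    neighbours: Dict[str, Set[str]] = {}
    for current_link in links:
        a, b = tuple(current_link)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in neighbours.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def cut_off_by_removing(links: Set[Link], removed: Link, destination: str) -> Set[str]:
    """ASes that can reach destination before the link is removed but not after."""
    before = reachable_from(links, destination)
    after = reachable_from(links - {removed}, destination)
    return before - after


# Hypothetical topology: AS2, AS3 and AS4 form a triangle; AS5 hangs off AS4.
LINKS: Set[Link] = {
    link("AS1", "AS2"),
    link("AS2", "AS3"),
    link("AS3", "AS4"),
    link("AS2", "AS4"),
    link("AS4", "AS5"),
}

# How many ASes lose access to AS5 when each link is deleted in turn?
impact = {
    tuple(sorted(candidate)): len(cut_off_by_removing(LINKS, candidate, "AS5"))
    for candidate in LINKS
}
```

Links inside the triangle have zero impact because traffic can go the other way round, while the single link into `AS5` cuts off everyone. Ranking links this way is one simple way to look for the "few key routes" worth protecting.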
With SpatialOS, you could also use such simulations to monitor routes in the real world and detect when they deviate from what the simulation predicts, to spot errors as they happen or to detect more malicious behaviour. Considering how many people rely on the internet, the implications are enormous.
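Comparing predicted and observed routes could look roughly like the sketch below. Everything in it is assumed for illustration: the function name `find_deviations`, the data layout (prefix mapped to an AS path), and the sample values, which use address blocks and AS numbers reserved for documentation.

```python
from typing import Dict, Optional, Tuple

AsPath = Tuple[int, ...]


def find_deviations(
    predicted: Dict[str, AsPath],
    observed: Dict[str, AsPath],
) -> Dict[str, Tuple[Optional[AsPath], Optional[AsPath]]]:
    """Return every prefix whose observed AS path differs from the prediction.

    Each value is a (predicted, observed) pair; None means the prefix is
    missing on that side, e.g. a route that has been withdrawn.
    """
    deviations = {}
    for prefix in sorted(set(predicted) | set(observed)):
        expected = predicted.get(prefix)
        actual = observed.get(prefix)
        if expected != actual:
            deviations[prefix] = (expected, actual)
    return deviations


# Hypothetical simulation output.
predicted_paths = {
    "192.0.2.0/24": (64496, 64497, 64500),
    "198.51.100.0/24": (64496, 64498, 64501),
    "203.0.113.0/24": (64496, 64497, 64502),
}

# Hypothetical real-world observations: one path has changed and one prefix
# has disappeared altogether.
observed_paths = {
    "192.0.2.0/24": (64496, 64497, 64500),
    "198.51.100.0/24": (64496, 64499, 64501),
}

deviations = find_deviations(predicted_paths, observed_paths)
```

A changed path is not necessarily an attack, so in practice a report like this would be a starting point for investigation rather than an alarm in itself.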
Scale: simulating thousands of connections on SpatialOS
The whole internet contains about 60,000 AS networks and over half a million routes. Since every AS has to store path information about routes to nearly every other AS, the routing tables can become unmanageably large. This has caused problems as each AS’s global routing table grows.
In our simulation, every AS is running simultaneously and independently, so storing the full routing table for each AS would require many terabytes of RAM by itself. Doing this sort of simulation on a single server would be almost impossible: we don’t have the full list of networks and routes, but what we do have is enough to require 15 machines and over 1TB of RAM. Building a distributed system of this magnitude without SpatialOS would have been an enormous undertaking, and the result wouldn’t have been as flexible or easy to maintain. Building it during a three-day sprint would have been unthinkable.
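A back-of-the-envelope calculation shows where a figure in the terabytes comes from. The AS and route counts are the rounded numbers quoted above; the size of a single routing table entry is purely an assumption made for this sketch, and the real figure depends on how paths are stored.

```python
# Rounded figures quoted in the text.
AS_COUNT = 60_000
ROUTE_COUNT = 500_000

# Assumption for illustration only: 100 bytes per routing table entry
# (prefix, next hop and a short AS path).
BYTES_PER_ENTRY = 100

# Every AS keeps its own copy of (nearly) the full table.
total_entries = AS_COUNT * ROUTE_COUNT
total_bytes = total_entries * BYTES_PER_ENTRY
total_terabytes = total_bytes / 1e12

print(f"{total_entries:,} entries, roughly {total_terabytes:.1f} TB")
```

Under that assumption the full tables come to about 3 TB, and a larger entry size scales the total linearly.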
In SpatialOS terms, every AS and route is represented as an entity. They each act independently with no shared data; information is shared by message passing.
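The idea of independent entities that share nothing and communicate only by messages can be illustrated with a toy path-vector exchange in plain Python. This does not use the SpatialOS SDK, and it is far simpler than BGP: the class `AsEntity`, the four-AS topology, and the "shortest AS path wins" rule are all assumptions made for the sketch.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple

AsPath = Tuple[str, ...]
Message = Tuple[str, AsPath]  # (recipient, AS path with the sender first)


@dataclass
class AsEntity:
    """One AS: it only knows its neighbours and its own routing table."""

    name: str
    neighbours: List[str]
    routes: Dict[str, AsPath] = field(default_factory=dict)

    def originate(self) -> List[Message]:
        """Announce this AS to every neighbour."""
        self.routes[self.name] = (self.name,)
        return [(neighbour, (self.name,)) for neighbour in self.neighbours]

    def receive(self, path: AsPath) -> List[Message]:
        """Handle an announcement; return any announcements to pass on."""
        if self.name in path:
            return []  # loop prevention: our own name is already in the path
        destination = path[-1]
        candidate = (self.name,) + path
        best = self.routes.get(destination)
        if best is not None and len(best) <= len(candidate):
            return []  # the known route is at least as short
        self.routes[destination] = candidate
        sender = path[0]
        return [(n, candidate) for n in self.neighbours if n != sender]


def run(entities: Dict[str, AsEntity]) -> int:
    """Deliver messages until none are left; return how many were delivered."""
    queue: Deque[Message] = deque()
    for entity in entities.values():
        queue.extend(entity.originate())
    delivered = 0
    while queue:
        recipient, path = queue.popleft()
        queue.extend(entities[recipient].receive(path))
        delivered += 1
    return delivered


# Hypothetical topology: A - B - C - D in a line, plus a shortcut B - D.
entities = {
    "A": AsEntity("A", ["B"]),
    "B": AsEntity("B", ["A", "C", "D"]),
    "C": AsEntity("C", ["B", "D"]),
    "D": AsEntity("D", ["C", "B"]),
}
run(entities)
```

After the exchange settles, `entities["A"].routes["D"]` holds the path through the shortcut. No entity ever reads another entity's table directly, which is the property that lets this kind of workload be spread across many machines.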
There are two kinds of workers in the simulation. BGP and the network flows themselves run in the logic workers, as behaviours associated with the AS and route entities. SpatialOS runs as many logic workers as necessary to handle the workload of the simulation.
The visualisation and interaction layer is built using the Unity game engine and the SpatialOS SDK. This is an unmanaged worker that runs on an end-user machine, and serves the dual purpose of visualising the state of the simulation and letting the user influence it. Using a game engine lets us iterate quickly to build very attractive visualisations.
The full source code of this simulation is available here. You need developer access to SpatialOS to build and run it; you can apply here.
Building upon this model
But the possibilities don’t end there. This model is available for developers to integrate with their existing models today. In fact, in the past week we have begun to integrate this model with an internal project to model cities. It was as simple as importing the code and setting up the locations correctly. Being able to combine simulations in this way will enable companies and organisations to understand how the internet relates to other complex systems: cities, infrastructure, energy, economies.
With a detailed simulation of the internet we could begin to prepare for cyber attacks before they happen, and understand the cascading effects of various interventions better. This will enable businesses, institutions and even countries to become more resilient in an age of exponential vulnerability online. But more than this, it is also possible to model any kind of network on this scale, be they energy grids or telecommunications. You could, for example, model a national power infrastructure network to comprehend not just its potential vulnerabilities but the impact it could have on a new transport network or the economy.
If you would like early access to SpatialOS to start creating your own simulations, or models of multiple connected networks, you can sign up here.