Back to the Edge: AI Will Force Distributed Intelligence Everywhere
In part one we explored how artificial intelligence was ramping up the demand for computing cycles.
In this installment we’ll explore how the demands of AI will drive two shifts: the resurgence of processing on the edge, and the arrival of new processing architectures.
Produced in partnership with NewCo Shift.
Cloud will flourish, edge will bloom
In late 1996, George Favaloro and Sean O’Sullivan, two executives from Compaq, realized that the presence of ubiquitous Internet connections to computers would change where information processing could take place. Rather than occurring in office server rooms or on the desktop, computing processes could start to take place on servers accessed over the Internet. This shift in the locus of computation they called the ‘cloud’. The term didn’t stick (back then, anyway), and Compaq was swallowed by HP in 2002.
But of course, the theme they identified took root. In 2006 Google’s then-chief executive, Eric Schmidt, said: “I don’t think people have really understood how big this opportunity really is. It starts with the premise that the data services and architecture should be on servers. We call it cloud computing — they should be in a “cloud” somewhere.”
Since then cloud computing has become one of the fastest growing areas of enterprise IT. Amazon’s AWS business unit, formed just eleven years ago, has surpassed $12 billion in revenues. Cloud computing unleashed an entirely new business model for software: the software-as-a-service model which has created such giants as Salesforce, Workday and Zendesk.
Today much enterprise compute happens in distant data centers of private or virtual private clouds. And enterprise compute is, of course, what matters right now. We have smart devices, like our smartphones, tablets and powerful laptops, on the edge of the network. Yet the bulk of smart stuff that we experience as consumers (ranking content in Facebook, recommending a product on Amazon, responding to our spoken commands in Siri) happens in the cloud on infrastructure owned by Internet behemoths.
This shift to the cloud isn’t the first time the locus of computation has shifted. The move from mainframe and mini-computing to the desktop personal computer took processor cycles from closely guarded computer rooms onto the individual desks of millions of users. It was an edge-ification of compute, if you like. This shift began in the mid-1970s and accelerated through the 80s as the DOS and IBM PC standard took hold.
The arrival of the Internet enabled us to move computing back to the cloud, away from the edges of the network and the user’s desktop. But the impending tsunami of artificial intelligence will see computing power grow and, with it, shift from the cloud back to the edge of the network.
Edgy Progress. Why?
Ultimately, you need to deliver intelligence where it is needed. As everything becomes intelligent, from light bulbs, to cameras, to simple sensors, these devices will need their own local ability to decide what to do and ultimately learn from their local environment.
To grossly simplify, an intelligent device needs to do two things. The first is to learn (or be trained) about its environment. The second is to figure out (or infer) what to do at any given stage.
Today’s AI systems rarely do either of these things locally on the device in question. In most cases, an AI system (like Amazon Echo) sends data back to the cloud servers to process and ‘think.’ Your Echo responds like a parrot, repeating the insight calculated for it by some server in a data center across the globe.
In addition, these devices do not do any learning from their environment or experience. The learning happens back at the cloud. Currently, they get smarter via occasional updates from some cloud-based artificial intelligence.
The reasons why intelligence needs to be localized are manifold. One reason is latency. If a decision is required quickly, the network delays of sending data back to the cloud, most likely over a mobile network, and waiting for a response, might be deleterious to the whole process.
The latency for a device in Europe on a 4G network connecting to a data centre might be 50ms. And if significant amounts of data need to be sent (such as a clip from a video stream), the total transmission time might be measured in seconds. Even if the machine vision process takes only milliseconds, the whole end-to-end transaction could run to more than a second or two.
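To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. Only the 50ms network latency comes from the figures above; the clip size, uplink speed and both inference times are assumptions I have picked purely for illustration, and the function name is my own.

```python
def end_to_end_seconds(payload_bytes, uplink_bits_per_second,
                       network_latency_seconds, inference_seconds):
    """Total time from capturing data to having an answer."""
    # Time to push the payload over the link (8 bits per byte).
    transfer_seconds = payload_bytes * 8 / uplink_bits_per_second
    return network_latency_seconds + transfer_seconds + inference_seconds


# Cloud inference. Assumed: a 2 MB video clip, a 10 Mbit/s uplink,
# and 5 ms of machine vision on a fast server. 50 ms latency as in the text.
cloud_seconds = end_to_end_seconds(
    payload_bytes=2_000_000,
    uplink_bits_per_second=10_000_000,
    network_latency_seconds=0.050,
    inference_seconds=0.005,
)

# Edge inference. Assumed: nothing is sent over the network, but the
# slower on-device chip needs 100 ms for the same vision task.
edge_seconds = end_to_end_seconds(
    payload_bytes=0,
    uplink_bits_per_second=10_000_000,
    network_latency_seconds=0.0,
    inference_seconds=0.100,
)

print(f"cloud: {cloud_seconds:.3f}s, edge: {edge_seconds:.3f}s")
```

Under these assumptions the transfer time, not the inference itself, dominates the cloud round trip, which is why even a much slower chip on the device can come out ahead.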
For many applications that delay will be unacceptable. No transportation modality, like self-driving cars or drones, can yet rely on cloud inference. Cars move too fast and pedestrians are too squishy. Robo-surgeons can’t be that forgiving either. A patient’s metabolic murmurations won’t wait for an IP packet to traverse the Atlantic to Amazon Web Services’ servers. And consumers will get increasingly frustrated if their smart doorbells, intelligent nail clippers, or mindful microwaves suffer a 1–2 second delay before responding to simple requests.
A second reason is that devices will soon need to be powerful enough to learn at the edge of the network. Why? Because the devices will be used in situ and those locales will be increasingly contextualized. The environment where the device is placed will be a key input to its operation. I know this sounds odd today when we think of comparatively dumb devices, but it’s in the essence of autonomy that a device needs to learn and adapt to its local surroundings. We expect our autonomous vehicles to deal with the roads they are on in Paris, Portland or Pune, right now, as they are being driven and not on the basis of some Platonic ideal learnt on the wide boulevards of Palo Alto. And what is true for autonomous vehicles will be true for autonomous daycare-bots, autonomous vacuums and autonomous book-keepers.
The bulk of today’s AI systems don’t have this level of intelligence. Even if they can infer behavior on the device, they seldom learn about their surroundings on the device. Most learning happens back in the cloud.
Yes, inference (that is, getting the AI to predict what to do next given the current state of the environment) is an obvious candidate to push to the edge. But it will also become essential to push model training (the ‘learning’) to the edge.
Trainer on the edge
The training stage for artificial intelligence has traditionally required substantially more processing power than the inference or prediction stage. Any parent knows this. It is much harder to teach a child how to open a door (the ‘training stage’) than it is for them to open a door once they know how to (the ‘inference stage’).
For instance, in 2011 Google Brain was trained to recognize cats and people by watching YouTube videos; it ran on 2,000 CPUs at one of Google’s data centers. Many developers train models back in the cloud, where there can be rack upon rack of GPUs, then push less computationally expensive inference models down to cheaper devices on the edge.
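The gap between training and inference shows up even in a toy model. The sketch below is my own illustration, not anything Google Brain ran: it fits a straight line by gradient descent and counts how many times the model has to be evaluated during training, compared with the single evaluation needed to use it afterwards. The data, learning rate and number of epochs are all assumed values.

```python
def predict(w, b, x):
    """Inference: one multiply and one add."""
    return w * x + b


def train(data, epochs=500, learning_rate=0.05):
    """Training: repeated passes over the data, nudging w and b each time."""
    w, b = 0.0, 0.0
    evaluations = 0
    for _ in range(epochs):
        for x, y in data:
            error = predict(w, b, x) - y
            evaluations += 1
            # Gradient step for squared error on this single example.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b, evaluations


# Assumed toy data drawn from the line y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b, training_evaluations = train(data)   # the expensive part, done once
answer = predict(w, b, 4.0)                # the cheap part, done on demand

print(f"training used {training_evaluations} evaluations; inference uses 1")
```

Once `w` and `b` are known, they are all a cheap edge device needs to carry. That is the ‘train in the cloud, infer on the edge’ split in miniature.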
According to Google, its Federated Learning approach allows for smarter models, lower latency, and less power consumption, all while ensuring privacy. And this approach has another immediate benefit: in addition to providing an update to the shared model, the improved model on your phone can also be used immediately, powering experiences personalized by the way you use your phone.
An example of where Google is applying these smarts is with typing prediction on its Gboard Android keyboard app. All users will experience improved prediction based on the behavior of all other users.
What we are likely to see is a multi-faceted infrastructure where learning happens on the edge of the network and in the cloud. Such federated network learning would be efficient. It would allow the network to learn from the experience of thousands (or millions or even billions) of edge devices and their experiences of the environment. But rather than laboriously and pedantically sending their raw experiential data back to the cloud for analysis, in a federated environment, the edge devices could do some learning and efficiently send back deltas (or weights) to the cloud, where a central model could be more efficiently updated.
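A minimal sketch of that idea follows. It assumes a one-parameter model and a plain average of the deltas, and it is not a description of Google’s production system. The function names, the toy data and the learning rate are all my own assumptions. The point to notice is that each device’s raw data never leaves `local_update`; only the change in the weight does.

```python
def local_update(global_w, local_data, learning_rate=0.1, passes=20):
    """Runs on the edge device: learn from private data, return only a delta."""
    w = global_w
    for _ in range(passes):
        for x, y in local_data:
            # Gradient step for the model y = w * x on one private example.
            w -= learning_rate * (w * x - y) * x
    return w - global_w


def federated_round(global_w, devices):
    """Runs in the cloud: average the deltas and update the central model."""
    deltas = [local_update(global_w, data) for data in devices]
    return global_w + sum(deltas) / len(deltas)


# Assumed private data on three devices, each roughly following y = 3x.
devices = [
    [(1.0, 3.1), (2.0, 5.9)],
    [(1.0, 2.9), (0.5, 1.6)],
    [(2.0, 6.2), (1.5, 4.4)],
]

global_w = 0.0
for _ in range(5):
    global_w = federated_round(global_w, devices)

print(f"central model after 5 rounds: w = {global_w:.2f}")
```

Each round costs one small number per device on the network instead of the full data set, which is where the efficiency comes from.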
Such mechanisms could be combined with lossy approaches like differential privacy, which ensures that the aggregate data in a database reveals no information about particular individuals or their habits. This still allows for capturing significant patterns in the data, while guarding individual privacy.
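One common way to get this property is the Laplace mechanism, which adds calibrated random noise to an aggregate before releasing it. The sketch below is an illustration under my own assumptions (a simple count, a privacy budget `epsilon` of 0.5, made-up data), not a claim about what any particular vendor ships.

```python
import random


def laplace_noise(scale, rng):
    """The difference of two exponential draws is Laplace-distributed."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(flags, epsilon, rng):
    """Release a count with noise scaled to how much one person can change it."""
    sensitivity = 1.0  # adding or removing one person moves a count by at most 1
    return sum(flags) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)

# Assumed data: 600 of 1,000 users exhibit some habit.
flags = [True] * 600 + [False] * 400

noisy = private_count(flags, epsilon=0.5, rng=rng)
print(f"released count: {noisy:.1f} (true count is {sum(flags)})")
```

The released figure is still close enough to 600 to capture the pattern, but the noise means nobody can tell from it whether any single user was in the data. A smaller `epsilon` means more noise and stronger privacy.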
The shifting locus of intelligence from cloud to cloud-and-edge will have other ramifications as well, including the types of chips which must be manufactured.
New species of silicon and beyond
Intel’s generalized microprocessor, the central processing unit (or “CPU”), has held sway over the technology industry since the Intel 4004 was introduced in 1971. Intel, the general CPU, and Moore’s Law have been a power trifecta for decades. The generalized CPU has served us very well. It allowed the emergence of a standard operating system (first DOS, then Windows) which created a common platform, which in turn lowered the costs for developers and users of IT. (It also created the Wintel monopoly, which was shattered not by anti-trust but by the arrival of the Internet, in the first instance, and the mobile phone in the second.)
However, for the differentiated demands of machine learning, it turns out that the CPU is too general. Worse, the limits of manufacturing and quantum physics have brought Moore’s Law improvements to a standstill. The august scientific journal Nature led with this on its cover in 2016. Much of the industry is in agreement.
In fact, the current boom in AI investment was catalyzed by a switch away from the general purpose CPU that had given us three decades of the Wintel flywheel. It was triggered back in 2012 by a neural network running on a pair of graphics processing units (GPUs, of which more below) rather than a set of CPUs.
After researcher Alex Krizhevsky and colleagues won the 2012 ImageNet contest handsomely, the advantages of deep neural networks paired with GPUs became manifest. The rest is recent history.
If the previous 30 years of computing created value on the general CPU, the next decades will create value on a more complex ecology of processing architectures. Several of these are starting to emerge: Nvidia’s GPUs and its CUDA ecosystem; Google’s TPU chips and TensorFlow control software; AI-based FPGAs (as found in Microsoft’s Azure cloud); new neuromorphic chips; and rapidly approaching quantum computing.
From video game to matrix maths
A core requirement of deep learning is the need to perform large numbers (often billions) of large scale calculations. Modern machine learning approaches, fundamentally neural nets, represent data structures in what are called tensors.
Understanding tensors is not straightforward unless you’ve taken a serious maths course or watched this excellent 12 minute YouTube video. A tensor is a mathematical structure, a grown-up multidimensional version of a matrix. And a matrix, simply put, is a table (like a spreadsheet) of rows and columns with values in it.
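If the word ‘tensor’ still feels abstract, it may help to see the family laid out as plain nested lists. This is only an illustration in ordinary Python; the helper `shape` and the example values are my own, and real frameworks store tensors far more efficiently.

```python
def shape(tensor):
    """Walk down the nesting to find the size of each dimension.

    Assumes regular, non-empty nested lists.
    """
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


scalar = 7.0                         # no dimensions: a single value
vector = [1.0, 2.0, 3.0]             # one dimension: a row of values
matrix = [[1.0, 2.0, 3.0],           # two dimensions: rows and columns,
          [4.0, 5.0, 6.0]]           # like a spreadsheet

# Three dimensions: a tiny 2 x 4 pixel image with 3 color channels.
image = [[[0, 0, 0] for _ in range(4)] for _ in range(2)]

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("image", image)]:
    print(name, shape(value))
```

Each extra level of nesting adds a dimension, and that is all the ‘grown-up matrix’ amounts to.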
Common approaches to neural nets use matrices to represent the higher-dimensional data. A basic machine learning model for a simple computer vision application might require multiplying together matrices with hundreds or thousands of rows and columns, and doing that many, many times. The result is tens of millions or billions of calculations.
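To see where the billions come from, here is a deliberately naive matrix multiplication that counts its own work. It is a sketch for illustration only. The function names are mine, and no one would multiply large matrices this way in practice.

```python
def matmul(a, b):
    """Multiply two matrices the schoolbook way, counting multiply-adds."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    multiply_adds = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += a[i][k] * b[k][j]
                multiply_adds += 1
    return out, multiply_adds


def multiply_adds_needed(rows, inner, cols):
    """The same count, without doing the work: one per (i, j, k) triple."""
    return rows * inner * cols


product, count = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, count)

# Two 1,000 x 1,000 matrices: a billion multiply-adds for one product.
print(multiply_adds_needed(1000, 1000, 1000))
```

A single product of two 1,000 by 1,000 matrices already needs a billion multiply-adds, and a neural net performs many such products.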
These types of calculations are commonplace in video games, and it was amongst hardcore PC-gamers that GPUs initially got their foothold. These GPUs had dedicated pipelines with thousands of processors optimized for this matrix maths. When it comes to matrix calculations, they leave a general CPU in the dust.
As an amusing side note, cryptocurrency miners also love GPUs. In China, where the majority of the world’s bitcoin is cryptographically mined, demand for GPU cards has left keen gamers short of stock. (And when the price of bitcoin dropped, miners rapidly put their GPU cards for sale on eBay.)
Accelerating tensors & networks
Of course, if the underlying data structures and calculations on neural nets take place via tensors, why not optimize the processors to handle tensor calculations directly in silicon rather than transform them into matrix representations?
That is what Google has done. Its own significant processing demands have led it to develop its own custom silicon chip, the tensor processing unit. The TPU was designed specifically to speed up deep learning operations in Google data centers. These algorithms are used across many Google services, including Google Photos and voice recognition on its Android phones. These TPUs run 15–30 times faster than a traditional chip. By one estimate, the use of TPUs has allowed Google to meet the rising demands of deep learning and saved the firm from building a dozen data centers. (Many more technical details of the TPU can be found on this blogpost from the Google team.)
Google has been dogfooding its TPUs. Soon any developer will be able to access TPUs (through the control framework of Google’s TensorFlow) on Google’s cloud computing platform, Google Cloud. Initially, this will afford developers 180 teraflops of processing power optimized for deep learning. (This program is currently in closed testing.)
This starts to look a bit like a typical cloud-software ecosystem. Google TPUs as the hardware, Google TensorFlow as the control software, and Google Cloud as the capacity-provider and billing interface. For now, TPUs reside in the data centre and have yet to make it out to edge devices.
Google isn’t alone in coming to the conclusion that chips can be optimized for tensor calculation. Graphcore, a UK startup, has come to a similar conclusion. Graphcore, which has raised more than $60m this year alone, is developing its own Intelligence Processing Unit, essentially a tensor processing unit by another name. These are chips optimized for processing network graphs. (A network graph is just a different way of representing the high-dimensional tensors that encode neural nets.)
Graphcore promises that its dedicated silicon will have more than 1,000 independent processors on a single chip, which significantly exceeds Nvidia’s top-of-the-line GPUs and promises to beat Google’s second-generation TPUs. (For a very clear and slightly technical introduction to Graphcore’s approach, I recommend this thirty-minute presentation from Graphcore’s informed CTO, Simon Knowles.)
Other major firms are following suit. Microsoft has announced dedicated silicon hardware to accelerate deep learning in its Azure cloud. And in July, the firm also revealed that its augmented reality headset, the HoloLens, will have a customized chip in it to optimize machine learning applications.
Apple has a long track record of designing its own silicon for specialist requirements. Earlier this year Apple ended a relationship with Imagination Technologies, a firm that has been providing designs for GPUs in iPhones, in favor of its own GPU designs. With the release of Core ML, a set of software tools to make machine learning easier on the iPhone, we can expect optimized chips to follow. Intel, whose dominant position with CPUs is threatened, has responded with acquisitions (such as Altera, Nervana Systems and Movidius) as well as R&D in new non-traditional architectures.
As we argued in part one, artificial intelligence will create substantial demands for computing. Just one use case, powering autonomous vehicles, equates to the computational demands of several iPhone industries. But that is just one use case. These demands will require better-than-Moore’s Law improvement. Much of that extra demand will be met by innovations in chip design and architecture and a progressive diminution of the general CPU that drove much of the industry’s progress since the 1970s.
That alone won’t be enough. Much of the processing that drives our computing experiences today happens in the ‘cloud’. As AI applications become more ubiquitous we will need to shift some of the intelligence, both predicting and learning, close to where it is needed. This will result in a relative increase in the amount of intelligence at the edge of the network.
In part three, we’ll look at what opportunities this provides for investors and entrepreneurs.