An AI Game of Thrones

Vikrant Sharma · Published in The Tokyo Banana · 8 min read · Jun 18, 2020

Winter is coming for artificial intelligence, and it may be years before AI research picks up again.

Fei-Fei Li (Stanford University) and Geoffrey Hinton (University of Toronto)

For anyone who’s not a data scientist, Baymax in Disney’s Big Hero 6 may seem like machine learning’s holy grail. But a truly human-like, intelligent algorithm is so far-fetched that most data scientists shrug off its possibility entirely.

Since Big Hero 6 was released in 2014, data science as a field hasn’t changed. Forget intelligent robots — our best learning models struggle to match the intelligence of a cockroach. Today’s artificial intelligence is relegated to identifying niche patterns for well-defined mathematical applications, like product pricing.

The hype surrounding AI is waning. Industry leaders are replacing “artificial intelligence” with “augmented intelligence,” anticipating that academic interest in machine learning is about to stall. The AI winter is coming.

So how do we prepare for it? Can we even avoid it?

The AI field is plagued by irrational optimism and irrational despair.

In 1973, Sir James Lighthill was asked to compile a report on the then-present state of artificial intelligence. His report criticized the hype surrounding artificial intelligence research, arguing that AI’s best algorithms would always fail on real-world problems and could really only handle “baby” versions of them.

His report followed almost twenty-five years of fervent research into human-like algorithms. The AI “summer” between the 1950s and 1970s saw DARPA invest millions in undirected research that touched on natural language processing. The U.S. Air Force poured gargantuan resources into scientists developing symbolic reasoning. British universities raced to create the first general purpose robot that could adapt to any situation, learn anything, and interact with people the way a human would.

The economic promise of general purpose algorithms was free labor. The dream was a perfect “computer human” that took care of itself, needed no sustenance, and worked in any situation. Once the problem was cracked, the productivity gains from a human-level AI would instantly make the space race look like child’s play.

Sir James’s “Lighthill Report” dismissed the general purpose robot as a “mirage.” Lighthill debated premier AI researchers in the BBC broadcast “Controversy,” ultimately defeating pro-AI-research sentiment with one simple observation: the complexity of existing AI algorithms was nowhere near the complexity of a human brain. He argued that we’d need decades of transistor miniaturization, increased memory capacity, and hardware development before the question of AI could even be raised.

Thus began the first AI winter as universities and governments alike pulled their funding overnight. Public support for AI disappeared as well; after all, wasn’t it a blatant waste of taxpayer dollars?

This phenomenon of an idea’s rise and fall extends to any novelty, including bitcoin, fine art, unique college programs and space exploration (look up the Gartner Hype Cycle). But AI is unique in that its development spans several decades rather than years. Careers can begin and end within a single AI winter, never seeing the light of summer.

“Changing Seasons” by Jonathan Grudin. Note that HCI stands for human-computer interaction.

AI summers are always triggered by a fundamental algorithmic breakthrough, like Warren McCulloch and Walter Pitts’s 1943 model of the neuron. AI winters are always triggered by a fundamental hardware limitation: too few transistors in the 1970s, too little memory in the 1990s.
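For context, the McCulloch-Pitts “neuron” is just a threshold unit: it outputs 1 when the weighted sum of its binary inputs reaches a threshold. A minimal sketch, my own illustration in modern Python rather than their 1943 notation:

```python
# McCulloch-Pitts threshold unit: fires (returns 1) when the weighted sum of
# its binary inputs reaches the threshold, otherwise stays silent (returns 0).
def mcculloch_pitts(inputs, weights, threshold):
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

# Wired as an AND gate: it fires only when both inputs are on.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", mcculloch_pitts((a, b), weights=(1, 1), threshold=2))
```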

Fast forward to today. On paper, our supercomputers are on par with human brains. Transistor counts are far greater, memory is more plentiful, and we even know, in a limited way, how individual neurons interact with one another. But over the last decade, our algorithms have improved by only a few percentage points, and only in very niche applications.

Without a clear hardware limitation, why is machine learning failing to progress toward general purpose intelligence? If the upcoming AI winter has little to do with hardware, something else must be triggering it.

Fei-Fei Li, then a computer science professor at Princeton (now at Stanford), published ImageNet in 2009. Using Amazon’s Mechanical Turk, Dr. Li outsourced the task of labeling millions of pictures to online freelancers, often paid pennies to do so. The aim was to increase the volume of high-quality labelled data available to classification algorithms. The result was a database of 3.2 million labelled images, organized into 12 “branches” and roughly 5,000 subcategories.

ImageNet kicked off an annual classification competition where entrants would submit their own algorithms, train them on ImageNet, and compete on accuracy. This competition is often credited with generating the hype around the current AI summer; winners were consistently gobbled up by Fortune 500 companies hoping to capitalize on AI. The last competition in 2017 featured a winning accuracy of 97.3%, meaning the winning algorithm correctly identified and categorized the test images 97.3% of the time.
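For the curious, ImageNet results are usually reported as top-5 accuracy: a prediction counts as correct if the true label appears among the model’s five highest-scoring guesses. A minimal sketch of how such a figure is computed, using made-up scores (it does not reproduce the 97.3% result):

```python
# Top-k accuracy from raw class scores: count an image as correct when its
# true label is among the k highest-scoring classes.
import numpy as np

def topk_accuracy(scores, labels, k=5):
    # scores: (n_images, n_classes) model outputs; labels: (n_images,) true ids
    topk = np.argsort(scores, axis=1)[:, -k:]        # indices of the k best classes
    hits = (topk == labels[:, None]).any(axis=1)     # is the true class among them?
    return hits.mean()

rng = np.random.default_rng(0)
scores = rng.normal(size=(1000, 10))                 # fake scores for 10 classes
labels = rng.integers(0, 10, size=1000)
print("top-1:", topk_accuracy(scores, labels, k=1))  # ~0.10 for random scores
print("top-5:", topk_accuracy(scores, labels, k=5))  # ~0.50 for random scores
```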

But it wasn’t just ImageNet that triggered the current AI summer we’re in — it was an idea that came from that original 2009 paper:

It’s not about the algorithm. It’s about the data.

Machine learning algorithms perform very differently when the training dataset is small, say a few hundred or a few thousand data points. If I pitted logistic regression against a convolutional neural network at telling cats from dogs, the neural network would perform significantly better.

But as the training dataset grows, the accuracy gap shrinks. In many cases, as the number of training data points increases, the algorithms’ performances all approach similar plateaus:

Some sample algorithms discussed by Ruoxi Jia and others.
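Here is a rough sketch of that plateau effect on synthetic data, using scikit-learn (my own illustration, not the experiment behind the figure). The parameters are arbitrary; the point is that the gap between a simple model and a more flexible one tends to narrow as the training set grows:

```python
# Compare a simple model (logistic regression) against a more flexible one
# (a small neural network) as the amount of training data grows.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=20000, n_features=40, n_informative=20,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

for n in (100, 1000, 10000):
    logistic = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    neural = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000,
                           random_state=0).fit(X_train[:n], y_train[:n])
    print(f"n={n:>6}  logistic={logistic.score(X_test, y_test):.3f}  "
          f"neural net={neural.score(X_test, y_test):.3f}")
```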

Why improve the underlying algorithm when you can simply throw more data at it?

Thus, machine learning skyrocketed to new heights, achieving classification accuracies that were previously unheard of. Training databases ballooned in size as previously impractical tasks, like recognizing handwriting, became trivial and were packaged into everyday products.

But people soon recognized a fundamental flaw in this framework: the algorithms being trained on ever-larger datasets often had built-in limitations that prevented their accuracy from improving beyond a certain point.

Consider a picture of a cat. Ever since ImageNet was first published, algorithms have been able to identify cats really well, as long as the training data represented the cat in three dimensions from every angle. The best neural network for this task, the convolutional neural network, can’t build a 3D representation of the images it’s trained on. So the second you show the algorithm a cat photographed from an angle it has never seen, it fails (this issue is called “pose estimation”).
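A quick way to see this weakness for yourself, sketched here with PyTorch on MNIST digits instead of cats (my own illustration, not an experiment from the article): train a small convolutional network on upright images only, then test it on copies rotated by 90 degrees, and watch the accuracy fall on the viewpoint it never saw.

```python
# Train a small CNN on upright MNIST digits, then evaluate on rotated copies
# to illustrate how convolutional nets degrade on unseen viewpoints.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(x.flatten(1))

to_tensor = transforms.ToTensor()
rotate_90 = transforms.Compose([transforms.RandomRotation((90, 90)), to_tensor])

train = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
test_upright = datasets.MNIST("data", train=False, download=True, transform=to_tensor)
test_rotated = datasets.MNIST("data", train=False, download=True, transform=rotate_90)

def accuracy(model, dataset):
    loader = torch.utils.data.DataLoader(dataset, batch_size=256)
    model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x.to(device)).argmax(1).cpu() == y).sum().item()
    return correct / len(dataset)

model = SmallCNN().to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)
model.train()
for x, y in loader:  # one pass over upright digits is enough to make the point
    opt.zero_grad()
    F.cross_entropy(model(x.to(device)), y.to(device)).backward()
    opt.step()

print("accuracy on upright digits:", accuracy(model, test_upright))
print("accuracy on rotated digits:", accuracy(model, test_rotated))
```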

Fundamental flaws like these are scattered throughout artificial intelligence. They surfaced during the development of IBM Watson and contributed to its ultimate demise as a general purpose AI. Without algorithm-level development to remove these flaws, it’s easy to see how we’re on the cusp of an AI winter.

Enter Geoffrey Hinton, a computer science professor at the University of Toronto. Dr. Hinton has focused on algorithmic development since 1978, helped pioneer the use of the backpropagation algorithm for training deep neural networks, and tackled the pose estimation problem by introducing an entirely new “capsule” neural network.

Instead of increasing the size of the dataset, Dr. Hinton focuses on specific algorithm-level issues and solves them methodically by tweaking the algorithm’s core architecture. His group developed AlexNet, the pioneering neural network that destroyed the competition at ImageNet 2012.

Capsule network performance versus prior art, from Ruilong Chen and others. Night and day difference.
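To give a flavor of what a capsule does differently, here is the “squash” nonlinearity from Sabour, Frosst and Hinton’s 2017 capsule paper: each capsule outputs a vector whose direction encodes pose and whose squashed length behaves like the probability that an entity is present. A minimal sketch, not a full capsule network:

```python
# The capsule "squash" nonlinearity: shrink a vector's length into [0, 1)
# while preserving its direction, so length can act as a presence probability
# and orientation can encode pose.
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    # s: (..., capsule_dim) raw capsule outputs
    squared_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = squared_norm / (1.0 + squared_norm)          # length mapped into (0, 1)
    return scale * s / torch.sqrt(squared_norm + eps)    # keep the direction

# Example: a vector of length 5 keeps its direction but is squashed to ~0.96.
v = squash(torch.tensor([3.0, 4.0]))
print(v, v.norm())
```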

Dr. Hinton’s focus away from pure data volume is, for the time being, an uncommon one, likely because successful algorithmic development requires a rare and deep understanding of how machine learning algorithms actually work. It also defies the “marginal improvement” paradigm most data scientists currently follow, since the work put into improving an algorithm may not reliably translate into results. In other words, while adding data will always yield some (albeit diminishing) benefit, you could develop a hundred new algorithms before stumbling across one that works.

But the rewards of better algorithms often dwarf anything that larger datasets can provide. The development of deep neural networks, for instance, is what made prediction on messy, chaotic data and high-accuracy object recognition possible in the first place.

In the literature, the duel between data volume and algorithm development shows up in the sheer number of papers published on each side, and data volume is winning: for every paper on algorithm improvement, there are hundreds dealing with increasing training data volume.

Venture capitalists, entrepreneurs and professors alike pounce on the latest datasets; companies are making their own internal datasets to edge out the competition; there are even entire Mechanical Turk communities whose sole job is to painstakingly scour the internet, label image data and sell those datasets for profit. In machine learning, data is king and has been since 2009.

And so the AI game of thrones continues. On one side, scientists in Fei-Fei Li’s camp push for better and better data to gradually boost performance. On the other, scientists in Geoffrey Hinton’s camp develop entirely new algorithms to “leapfrog” the field by solving a fundamental flaw in current machine learning models. (The former camp feels like the Baratheons, holding the throne; the latter feels like the Starks, waiting for winter.)

Clearly, the best solution is a combination of the two, but it likely errs on the side of algorithm development. There’s only so much that larger data volumes can achieve, and training on millions of photos is certainly not how humans learn. If you show me just a few pictures of a Maine Coon, for example, chances are I’ll be pretty good at identifying one afterward.

Dr. Hinton seems to believe the AI winter is inevitable. In an informal meeting with Dr. Nick Bostrom (the “Simulation Hypothesis” guy), he is quoted as saying he doesn’t expect general AI to be developed any sooner than 2070. If that’s true, this upcoming AI winter may be the longest on record as scientists edge toward the “master algorithm” that will eventually lead to sentient AI like Baymax.

To have meaningful advances in AI, I think we need to pursue heavy algorithm development. While no one is questioning the need for good data, the benefit from increased data volume has had its day. It’s time to set aside our attachment to the current art, and work on novel methods that learn in new ways. It’s hard work, but well worth it for the promises of general purpose AI.

And because winter is coming.
