Fascinating Tales of a Strange Tomorrow

AI: Science vs. Fiction

Our trip begins in March 1956 with the release of the “Forbidden Planet” movie, which featured Robby the Robot, one of the most iconic Science Fiction robots ever put on screen. A few months later, a small group of Computer Scientists led by John McCarthy[1] held a 6-week workshop[2] at Dartmouth College in New Hampshire.

John McCarthy (Turing Award 1971) & Robby the Robot

The topic of this workshop was “Artificial Intelligence”, a term coined by McCarthy himself, which he defined this way:

“Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it”.

Wouldn’t it be great if the Dartmouth workshop had actually been triggered by John McCarthy seeing “Forbidden Planet” and then going home thinking: “Let’s build Robby”? That’s probably not true at all, though. Oh well.

Anyway, the group got to work and laid the foundations of Artificial Intelligence as we know it. In fact, most of the participants devoted their entire careers to furthering the state of the art in AI, receiving no fewer than four Turing Awards in the process: Marvin Minsky[3] in 1969, John McCarthy in 1971, Herbert Simon[4] & Allen Newell[5] in 1975.

Herbert Simon (Turing Award 1975, Nobel Prize in Economics 1978) and Allen Newell (Turing Award 1975)

During the early years of AI, these bright scientists made predictions, such as:

· 1958, Herbert Simon and Allen Newell: “Within 10 years a digital computer will be the world’s chess champion.”

· 1965, Herbert Simon: “Machines will be capable, within 20 years, of doing any work a man can do.”

· 1967, Marvin Minsky: “Within a generation, the problem of creating ‘artificial intelligence’ will substantially be solved.”

· 1970, Marvin Minsky: “In from 3 to 8 years, we will have a machine with the general intelligence of an average human being.”

Oops.

The AI Winter is Coming

Predicting the future is always risky business, but still… This raises a daunting question: how could such brilliant minds be so awfully wrong about what AI would (or wouldn’t) achieve in a reasonable time frame? Don’t worry, we’ll answer this question later on.

Unfortunately, repeated failures to achieve significant progress became a trademark of Artificial Intelligence.

Expectations were high, few or no results were delivered, funds were cut and projects were abandoned. Unsurprisingly, these multiple “AI winters” discouraged all but the most hardcore supporters.

The most glaring symbol of this disillusion came from Marvin Minsky himself. In 2001, he gave a talk titled “It’s 2001: Where is HAL?”, referring of course to the HAL computer in Stanley Kubrick’s movie “2001: A Space Odyssey”. This is all the more significant given that back in 1968, Minsky had actually advised Kubrick during the making of the movie. In this talk, he notably addressed the “Common Sense issue” in unambiguous terms: “No program today can distinguish a dog from a cat, or recognize objects in typical rooms, or answer questions that 4-year-olds can!”

Marvin Minsky (Turing Award 1969) & HAL 9000

Bottom line: AI is cool to play with in a lab environment, but it will never achieve anything in the real world. Case closed.

Meanwhile, on the US West Coast…

While AI researchers despaired in their labs, a number of startups were reinventing the world: Amazon, Google, Yahoo, later joined by Facebook and a few others, were growing their web platforms at a frantic pace. In the process, they were acquiring users by the millions and piling up mountains of data. It soon became clear that this data was a goldmine, if it could actually be mined!

Using commodity hardware, these companies’ engineers set out on a quest to design and build data processing platforms that would allow them to crunch raw data and extract business value that could turn into revenue… always a key goal for fast-growing startups!

A major milestone was reached in December 2004, when Google released the famous MapReduce paper[6], in which they described “a programming model and an associated implementation for processing and generating large data sets”. Not to be outdone, Yahoo implemented the ideas described in this paper and released a first version of their project in April 2006: Hadoop[7] was born.
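To make the programming model concrete, here’s a minimal single-machine sketch of the MapReduce idea applied to word counting, the canonical example; real implementations distribute the map and reduce phases across thousands of machines, but the shape of the computation is the same.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in a document."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = (pair for doc in documents for pair in map_phase(doc))
print(reduce_phase(pairs))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```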

Gasoline waiting for a match: the Machine Learning explosion happened and the rest, as they say, is history.

Fast-forward a few years

2010 or so: Machine Learning is now a commodity. Customers have a wide range of options, from DIY to Machine Learning as a Service. Everything is great in Data World. But is it really? Yes, Machine Learning helped us make a lot of applications “smarter” but did we make significant progress on Artificial Intelligence? In other words, are we any closer to “building HAL”? Well… no. Let’s try to understand why.

One of the first steps in building a Machine Learning application is called “feature extraction”. In a nutshell, this is a step where Data Scientists explore the data set to figure out which variables are meaningful in predicting or classifying data and which aren’t. Although this is still mostly a lengthy manual process, it’s now well understood and works nicely on structured or semi-structured data such as web logs or sales data.
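To give you an idea, here’s a hypothetical hand-written feature extraction step for web log records; the field names and features below are made up for illustration, but they show the kind of manual work Data Scientists perform on structured data.

```python
from datetime import datetime

# A hypothetical raw web log record: field names are made up for illustration.
raw_record = {
    "timestamp": "2017-03-01T14:23:05",
    "url": "/products/42",
    "status": 200,
    "response_bytes": 5120,
    "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X)",
}

def extract_features(record):
    """Turn a raw log record into a flat set of hand-picked features."""
    ts = datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%S")
    return {
        "hour_of_day": ts.hour,                       # time-based behavior
        "is_weekend": int(ts.weekday() >= 5),         # weekday vs. weekend traffic
        "is_product_page": int(record["url"].startswith("/products/")),
        "is_error": int(record["status"] >= 400),
        "response_kb": record["response_bytes"] / 1024,
        "is_mobile": int("iPhone" in record["user_agent"]
                         or "Android" in record["user_agent"]),
    }

print(extract_features(raw_record))
```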

However, it doesn’t work for complex AI problems such as computer vision or computer speech, simply because it’s quite impossible to define formally what the features are: for example, what makes a cat a cat? And how is a cat different from a dog? Or from a lion?

To put it simply, traditional Machine Learning doesn’t solve this kind of problem, which is why new tools are needed. Enter neural networks!

Back To The Future

New tools? Hardly! In 1957, Frank Rosenblatt designed an electro-mechanical neural network, the Perceptron[8], which he trained to recognize images (20x20 “pixels”). In 1975, Paul Werbos published an article describing “backpropagation”[9], an algorithm allowing better and faster training of neural networks.

So, if neural networks have been around for so long, surely they must be partly responsible for failed AI attempts, right? Should they really be resurrected? Why would they suddenly be successful?

Very valid questions indeed. Let’s first take a quick look at how neural networks work. A neuron is a simple construct, which sums multiple weighted inputs to produce an output. Neurons are organized in layers, where the output of each neuron in layer ‘n’ serves as an input to each neuron in layer ‘n+1’. The first layer is called the input layer and is fed with the input data, say the pixel values of an image. The last layer is called the output layer and produces the result, say a category number for the image (“this is a dog”).
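Here’s a minimal NumPy sketch of that forward pass; note that real networks also apply a non-linear activation function to each neuron’s sum, which is included here as tanh.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny network: 4 inputs -> a hidden layer of 3 neurons -> 2 output neurons.
W1 = rng.normal(size=(4, 3))   # weights between the input layer and layer 1
W2 = rng.normal(size=(3, 2))   # weights between layer 1 and the output layer

def forward(x):
    """Each neuron sums its weighted inputs; a non-linearity (tanh here) is
    applied so that stacked layers can represent more than a linear function."""
    hidden = np.tanh(x @ W1)   # outputs of layer n feed every neuron of layer n+1
    return hidden @ W2         # raw scores, one per category

x = np.array([0.2, -1.0, 0.5, 0.7])     # e.g. four pixel values
scores = forward(x)
print("predicted category:", int(np.argmax(scores)))
```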

The basic structure of a neural network (Source: “Deep Learning”, Goodfellow, Bengio & Courville, 2016)

The beauty of neural networks is that they’re able to self-organize: given a large enough data set (say, images as inputs and category labels as outputs), a neural network is able to learn automatically how to produce correct answers.

Thanks to an iterative training process, it’s able to discover the features that allow images to be categorized, and it adjusts its weights repeatedly to reach the best result, i.e. the one with the smallest error rate.
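Here’s a rough sketch of that iterative process, using plain gradient descent on a single logistic neuron and a toy data set; real frameworks compute the gradients with backpropagation across many layers, but the idea of repeatedly nudging the weights to reduce the error is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data set: points in the plane, labeled 1 if they lie above the line y = x.
X = rng.normal(size=(200, 2))
y = (X[:, 1] > X[:, 0]).astype(float)

w = np.zeros(2)      # weights, adjusted at every iteration
b = 0.0              # bias term
learning_rate = 0.1

for step in range(1000):
    # Forward pass: a single logistic neuron turns each sample into a probability.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    error = p - y                                # how wrong we are on each sample
    # Gradient descent: nudge the weights in the direction that reduces the error.
    w -= learning_rate * (X.T @ error) / len(X)
    b -= learning_rate * error.mean()

accuracy = ((p > 0.5) == y).mean()
print(f"training accuracy after 1000 steps: {accuracy:.2f}")
```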

The training phase and its automatic feature discovery are well adapted to solving informal problems, but here’s the catch: they involve a huge number of math operations, a number which grows very quickly as data size increases (think high-resolution pictures) and as the number of layers increases. This problem is often referred to as the “Curse of Dimensionality” and it’s one of the major reasons why neural networks stagnated for decades: there was simply not enough computing power available to run them at scale.
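A quick back-of-the-envelope calculation shows the scale of the problem: a single fully connected layer working directly on a high-resolution picture already requires billions of weights (the image and layer sizes below are purely illustrative).

```python
# A hypothetical 1,000 x 1,000 RGB picture feeding one fully connected layer
# of 1,000 neurons: every neuron is connected to every single input value.
inputs_per_image = 1000 * 1000 * 3          # 3 million pixel values
neurons_in_layer = 1000
weights = inputs_per_image * neurons_in_layer
print(f"{weights:,} weights for a single layer")   # 3,000,000,000
```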

Nor was enough data available. Neural networks need a lot of data to learn properly. The more data, the better! Until recently, it was simply not possible to gather and store vast amounts of digital data. Do you remember punch cards or floppy disks?

A significant breakthrough happened in 1998, when Yann LeCun published his seminal work on Convolutional Neural Networks[10], a new breed of multi-layered networks (hence the term “Deep Learning”).

In a nutshell, CNNs are able to extract features efficiently while reducing the size of input data: this allows smaller networks to be used for classification, which dramatically reduces the computing cost of network training.
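Here’s a toy NumPy sketch of the two building blocks: a small filter slid across the image to detect a feature (a vertical edge in this example), followed by max pooling, which halves the width and height so that downstream layers have far less data to process. Real CNNs learn their filters during training instead of using hand-crafted ones.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter across the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Keep only the strongest response in each size x size block."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.random.default_rng(0).random((28, 28))   # a toy 28x28 grayscale image
edge_filter = np.array([[1.0, 0.0, -1.0],           # responds to vertical edges
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])

features = convolve2d(image, edge_filter)   # 26x26 feature map
pooled = max_pool(features)                 # 13x13: same features, 4x less data
print(features.shape, pooled.shape)         # (26, 26) (13, 13)
```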

This approach was so successful that banks adopted CNN-driven systems to automate handwriting recognition for checks. This was an encouraging accomplishment for neural networks… but the best was still to come!

Architecture of a Convolutional Neural Network (Source: NVIDIA blog)

The Neural Empire Strikes Back

By the late 2000s, three quasi-simultaneous events made large-scale neural networks possible.

First, large data sets became widely available. Text, pictures, movies, music: everything was suddenly digital and could be used to train neural networks. Today, the ImageNet[11] database holds over 14 million labeled images, and researchers worldwide use it to compete every year[12] to build the most successful image detection and classification network (more on this later).

Then, researchers were able to leverage the spectacular parallel processing power of Graphics Processing Units (GPUs) to train large neural networks. Can you believe that the networks that won the 2015 and 2016 ImageNet competitions had 152 and 269 layers respectively?

Last but not least, Cloud computing brought elasticity and scalability to developers and researchers, allowing them to use as much infrastructure as needed for training… without having to build, run or pay for it long term.

The combination of these three factors helped neural networks deliver on their 60-year-old promise.

State-of-the-art networks are now able to classify images faster and more accurately than any human (less than 3% error vs. 5% for humans). Devices like the Amazon Echo understand natural language and speak back to us. Autonomous cars are becoming a reality. And the list of AI applications grows every day.

Wouldn’t you like to add yours?

Number of layers and error rate of ILSVRC winners

How AWS can help you build Deep Learning applications

AWS provides everything you need to start building Deep Learning applications:

· A wide range of Amazon EC2 instances to build and train your models, with your choice of CPU[13], GPU[14] [15] or even FPGA[16].

· The Deep Learning Amazon Machine Image[17], a collection of pre-installed tools and libraries: mxnet[18] (which AWS officially supports), Theano, Caffe, TensorFlow, Torch, Anaconda and more.

· High-level AI services[19] for image recognition (Amazon Rekognition), text to speech (Amazon Polly) and chatbots (Amazon Lex). A minimal Rekognition call is sketched right after this list.
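Here’s what that Rekognition call could look like with the boto3 SDK; the bucket and image names below are placeholders, to be replaced with your own image in S3.

```python
import boto3

# Uses your configured AWS credentials and region.
rekognition = boto3.client("rekognition")

# The bucket and key below are placeholders: point them at your own image in S3.
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-bucket", "Name": "photos/cat.jpg"}},
    MaxLabels=10,
    MinConfidence=75,
)

for label in response["Labels"]:
    print(f"{label['Name']}: {label['Confidence']:.1f}%")
```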

The choice is yours, just get started and help Science catch up with Fiction!

A New Hope?

Artificial Intelligence is making progress every day. One can only wonder what is coming next!

Will machines learn how to understand humans — not the other way around?

Will they help humans understand each other?

Will they end up ruling the world?

Who knows?

Whatever happens, these will be fascinating tales of a strange tomorrow.

Note: this is an edited transcript of one of my current keynote talks. Original slides are available here.