Yet another reading on “History of Neural Nets”

Realizing my ignorance while reading a Fortune article, I decided to read further about neural nets and found a gem: “A ‘Brief’ History of Neural Nets and Deep Learning” by Andrey Kurenkov.

The predominant thought [in the early days of AI] was that making computers able to perform formal logical reasoning would essentially solve AI... This [early thought] lacked a mechanism for learning, which was crucial for it to be usable for AI.

Rosenblatt came up with a way to make such artificial neurons learn: putting weights on the inputs allowed for a very simple and intuitive learning scheme. Rosenblatt thought that networks of such simple computational units could be vastly more powerful and solve the hard problems of AI.
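To make the learning scheme concrete, here is a minimal sketch of the perceptron learning rule: whenever the unit misclassifies an input, each weight is nudged in the direction of the correct answer. The example task (a logical AND gate) and all names are my own illustration, not from the article.

```python
# Minimal sketch of Rosenblatt's perceptron learning rule:
# nudge the input weights whenever the unit misclassifies.

def train_perceptron(samples, epochs=20, lr=0.1):
    """samples: list of (inputs, target) pairs with target 0 or 1."""
    n = len(samples[0][0])
    w = [0.0] * n          # one weight per input
    b = 0.0                # bias term
    for _ in range(epochs):
        for x, t in samples:
            # step activation: fire if the weighted sum exceeds 0
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = t - y
            # the learning rule: w <- w + lr * error * input
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Learn a linearly separable function (logical AND)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
```

Because AND is linearly separable, a few passes over the data are enough for the rule to converge; Minsky's later critique centered on functions (like XOR) where no single such unit can succeed.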

Many other researchers were focusing on approaches based on the manipulation of symbols with concrete rules that followed from the mathematical laws of logic. Marvin Minsky published a skeptical analysis of perceptrons and concluded that this approach to AI was a dead end. AI Winters followed.

While Minsky's analysis implied that perceptrons had to be built with multiple layers to overcome their limitations, Rosenblatt's learning algorithm did not work for multiple layers.

The backpropagation approach was popularized in the 1980s as a way to understand how multilayer neural nets could be trained to tackle complex learning problems. And so, neural nets were back.
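The core of backpropagation is just the chain rule applied layer by layer, from the output back to the inputs. Below is a minimal sketch (a single input, one sigmoid hidden unit, one linear output; all parameter values are arbitrary choices of mine) that computes the gradient this way and checks it against a finite-difference estimate:

```python
import math

# Tiny two-layer net: x -> sigmoid hidden unit -> linear output.
# Backprop computes the loss gradient with the chain rule; we
# verify it against a numerical finite-difference gradient.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)     # hidden activation
    y = w2 * h + b2              # linear output
    return h, y

def loss(params, x, t):
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2    # squared error

def backprop(params, x, t):
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - t                   # dL/dy
    dw2 = dy * h                 # gradient for output weight
    db2 = dy
    dh = dy * w2                 # chain rule back into hidden layer
    dz = dh * h * (1 - h)        # through the sigmoid's derivative
    dw1 = dz * x                 # gradient for input weight
    db1 = dz
    return [dw1, db1, dw2, db2]

params = [0.5, -0.3, 0.8, 0.1]   # arbitrary example values
x, t = 1.2, 0.7
grads = backprop(params, x, t)

# Numerical check: perturb each parameter and difference the loss
eps = 1e-6
for i in range(4):
    p_hi = list(params); p_hi[i] += eps
    p_lo = list(params); p_lo[i] -= eps
    num = (loss(p_hi, x, t) - loss(p_lo, x, t)) / (2 * eps)
    assert abs(num - grads[i]) < 1e-6
```

The same backward pass extends to any number of layers, which is exactly what made multilayer training tractable.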

In 1989, Yann LeCun at AT&T Bell Labs demonstrated a very significant real-world application of backpropagation in "Handwritten Zip Code Recognition".

The first hidden layer of the neural net was convolutional.
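A convolutional layer slides one small set of shared weights (a kernel) across the whole image, so the same feature detector is applied at every location. A minimal sketch (my own toy example, not LeCun's actual architecture), with a hand-set vertical-edge kernel:

```python
# Sketch of the shared-weight "convolutional" layer idea: one
# small kernel is slid over the whole image, so every location
# is scanned by the same feature detector.

def conv2d_valid(image, kernel):
    """2D valid convolution (cross-correlation, as in most
    neural-net layers) over nested lists of numbers."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i + a][j + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge detector applied to an image with one edge
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
result = conv2d_valid(image, kernel)
# -> [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]: the detector fires only
# at the edge, wherever it appears.
```

Weight sharing is what made the 1989 zip-code network practical: far fewer parameters to learn, and built-in tolerance to where a stroke appears in the image.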

In the mid-1990s, new machine learning methods had also begun to emerge, and people were again becoming skeptical of neural nets, since they seemed so intuition-based and computers were still barely able to meet their computational needs.

The neural net, dubbed TD-Gammon, was trained using a standard reinforcement learning algorithm and was one of the first demonstrations of reinforcement learning outperforming humans on relatively complicated tasks. However, the same approach applied to chess produced a neural net that was still far worse than a standard computer program (GNU-Chess) implemented long before.
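TD-Gammon's training was based on temporal-difference learning: the value estimate for a state is nudged toward the reward plus the estimate for the next state. A toy illustration of tabular TD(0) on a 5-state random walk (TD-Gammon used a neural net rather than a table, and self-play backgammon rather than this walk; the environment here is my own toy):

```python
import random

# Toy tabular TD(0) on a 5-state random walk: from each state we
# step left or right at random; falling off the right end pays
# reward 1, the left end pays 0. TD(0) learns each state's value
# (its expected final reward) by bootstrapping from its neighbor.

def td0_random_walk(episodes=5000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    n = 5                      # non-terminal states 0..4, start at 2
    v = [0.5] * n              # initial value estimates
    for _ in range(episodes):
        s = 2
        while True:
            s2 = s + rng.choice((-1, 1))
            if s2 < 0:         # fell off left end: reward 0
                target = 0.0
            elif s2 >= n:      # reached right end: reward 1
                target = 1.0
            else:              # bootstrap from next state's estimate
                target = v[s2]
            # TD(0) update: move v[s] toward the bootstrapped target
            v[s] += alpha * (target - v[s])
            if s2 < 0 or s2 >= n:
                break
            s = s2
    return v

values = td0_random_walk()
# True values for this walk are [1/6, 2/6, 3/6, 4/6, 5/6];
# the estimates end up close to them.
```

Swapping the value table for a neural net trained by backpropagation on these same TD targets is, in essence, the combination that made TD-Gammon work.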

Typical deep neural nets suffer from the now-famous problem of vanishing or exploding gradients. So, around the mid-90s, a new AI Winter for neural nets began to emerge. A new method called Support Vector Machines, which in the very simplest terms could be described as a mathematically optimal way of training an equivalent of a two-layer neural net, was developed and started to be seen as superior to the difficult-to-work-with neural nets. Other new methods, notably Random Forests, also proved to be very effective, with lovely mathematical theory behind them.
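The vanishing-gradient problem is easy to see numerically: a gradient backpropagated through a stack of sigmoid layers is multiplied by each layer's local derivative (at most 0.25 for a sigmoid), so it shrinks geometrically with depth. A minimal demonstration (weights and depths are arbitrary choices of mine):

```python
import math

# The vanishing-gradient problem: through each sigmoid layer the
# backpropagated gradient is multiplied by the sigmoid's local
# derivative (<= 0.25), so it shrinks geometrically with depth.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_magnitude(depth, w=1.0, z=0.0):
    """|d(output)/d(input)| through `depth` stacked sigmoid units,
    each with weight w and zero bias, starting at pre-activation z."""
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(z)
        grad *= w * s * (1.0 - s)   # layer-local derivative
        z = w * s                   # next layer's pre-activation
    return abs(grad)

shallow = gradient_magnitude(2)    # ~0.06: still trainable
deep = gradient_magnitude(20)      # astronomically small
```

With the gradient for the early layers this close to zero, those layers barely learn at all, which is exactly why deep nets of the era seemed untrainable and simpler methods like SVMs looked preferable.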

As Hinton tells it, they hatched a conspiracy: "rebrand" the frowned-upon field of neural nets with the moniker "Deep Learning". But more important than the name was the idea: that neural networks with many layers really could be trained well, if the weights were initialized in a clever way rather than randomly.

"Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition" showed that a whopping 0.35% error rate could be achieved on the MNIST dataset with nothing more special than really big neural nets, a lot of variations on the input, and efficient GPU implementations of backpropagation. These ideas had existed for decades, so although it could not be said that algorithmic advancements did not matter, this result strongly supported the notion that the brute-force approach of big training sets and fast parallelized computation was also crucial.

The two interns handily proved the power of deep learning during their three-month internship, and Microsoft Research has been at the forefront of deep learning speech recognition ever since.

Still, all these research discoveries since 2006 are not what made the computer vision and other research communities respect neural nets again. What did it was something somewhat less noble: completely destroying non-deep-learning methods on a modern competitive benchmark.