2012: A Breakthrough Year for Deep Learning

Bryan House
Published in Deep Sparse · Jul 17, 2019

Photo Credit: Hal Gatewood (via Unsplash)

Neural networks have been around for decades, with seminal early work by Geoffrey Hinton, Yann LeCun, and others serving as major foundations for what’s happening in the field today. The larger datasets and more powerful computers that have become available over the last 10 years have made it much easier to realize the potential of deep neural networks for image recognition, translation, and other applications.

Arguably, it wasn’t until 2012 that several major breakthroughs in the performance and accuracy of deep neural networks arrived. Some, like Kai-Fu Lee, the former head of Google China, argue that the last great innovation in deep learning happened in 2012, and that we’ve yet to see anything as significant as the University of Toronto team’s ImageNet challenge entry (more on that later).

Regardless of where you stand on the industry’s innovation (or lack thereof), here are a few of the big breakthroughs for deep learning that happened in 2012.

Deep Neural Networks for Speech Recognition

In 2012, speech recognition was far from perfect. Typical systems paired hidden Markov models (HMMs) with Gaussian mixture models (GMMs) to identify patterns in speech. A seminal 2012 paper from Hinton et al. showed that deep neural networks significantly outperformed these earlier acoustic models.

Four different research groups (Microsoft, Google, IBM, and the University of Toronto) contributed to this paper, which was significant because it was one of the first times a neural network achieved state-of-the-art results in speech recognition.
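
To make the shift concrete, here’s a minimal sketch (in PyTorch, purely for illustration) of the hybrid approach those groups converged on: a deep network predicts which HMM state each slice of audio belongs to, taking over the job the GMM used to do. The layer sizes, feature dimensions, and state count below are illustrative assumptions, not the paper’s exact configuration.

import torch
import torch.nn as nn

CONTEXT_FRAMES = 11      # assumed context window: the current frame plus 5 frames on each side
FEATS_PER_FRAME = 40     # assumed 40 filterbank/MFCC-style coefficients per frame
NUM_HMM_STATES = 2000    # assumed number of tied HMM states ("senones") the decoder tracks

# A feed-forward acoustic model: a window of acoustic features in, HMM-state scores out.
acoustic_model = nn.Sequential(
    nn.Linear(CONTEXT_FRAMES * FEATS_PER_FRAME, 1024),
    nn.Sigmoid(),        # sigmoid hidden units were typical of the 2012-era systems
    nn.Linear(1024, 1024),
    nn.Sigmoid(),
    nn.Linear(1024, NUM_HMM_STATES),
)

# A dummy batch of feature windows -> log-probabilities an HMM decoder could consume
# in place of GMM emission scores.
frames = torch.randn(8, CONTEXT_FRAMES * FEATS_PER_FRAME)
state_log_probs = torch.log_softmax(acoustic_model(frames), dim=-1)
print(state_log_probs.shape)   # torch.Size([8, 2000])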

Google’s Brain Recognizes Cat Videos

In 2012, researchers at Google’s X lab built a neural network with one billion connections, trained across 16,000 computer processors. The researchers never told the algorithm during training what a cat was, but over time it began to pick up “cat-like” features until it could recognize images of cats pulled from YouTube videos with a high degree of accuracy.

According to the researchers, “The network is sensitive to high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained it to obtain 15.8 percent accuracy in recognizing 20,000 object categories, a leap of 70 percent relative improvement over the previous state-of-the-art [networks].”

This development was particularly significant to the field of facial and image recognition, tasks that are routinely completed by neural nets today.
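
To illustrate the underlying idea of unsupervised feature learning, here’s a toy sketch (again in PyTorch) of an autoencoder: it’s trained only to reconstruct its input, with no labels anywhere, yet its hidden units come to respond to recurring patterns in the data. This is a drastically scaled-down stand-in for Google’s billion-connection network; every size and hyperparameter here is an illustrative assumption.

import torch
import torch.nn as nn

PATCH_PIXELS = 32 * 32 * 3   # small RGB patches instead of full video frames
NUM_FEATURES = 256           # number of learned "feature detectors"

encoder = nn.Sequential(nn.Linear(PATCH_PIXELS, NUM_FEATURES), nn.Sigmoid())
decoder = nn.Linear(NUM_FEATURES, PATCH_PIXELS)

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)
patches = torch.rand(64, PATCH_PIXELS)   # stand-in for unlabeled frames sampled from video

for step in range(100):                  # no labels appear anywhere in this loop
    reconstruction = decoder(encoder(patches))
    loss = nn.functional.mse_loss(reconstruction, patches)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, individual hidden units respond to patterns that recur in the data;
# at Google's scale, some of those units responded to cat faces and human bodies.
features = encoder(patches)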

The Advent of AlexNet

The aforementioned major breakthrough, the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), was a defining moment for the use of deep neural nets in image recognition. A convolutional neural network (CNN) designed by Alex Krizhevsky, published with Ilya Sutskever and Krizhevsky’s PhD advisor Geoffrey Hinton, nearly halved the best existing error rate on ImageNet visual recognition, achieving a top-5 error of 15.3 percent against the runner-up’s 26.2 percent. The CNN was dubbed “AlexNet.”

This University of Toronto team was the first to break 75 percent accuracy in the competition. The AlexNet paper was instrumental to the machine learning industry’s boom because it served as a sort of “coming out party” for CNNs, highlighting the use of GPUs for training alongside now-standard methods such as dropout layers and rectified linear units (ReLU).
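
For a sense of what those ingredients look like in code, here’s a deliberately tiny sketch combining convolutions, ReLU activations, and dropout. It is not the actual AlexNet architecture (which used five convolutional layers and three fully connected layers, split across two GPUs); all sizes here are illustrative.

import torch
import torch.nn as nn

tiny_convnet = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),                   # rectified linear units instead of sigmoid/tanh
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),           # dropout regularizes the fully connected layer
    nn.Linear(64 * 8 * 8, 10),   # 10 classes for illustration; ImageNet has 1,000
)

images = torch.randn(4, 3, 32, 32)   # dummy 32x32 RGB images (AlexNet trained on much larger crops)
logits = tiny_convnet(images)
print(logits.shape)                  # torch.Size([4, 10])
# In practice the model and data would be moved to a GPU (e.g., .to("cuda")),
# which is what made training at ImageNet scale feasible in 2012.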

CNNs Reach All-Time Low Error Rates

At 2012’s IEEE Conference on Computer Vision and Pattern Recognition, Dan Ciresan et al. significantly improved upon the best CNN performance on multiple image databases; for example, their CNN achieved an error rate of 0.23 percent on the MNIST database in 2012.

By 2014, GoogLeNet had pushed the top-5 error rate on ImageNet down to roughly 6.7 percent. The industry continues to see incremental improvements in error rates, and some CNNs now perform at a level comparable to humans (or better, depending on the standards by which they’re judged).

The Subsequent Machine Learning Boom

All of these developments fueled dramatic growth in the machine learning field. For example, according to analytics firm IFI Claims Patent Services, machine learning patents grew at a 34 percent compound annual growth rate (CAGR) between 2013 and 2017, making machine learning the third-fastest-growing category of all granted patents.

I’m excited for the next great unlock in machine learning. Perhaps that breakthrough awaits us in 2019.

….

Neural Magic is powering bigger inputs, bigger models, and better predictions. The company’s software lets machine learning teams run deep learning models at GPU speeds or better on commodity CPU hardware, at a fraction of the cost. To learn more, visit www.neuralmagic.com.
