Limits of Deep Learning

Yiğit Şimşek
Published in aiforexistence
May 1, 2021 · 4 min read

TL;DR: Present-day Deep Learning models are scaling their computational requirements much faster than computing resources are growing. They rely on huge numbers of parameters, which makes them far more capable than older methods. To meet this challenge, we may need to rethink our architectures, possibly at a fundamental level, so that they stay smaller in scale while still delivering high performance. Alternatively, we may need to develop new types of hardware that can keep up with the requirements of DL architectures.

The reliance on Deep Learning models has risen immensely over the past decade, thanks to their revolutionary performance on problems across many domains. The performance gap between DL algorithms and older ML algorithms is especially prominent in areas such as Image Classification, Object Detection, and Natural Language Processing. To give an example, one of the commonly accepted turning points in the “revolution of Deep Learning” is the ImageNet competition of 2012. In that competition, AlexNet was the only architecture to achieve an error rate below 25%, and it was the only deep neural network in the competition! In the following years, all of the high-performing competitors used Deep Learning architectures, and error rates kept dropping significantly, eventually reaching as low as 2%.

We owe this great boost in performance to the differentiating features of Deep Learning:

1- The performance of DL models increases with the size of data.

2- DL models require a lot of computation power.

One of the main advantages of Deep Learning models is that they can learn from huge amounts of data. This is a distinctive aspect of Deep Learning that sets it apart from the Machine Learning algorithms we had been using before the emergence of Deep Neural Networks.

The performance of Deep Learning models scales wonderfully with the size of the datasets we use to train them, whereas the performance of other Machine Learning algorithms plateaus and does not benefit from additional data nearly as much. This observation is popularly attributed to Andrew Ng.

Another reason for the better performance of Deep Learning models compared to older ML models is that DL models rely on huge numbers of learned parameters, the weights and biases of the model. In an Artificial Neural Network, the input values we pass to the input layer are multiplied by weights and summed with biases in each neuron. These parameters are initialized with random values and then adjusted towards values that produce the correct outputs for the input-output pairs in our training dataset.
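As a minimal sketch of what “multiplied by weights and summed with biases” means in code (this NumPy example is my own illustration, not taken from the article or the paper), here is the forward pass of a single fully connected layer with randomly initialized parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# One fully connected layer: 4 input features -> 3 neurons.
# The weights and biases start out random and would be adjusted during
# training to fit the input-output pairs in the training dataset.
W = rng.standard_normal((4, 3)) * 0.1  # weight matrix (inputs x neurons)
b = np.zeros(3)                        # one bias per neuron

def forward(x):
    # Each neuron computes a weighted sum of the inputs plus its bias,
    # followed by a nonlinearity (ReLU here).
    return np.maximum(0, x @ W + b)

x = rng.standard_normal(4)  # a single example with 4 input values
print(forward(x))           # activations of the 3 neurons
```

A modern network simply stacks many such layers, which is where the enormous parameter counts discussed below come from.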

In a research paper from 2020, “The Computational Limits of Deep Learning”, it is stated that “It has been proven that there are significant benefits to having a neural network contain more parameters than there are data points available to train it, that is, overparameterizing it.” A good example is NoisyStudent, which is reported to reach state-of-the-art top-1 accuracy on ImageNet with about 480 million parameters, while ImageNet contains roughly 1.2 million training images. Over-parameterization is a resource-expensive way to achieve high performance with deep learning architectures. Because we need more parameters than data points, and the number of computations grows with the product of parameters and data points, the computational cost of an over-parameterized DL model grows at least with the square of the number of data points. Furthermore, the paper argues from Statistical Learning Theory that a linear improvement in error rate requires a quadratic increase in the number of data points. Combining the two, the computation required for a linear gain in performance grows at least as the fourth power of that gain, that is, O(performance⁴).
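To spell out that arithmetic (my own reconstruction of the paper’s argument, with “performance” taken as the inverse of the error rate), the scaling steps look roughly like this:

```latex
% Statistical learning theory: generalization error shrinks roughly with
% the square root of the number of data points n.
\text{error} \;\propto\; \frac{1}{\sqrt{n}}
\quad\Longrightarrow\quad
n \;\propto\; \frac{1}{\text{error}^{2}}

% Over-parameterization: the parameter count p is at least of order n,
% and training cost scales with the product p \cdot n.
\text{compute} \;\propto\; p \cdot n \;\gtrsim\; n^{2}
\quad\Longrightarrow\quad
\text{compute} \;\gtrsim\; \frac{1}{\text{error}^{4}} \;=\; \text{performance}^{4}
```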

Accordingly, the burden of computing the weights and biases of a deep neural network is the main reason Deep Learning models might be approaching their limits. It has been known for a very long time that neural networks are computationally expensive; yet the progression of hardware is what enabled us to use such power-hungry architectures. As long as we were limited to CPUs, our models stayed relatively small; with the arrival of GPUs, our DL models could grow much larger. Computations sped up by roughly 35x, but the architectures grew even faster than the computational power of GPUs. This may be one of the major problems of our present day, possibly setting a limit on our Deep Learning models.

To address this challenge, we either need to change our understanding of architectures and create smaller but still high-performance ones, or we need to come up with new infrastructure that combines digital computers with analog ones, which consume less power and can therefore be more efficient than purely digital hardware.

References

  • Neil C Thompson, Kristjan Greenewald, Keeheon Lee, and Gabriel F Manso. The computational limits of deep learning. arXiv preprint arXiv:2007.05558, 2020.
  • New Mind. (2021). The AI Hardware Problem [Video]. Retrieved from https://youtu.be/owe9cPEdm7k

I hope you find this article useful. For more information about our work, you can visit aiforexistence.com and join the discussion.

Thanks for reading!
