Why Deep Learning? Some cool applications

André Pinto
Deep Learning Sessions Portugal
7 min read · May 5, 2021

Part 5 of the “Getting Started in Deep Learning” Series

Photo by Anton Shuvalov on Unsplash

Motivation

In the last blogpost we made the transition from simple linear models, like Linear Regression, to non-linear models like Neural Networks. We motivated this transition by the inability of linear models to describe real-world functions, which, most of the time, are not linear in the inputs.

We presented a specific non-linear problem to which neural networks have been applied: the task of classifying an animal based on a picture. But to what other kinds of problems can deep neural networks be applied? This post aims to shed some light on some of the current applications of Deep Learning. We hope to get you excited to learn about these applications, and that is why we provide additional resources along the way!

But first, let’s consider a classical problem, which will help us illustrate why linear models are unable to map a wide range of functions.

The XOR Problem

The XOR (exclusive or) problem is a very famous problem in the field of artificial neural networks and clearly shows the limitations of linear models. The XOR is a binary function that returns 1 whenever the binary inputs are different and 0 when they are equal.

Let’s try to model the XOR problem starting with a simple linear model:

Simple Perceptron model. Linear on the inputs.

To learn the XOR problem we feed our model four distinct training examples:

If we treat the XOR problem as a regression problem, we can try to discover f*(x) (a good approximation of f) by minimizing the Mean Squared Error across the training set. Unfortunately, this converges to a solution that outputs 0.5 everywhere in the input space. The image below provides a useful visualization of why linear models are unable to represent the XOR function, but can represent the OR function.
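We can check the 0.5 claim numerically. Below is a sketch in plain Python (no libraries) that fits a linear model to the four XOR examples by gradient descent on the Mean Squared Error:

```python
# Fit y = w1*x1 + w2*x2 + b to the XOR truth table by minimizing the
# Mean Squared Error with plain gradient descent.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w1 = w2 = b = 0.0
lr = 0.1
for _ in range(5000):
    g1 = g2 = gb = 0.0
    for (x1, x2), y in data:
        err = (w1 * x1 + w2 * x2 + b) - y
        g1 += 2 * err * x1 / len(data)
        g2 += 2 * err * x2 / len(data)
        gb += 2 * err / len(data)
    w1 -= lr * g1
    w2 -= lr * g2
    b -= lr * gb

# the best linear fit ignores the inputs and predicts 0.5 everywhere
print([round(w1 * x1 + w2 * x2 + b, 3) for (x1, x2), _ in data])
# → [0.5, 0.5, 0.5, 0.5]
```

The loss is convex, so gradient descent finds the global optimum: both weights go to zero and the bias settles at 0.5, the average of the targets.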

Representation of the OR and XOR Functions. We can see that the OR is linearly separable whereas the XOR is not.

Whereas the OR function is linearly separable (we can draw a line separating the 0 and 1 examples), the same does not happen for the XOR function. To separate the examples from the two classes we would need a curved boundary, which a linear model cannot produce. Solving this problem requires representing the XOR data in a different space, where it becomes linearly separable. We can do this by applying a non-linear transformation to our data.

After applying a non-linear transformation the XOR problem becomes linearly separable.
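This transformation can be written down concretely. Here is a minimal sketch in plain Python of a two-layer network with ReLU hidden units; the weights are set by hand (the classic textbook solution), not learned:

```python
# A two-layer network that computes XOR exactly. The hidden layer is the
# non-linear transformation; the output layer is then purely linear.
def relu(z):
    return max(0.0, z)

def xor_net(x1, x2):
    # hidden layer: a non-linear transformation of the inputs
    h1 = relu(x1 + x2)        # counts how many inputs are on
    h2 = relu(x1 + x2 - 1)    # fires only when both inputs are on
    # in (h1, h2)-space, a linear readout now suffices
    return float(h1 - 2 * h2)

print([xor_net(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# → [0.0, 1.0, 1.0, 0.0]
```

Note how the hidden units map the four inputs into a space where a single linear combination separates the classes, exactly as in the picture above.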

The XOR problem, while relatively trivial to solve, is just one example that illustrates the limitations of linear models. In practice, we are often interested in much more complex problems. Currently, Deep Learning techniques are being applied to a plethora of tasks across a multitude of fields. There are three main reasons behind the popularity of these algorithms when compared to classical machine learning algorithms (like SVMs or Random Forests):

  • When there is plenty of data and reasonable computational resources, deep learning models typically outperform classical machine learning algorithms, since they can represent a much broader set of functions.
  • Deep learning models do not require hand-engineered feature extraction: instead, they learn meaningful representations of the features for the task at hand, a capability known as Representation Learning. This is especially relevant when dealing with non-trivial tasks, in which choosing an informative subset of features can be quite challenging.
  • Some Deep Learning architectures, like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), incorporate domain-specific knowledge in their construction, which makes them particularly competent at certain tasks. CNNs, for example, are loosely inspired by the visual cortex.

Deep Learning Applications

To finish off our series we would like to give a brief overview of some applications where deep learning methods are being used. It would be unfeasible to list them all, and so we decided to focus on three major fields:

  • Computer Vision.
  • Natural Language Processing.
  • Deep Generative Models.

Computer Vision

The field of computer vision encompasses algorithms that map structured visual inputs (like images or videos) to some desired output. This output can be a digit (0 to 9) in the case of a digit recognition system, a category (dog, cat, etc.) in the case of an animal classifier, or even a set of coordinates giving the location of a specific object in an image (as in traffic light detection systems).

The following image was extracted from the original YOLO (You Only Look Once) paper, which revolutionized the field of object detection.

Image from the original YOLO paper¹

Computer Vision algorithms usually make use of the specialized type of neural network architecture introduced before: Convolutional Neural Networks (CNNs). If you would like to learn more about these networks, we strongly suggest reading chapter 9 of the Deep Learning Book. Alternatively, the 4th course of Coursera’s Deep Learning specialization is a great way to develop your understanding of these topics, with guided assignments in which you implement applications similar to the ones described here.
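To give a feel for the operation that gives CNNs their name, here is a minimal "valid" 2D convolution in plain Python (like most deep learning frameworks, it is technically cross-correlation); the image and kernel values are made up for illustration:

```python
# Slide a small kernel over an image and take weighted sums: the core
# operation inside every convolutional layer.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# a tiny image with a dark-to-bright vertical boundary
image = [[0, 0, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1]]      # responds to horizontal intensity changes
edges = conv2d(image, edge_kernel)
print(edges[0])
# → [0, 1, 0]  (the response peaks exactly at the boundary)
```

In a real CNN the kernel values are learned from data rather than fixed by hand, and many such kernels are stacked into layers.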

Natural Language Processing

The field of Natural Language Processing also deals with structured data, but specifically with natural language, either spoken or written. Several tasks fall under the Natural Language Processing umbrella. Examples include speech recognition (matching a spoken sentence to a text output), machine translation (automatically translating text from one language to another) and sentiment analysis (producing a score or classification that reflects the message of a block of text).

Similarly to Computer Vision, the field of Natural Language Processing also benefits from a specific type of architecture for building its neural networks: Recurrent Neural Networks (RNNs). RNNs exploit a useful property of natural language data, its sequential nature: if you want to predict the next word in a sentence, it is useful to consider the previous words.
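A minimal sketch of this idea in plain Python, with toy hand-picked weights (a trained RNN would learn them): the hidden state is updated token by token, so the final state depends on the order of the inputs.

```python
# A single recurrent cell: the hidden state h carries information
# forward through the sequence.
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0, b=0.0):
    # the new state mixes the previous state with the current input
    return math.tanh(w_h * h + w_x * x + b)

def run_rnn(sequence):
    h = 0.0                    # initial state: nothing seen yet
    for x in sequence:         # tokens are processed in order
        h = rnn_step(h, x)
    return h                   # the final state summarizes the sequence

# the same tokens in a different order give a different summary,
# which is exactly the order-sensitivity language needs
print(run_rnn([1, 0, 0]), run_rnn([0, 0, 1]))
```

Real RNNs use vectors for the state and inputs instead of single numbers, but the recurrence is the same.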

Likewise, if you would like to further explore RNNs we advise you the chapter 10 of the Deep Learning Book, or for a more thorough reading the Speech and Language Processing Book.

Deep Generative Models

Most Computer Vision and Natural Language Processing tasks can be solved using discriminative models. This means they model a conditional probability distribution P(Y | X = x): given an observation/example x, the model produces a label y. A generative model does exactly the opposite. It models the conditional probability distribution P(X | Y = y): it tries to produce an observation based on the class provided.
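To make the contrast concrete, here is a toy sketch in plain Python, assuming a single Gaussian feature (weight in kg) per class; the classes, parameter values, and function names are all made up for illustration:

```python
# Toy contrast between the generative direction, sampling x given y,
# and the discriminative direction, picking y given x.
import math
import random

# class-conditional distributions P(X | Y = y): (mean, standard deviation)
params = {"cat": (4.0, 0.5), "dog": (9.0, 1.0)}

def generate(label):
    """Generative direction: sample an observation x for a given class y."""
    mean, std = params[label]
    return random.gauss(mean, std)

def classify(x):
    """Discriminative direction: return the class y that makes the
    observation x most likely (assuming equal class priors)."""
    def log_likelihood(label):
        mean, std = params[label]
        return -math.log(std) - (x - mean) ** 2 / (2 * std ** 2)
    return max(params, key=log_likelihood)

random.seed(0)
sample = generate("dog")            # a plausible dog weight
print(round(sample, 2), classify(sample))
```

VAEs and GANs play the generative role here, but learn far richer distributions over images or text instead of a single Gaussian per class.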

Two of the most common types of generative models are Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). You are probably familiar with some of the applications of these models, in particular the generation of DeepFakes (images which resemble real people but are actually created by generative models that learn from real pictures).

Images generated using the GAN system proposed in ².

If you would like to know more about Deep Generative Models we recommend the GANs specialization on Coursera, or An Introduction to Variational AutoEncoders.

Additional Resources

Throughout this blogpost we have suggested several resources which might be helpful if you are looking to learn more about a particular class of algorithms. Below we list some other, more general resources that were useful to us when getting started in Deep Learning, and we hope they can help you too:

Wrapping up the Series

This blogpost concludes the 5-part series on How to Get Started in Deep Learning. We hope to have provided a concise overview of the area, alongside additional resources for anyone looking to go deeper into each topic.

Stay tuned for more Deep Learning Sessions Lisboa posts!

Acknowledgments

This publication was written with the help of Inês Pedro, also an organizer of the Deep Learning Sessions Lisboa.

References

[1] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection (2015)

[2] T. Karras, T. Aila, S. Laine, J. Lehtinen, Progressive Growing of GANs for Improved Quality, Stability, and Variation (2018)


Data/Software Engineer at Sepio Systems and co-organizer of Deep Learning Sessions Lisboa