Deep Learning — What’s the hype about?
A Beginner’s Overview
A brief overview for beginners, by a beginner.
Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL)…you’ve no doubt heard these terms before, but unless you’re in the tech space or deal with big data, they may be little more than buzzwords to you! That’s how I felt at first, at least until a little ‘light’ reading for a new job highlighted just how prevalent these techniques are in our daily lives. It also made me realise the enormous potential that AI, ML and DL have to advance society and thus shape our future. Later, I will touch on some of the popular and novel use cases of AI, ML and DL, and the surrounding hype. But first, it helps to understand the difference between them, as the terms are sometimes used incorrectly.
Definition and History
The first thing to note when defining these terms is that, depending on the direction, they can technically be used interchangeably. Ultimately, DL is a sub-category of ML, which in turn is a sub-category of AI (Figure 1). Thus, referring to DL as AI is fine, but the opposite is incorrect and could create confusion. Quite often DL is deliberately referred to as AI or ML for marketing purposes, as the latter two are better known by the public and thus more likely to evoke feelings of awe, innovation and excitement.
Artificial Intelligence describes the ability of machines to simulate and mimic human-like intelligence and behaviour. This includes specific processes such as learning and reasoning. The idea of AI has been around for centuries, with the modern concept being popularised in 1950 in Turing’s seminal paper, and the term coined in 1956 by John McCarthy.
Machine Learning is an application of AI that uses statistical methods to enable computer systems to learn automatically. Although a computer is initially programmed how to learn, its eventual ability to predict an outcome is never explicitly programmed. One of the earliest and best-known use cases of ML was the Samuel Checkers-playing Program from the mid-1950s, which allowed the computer to learn and improve its ‘game-playing’ by playing games against itself.
Deep Learning is a specific sub-category of ML which uses the human brain as a model to structure algorithms into hierarchical networks known as artificial neural networks (ANNs). This allows for the automated learning of more complex and often abstract data. Whilst the concepts of deep learning and neural networks have been around for decades, DL truly came to fruition in the late 2000s, thanks to large advances in computer processing power as well as an increase in the amount of complex data companies were capturing and processing, i.e. big data. In 2012, deep learning received much publicity within the AI field after a deep learning algorithm (AlexNet) won the highly coveted ImageNet Large Scale Visual Recognition Challenge. In the same year, the Google Brain team led a large breakthrough in unsupervised learning. Until then, most deep learning algorithms were supervised, meaning they needed to be trained on labelled data, i.e. be told whether the assumption they made was correct. In unsupervised learning, the algorithm is shown unlabelled data and learns to identify recurring patterns itself by finding similarities amongst the data. For example, in the famous cat experiment, which popularised deep learning amongst the wider public, the Google Brain team’s deep learning algorithm processed 10 million images taken from random YouTube videos. One outcome was that one of the nodes of the output layer developed a strong affinity to images of cats, i.e. it learned to fire when a cat was in the image, despite never being told what a ‘cat’ is.
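To make the unsupervised idea concrete, here is a toy sketch in plain Python (nothing like the scale of the Google Brain experiment, and the data, starting centres and loop count are all illustrative choices of mine): given a handful of unlabelled numbers, a simple k-means-style loop discovers the two recurring groups on its own.

```python
# Toy unsupervised learning: group unlabelled 1-D points into two clusters.
# A k-means-style loop; illustrative only, not the Google Brain method.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]   # no labels provided
centers = [0.0, 6.0]                     # arbitrary starting guesses

for _ in range(10):                      # repeat until stable (10 passes is plenty here)
    # Assignment step: each point joins its nearest centre
    clusters = [[], []]
    for x in data:
        nearest = min((0, 1), key=lambda i: abs(x - centers[i]))
        clusters[nearest].append(x)
    # Update step: move each centre to the mean of its cluster
    centers = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters)]

print(sorted(centers))   # the two recurring patterns found in the data
```

No one told the loop where the groups are, or even that there are groups; it finds the structure by comparing the data with itself, which is the essence of unsupervised learning.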
Types of Artificial Neural Networks
Although recent advances in computing power have led to the boom in deep learning, many of the networks themselves have been defined for decades. The simplest of these, the perceptron (aka single-layer perceptron network; technically a perceptron is an algorithm, not a network), was defined by Frank Rosenblatt in 1957.
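The 1957 algorithm fits in a few lines of plain Python. Below is a minimal perceptron learning the logical AND function; the learning rate, epoch count and zero initialisation are my own illustrative choices, not part of any canonical implementation.

```python
# A minimal perceptron (Rosenblatt, 1957) learning the logical AND function.
inputs  = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]             # AND is true only when both inputs are 1

w, b, lr = [0.0, 0.0], 0.0, 0.1    # weights, bias, learning rate

def predict(x):
    # Step activation: fire (1) only if the weighted sum exceeds the threshold
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for _ in range(10):                # a few passes over the data suffice here
    for x, t in zip(inputs, targets):
        error = t - predict(x)     # 0 if correct, +/-1 if wrong
        w[0] += lr * error * x[0]  # nudge the weights toward the correct answer
        w[1] += lr * error * x[1]
        b    += lr * error

print([predict(x) for x in inputs])  # → [0, 0, 0, 1]
```

The update rule only ever nudges the weights when a prediction is wrong, yet after a handful of passes the perceptron classifies all four cases correctly.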
Whilst each type of ANN has its own distinct features and thus use cases, at their core they all contain three components:
- Input Layer — the first layer; data is received passively through the nodes (it is not manipulated) and then passed to the subsequent hidden layer.
- Hidden Layer/s — data is processed using a variety of mathematical functions. The structure, type and number of hidden layers and functions are key factors in how the data is processed. It is these factors that also differentiate the various ANNs.
- Output Layer — the final layer; the node/s in this layer provide the final solution of the network.
Feed Forward Networks (FFNs), aka fully-connected networks, are the most common type of ANN. Signals always travel forward from the input layer to the output layer via the hidden layer/s. Each node in any one layer is connected to all nodes in the subsequent layer; however, nodes are never connected to others within their own layer. FFNs are used to model how several input variables affect an output. As such, they can be used in many settings such as data compression, text recognition, speech recognition, and simple image recognition and classification, to name a few. However, given their simple design, they are limited by their large processing requirements. So although they may be successfully implemented in various settings, they are often not the optimal or preferred ANN.
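A single forward pass through such a network can be sketched in a few lines of NumPy. The layer sizes, sigmoid activation and random weights below are placeholders of mine, not a trained model; a real FFN would learn the weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny fully-connected network: 3 inputs -> 4 hidden nodes -> 1 output.
# The weights are random placeholders; training would adjust them.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output

x = np.array([0.5, -1.0, 2.0])   # the input layer passes data on unchanged
h = sigmoid(x @ W1 + b1)         # hidden layer: weighted sums + activation
y = sigmoid(h @ W2 + b2)         # output layer: the network's final answer

print(float(y[0]))               # a value between 0 and 1
```

Note how every input value reaches every hidden node, and every hidden node reaches the output: that full connectivity is exactly what makes FFNs expensive as they grow.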
Convolutional Neural Networks (CNNs) contain two central processes known as feature extraction and classification. Unlike in typical FFNs, nodes in the feature extraction stage do not connect to every node in the subsequent layer, but only to some. Ultimately, numerous groups of adjacent inputs are convolved into single outputs, which are then aggregated and flattened from 3D arrays into 1D vectors. These are then run through the classification stage, which mimics a typical FFN. CNNs were inspired by the structure and function of the animal visual cortex, where each neuron fires only when a stimulus falls within a specific region of the animal’s visual field, the animal’s perception being an aggregate of all the outputs. CNNs are commonly used in complex image classification and image recognition. Perhaps the best-known algorithm is LeNet-5, which was used to recognise and classify digits from the MNIST database of handwritten digits. However, this specific task could also have been tackled successfully with an FFN. Another commonly used database is CIFAR-10, which contains 60,000 32x32 colour images in 10 categories (e.g. airplanes, cars, birds etc.) and facilitates training CNNs to recognise objects.
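The convolution step at the heart of the feature-extraction stage can be shown with NumPy alone. The tiny image and the edge-detecting filter below are made up for illustration; in a real CNN the filter weights are learned rather than hand-picked.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a filter over an image (cross-correlation, as CNNs use it)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output is a weighted sum of one small patch of inputs
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 toy image: bright on the left, dark on the right.
image = np.array([[1, 1, 1, 0, 0]] * 5, dtype=float)

# A hand-picked vertical-edge filter: responds where brightness changes.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map)   # strongest responses sit along the vertical edge
```

Each output value depends only on a small patch of adjacent inputs, which is the “connect only to some nodes” idea described above, and the same filter is reused across the whole image.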
Recurrent Neural Networks (RNNs) are similar to FFNs, but additionally allow signals to be fed back through recurrent loops within the hidden layer/s. This means the output of a specific node is fed back into the same node recurrently, with the previous output being used to calculate the current output. As such, they have ‘memory’. As the amount of memory which can be stored in a node is limited, a more specialised RNN, the long short-term memory unit (LSTM), was introduced. These contain separate memory cells which store the output of adjacent nodes, allowing the recurrent loop to run significantly more times. RNNs, especially LSTMs, are synonymous with Natural Language Processing (NLP), which isn’t surprising given that in language, letters and words depend heavily on the preceding letter or word. RNNs are also widely used in time-series problems.
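The ‘memory’ comes from a single extra term in the update: the hidden state from the previous step. Here is a bare-bones recurrent cell in NumPy, with random, untrained weights and a made-up input sequence purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# A bare-bones recurrent cell with random, untrained weights (illustrative).
Wx = rng.normal(size=(2, 3))   # current input -> hidden state
Wh = rng.normal(size=(3, 3))   # previous hidden state -> hidden state (the 'memory')
b  = np.zeros(3)

def run(sequence):
    h = np.zeros(3)            # hidden state starts empty
    for x in sequence:
        # Output depends on the current input AND the previous hidden state
        h = np.tanh(x @ Wx + h @ Wh + b)
    return h

seq = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
print(run(seq))        # final state summarises the whole sequence
print(run(seq[::-1]))  # reversing the order yields a different state
```

Feeding the same items in a different order produces a different final state, which is exactly why RNNs suit language and time series: order matters.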
Generative Adversarial Networks (GANs) contain two neural networks, a generator and a discriminator, competing against each other in a zero-sum game, i.e. the sum of the total losses and gains between the two at any given time is zero. The generator, often an inverse (deconvolutional) CNN, is tasked with generating new data, whilst the discriminator, typically a standard CNN, is tasked with identifying whether the newly created data mimics the training set data. The important part to note here is that both networks are actively learning from one another. Unlike the networks outlined above, which fall under supervised learning, GANs are unsupervised. GANs can be used to generate media such as images, music and text, and have interestingly been used to generate images from text. A recent novel use case of GANs has been in making night driving safer, with an optimal view of the surroundings generated from a dark and distorted camera feed.
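The adversarial loop can be sketched even without CNNs. In the toy one-dimensional GAN below, the generator is just an affine map and the discriminator a logistic classifier, both stand-ins for the networks described above; the target distribution, learning rate and step count are all illustrative choices of mine, and manual gradient updates replace what a framework would do automatically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D GAN: real data ~ N(4, 0.5). Affine generator and logistic
# discriminator stand in for the CNN pair; all constants are illustrative.
a, b_g = 1.0, 0.0           # generator: g(z) = a*z + b_g, z ~ N(0, 1)
w, c   = 0.1, 0.0           # discriminator: D(x) = sigmoid(w*x + c)
lr, batch = 0.02, 64

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

fake_before = a * rng.normal(size=1000) + b_g   # samples before training

for _ in range(3000):
    real = rng.normal(4.0, 0.5, size=batch)
    z = rng.normal(size=batch)
    fake = a * z + b_g

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * np.mean((1 - d_real) * real - d_fake * fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # Generator step: push D(fake) toward 1, i.e. fool the discriminator
    d_fake = sigmoid(w * fake + c)
    a   += lr * np.mean((1 - d_fake) * w * z)
    b_g += lr * np.mean((1 - d_fake) * w)

fake_after = a * rng.normal(size=1000) + b_g
print(np.mean(fake_before), np.mean(fake_after))  # samples drift toward the real mean
```

Every improvement in the discriminator sharpens the training signal for the generator and vice versa; that mutual pressure is the zero-sum game, and no labels are ever needed beyond “real vs generated”.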
So why the hype?
You’d be surprised just how entwined AI is with our daily lives. Here are some examples of technologies that currently rely on deep learning:
- Virtual Assistants e.g. Siri, Google Assistant, Alexa, Cortana
- Recommendations e.g. movies on Netflix, pages on Facebook, music on Spotify
- Facial recognition e.g. auto tagging of contacts on iPhone and Facebook
- Translating languages e.g. Google Translate, DeepL Translator
- Fraud detection in banking and finance
- Self-driving cars e.g. Waymo, Tesla
AI also has the potential to help us in our daily work lives. So many of us get tied down doing tedious, repetitive and mind-numbing tasks, many of which don’t require much expert knowledge, let alone your bachelor’s or postgraduate degree…just a little training and maybe someone briefly supervising you…ah, sounds like AI to me! In its most basic form, AI can help automate some of these simple processes, allowing us to reclaim some of our innovation, creativity and individuality! In other fields, such as engineering or science, it is already driving innovation, leading us to new discoveries and concepts whilst pushing the boundaries of our abilities and existence!
Hopefully you are beginning to realise that the use cases for deep learning are seemingly endless across numerous industries. In future posts we will summarise some use cases of deep learning in healthcare and finance.