A short introduction to Neural Networks and Deep Learning

Lorenzo Amante
Bedrock — Human Intelligence
10 min read · Dec 1, 2020


Introduction

In this article I attempt to provide an easy-to-digest explanation of what Deep Learning is and how it works, starting with an overview of the enabling technology: Artificial Neural Networks. As this is not an in-depth technical article, please take it as a starting point for getting familiar with some basic concepts and terms. I will leave some links along the way for curious readers to investigate further.

I am working as a Data Engineer at Bedrock, and my interest in the topic arose due to my daily exposure to doses of Machine Learning radiation emitted by the wild bunch of Mathematicians and Engineers sitting around me.

Deep Learning roots

The observation of nature has triggered many important innovations. One with profound socioeconomic consequences arose from the attempt to mimic the human brain. Although we are far from understanding its inner workings, researchers observed a structure of interconnected specialised cells exchanging electrochemical signals. Several imitation attempts followed, until Frank Rosenblatt came up with an improved mathematical model of such cells: the Perceptron (1958).

The Perceptron

Today’s Perceptron, often referred to more generically as a ‘neuron’, ‘node’ or ‘unit’ in the context of Artificial Neural Networks, can be visually described as below:

- the Perceptron -

It operates in the following manner: every input variable is multiplied by its weight, and the results, together with a special extra input named the ‘bias’, are summed up. The sum is passed to the ‘activation function’, which finally provides the numerical output response (the ‘neuron activation’). The weights are a measure of how much each input affects the neuron, and they represent the main ‘knobs’ we have at our disposal to tune its behaviour. The Perceptron is the basic building block of Artificial Neural Networks.
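To make this concrete, here is a minimal sketch of a single Perceptron in Python (the input values, weights, bias and step activation below are purely illustrative):

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through a step activation."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1.0 if weighted_sum > 0 else 0.0  # the neuron either 'fires' (1) or not (0)

# Illustrative values: three inputs, their weights, and a bias
x = np.array([0.5, 0.3, 0.9])
w = np.array([0.4, -0.6, 0.2])
print(perceptron(x, w, bias=-0.1))  # 1.0, since 0.20 - 0.18 + 0.18 - 0.10 = 0.10 > 0
```

Changing the weights (or swapping the step function for a smoother activation such as a sigmoid) changes how the neuron responds to the same inputs, which is exactly what training will exploit.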

Deep Neural Networks (DNN)

Deep Neural Networks combine the inputs and outputs of many Perceptrons on a grand scale: there may be a very large number of inputs, outputs and neurons, with some variations in the topology (such as the addition of loops) and optimisation techniques built around them, as you can see in the picture below:

- Multi-layer Perceptron or Feedforward neural network -

We can have as many inputs, outputs, and layers in between as needed. These kinds of networks are called ‘feedforward networks’, because the data flows in one direction, from input to output.

  • The leftmost layer of input values in the picture (in blue) is called ‘input layer’ (with up to millions of inputs).
  • The rightmost layer of output perceptrons (in yellow) is called the ‘output layer’ (there can be thousands of outputs). The green cells represent the output value.
  • The layers of perceptrons in between (in red) are called ‘hidden layers’ (there can be up to hundreds of hidden layers, with thousands of neurons).

The word ‘deep’ refers to this layered structure. Although there is no general agreement on the naming, we can usually start to talk about Deep Neural Networks once there are more than 2 hidden layers.
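To illustrate the feedforward idea, here is a small sketch in which data flows through the layers as a chain of matrix multiplications (the layer sizes, random weights and sigmoid activation are illustrative choices, not prescriptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [4, 5, 5, 2]  # input layer, two hidden layers, output layer

# One weight matrix and one bias vector per pair of consecutive layers
weights = [rng.standard_normal((m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.standard_normal(n) for n in layer_sizes[1:]]

def feedforward(x):
    # Each layer's output becomes the next layer's input
    for W, b in zip(weights, biases):
        x = sigmoid(x @ W + b)
    return x

print(feedforward(np.array([0.1, 0.7, 0.3, 0.9])))  # two output activations
```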

To get an idea of the scale I’m talking about: a network taking a 1024x1024-pixel colour image as input, with 1,000 nodes in the first hidden layer and 2 outputs, will have over 3 billion parameters (weights and biases).
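A rough back-of-the-envelope check (assuming three colour channels and a fully connected first layer) shows where those parameters come from:

```python
# Rough parameter count for a fully connected network on a colour image
inputs = 1024 * 1024 * 3   # pixel values fed to the network (3 colour channels)
hidden = 1000              # nodes in the first hidden layer
outputs = 2                # e.g. one output per class

params = (inputs * hidden + hidden) + (hidden * outputs + outputs)
print(f"{params:,}")       # 3,145,731,002 -> over 3 billion, almost all of them
                           # weights between the input and the first hidden layer
```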

Choosing the right number of layers and nodes is not a trivial task, as it requires experimentation, testing, and experience: we cannot be sure beforehand which combinations will work best. Common DNNs may have between 6 and 8 hidden layers, with each layer containing thousands of perceptrons. Developing these models is therefore not an easy task, so the cost-benefit trade-off needs to be evaluated: a simpler model can sometimes provide results that are almost as good, but with much less development time. Also, teams with the skills to develop Neural Networks are not yet commonplace.

Deep Learning (DL)

Deep Learning is a branch of Artificial Intelligence that leverages the architecture of DNNs to solve regression or classification problems where the amount of data to process is very large.

Suppose we have a set of images of cats and a set of images of dogs, and we want a computer program that is able to label any of those pictures as either a cat picture or a dog picture with the smallest possible error rate. This is called an ‘image classification problem’. As a computer image is essentially numerical data, we can, after applying some transformations, feed it as input to our network. We configure our network based on the nature of the problem, by selecting an appropriate number of inputs and outputs, and some number of layers and neurons in between. In our case, we want our network to have two outputs, each associated with a category: one representing dogs, the other one cats. The actual output values will be numerical estimations representing how much the network ‘thinks’ the input picture belongs to one category or the other:
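As a minimal sketch of such a setup (assuming the TensorFlow/Keras toolkit and 128x128 RGB images; the image size and layer widths are illustrative choices, not recommendations), the network could be defined like this:

```python
import tensorflow as tf

# A tiny fully connected cat/dog classifier: flatten the image into numbers,
# pass them through two hidden layers, and produce two output values.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(128, 128, 3)),  # image -> flat vector of pixel values
    tf.keras.layers.Dense(256, activation="relu"),       # hidden layer 1
    tf.keras.layers.Dense(128, activation="relu"),       # hidden layer 2
    tf.keras.layers.Dense(2, activation="softmax"),      # two outputs: 'cat' and 'dog'
])
model.summary()  # prints the layers and the number of trainable parameters
```

Here the softmax activation makes the two outputs sum to 1; with independent sigmoid outputs (another common choice) they would not.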

The outputs are probability values of the image being a dog or a cat (although in many cases they do not necessarily add up to 1). The initial set of weights is randomly chosen, and therefore, the first response of our network to an input image, will also be random. A ‘loss function’ encoding the output error will be calculated based on the difference between the expected outcome and the actual response of the network. Based on the discrepancy reported by the loss function, the weights will be adjusted to get to a closer approximation.
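As an illustration of the idea (cross-entropy is one common choice of loss function, not the only one; the numbers below are made up), the loss for a single labelled image could be computed like this:

```python
import numpy as np

expected = np.array([1.0, 0.0])    # the label: this image is a cat, not a dog
predicted = np.array([0.42, 0.58]) # what an untrained network with random weights might output

# Categorical cross-entropy: large when the network is confidently wrong,
# close to zero when it is confidently right.
loss = -np.sum(expected * np.log(predicted + 1e-9))
print(loss)  # ~0.87; training will push this value down
```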

This is an iterative process. You present a batch of data to the input layer, and the loss function compares the actual output against the expected one. A special algorithm (backpropagation) then evaluates how much each connection contributed to the error by ‘traversing’ the network backwards towards the input layer, and based on that, the weights are tweaked to reduce the error (i.e. to move towards minimising the loss function). This process goes on by passing more images to the network, taken from the set of images for which we already know the outcome (the training set). In other words, whenever the network produces a wrong prediction, the weights of the connections that would have contributed to the correct prediction are reinforced.

We will use only a fraction of the labelled dataset (the training set) for this process, whilst keeping a smaller fraction (the test set) to validate the performance of the network after training. This is the actual ‘learning’ phase, as the network is somehow building up ‘knowledge’ from the provided data rather than just memorising it. The larger the amount of quality data we feed in, the better the network will perform on new, unseen data. The key point to grasp here is that the network becomes able to generalise, i.e. to classify with high accuracy an image it has never seen before.
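Putting the pieces together, a training and validation run might look like the sketch below. It reuses the hypothetical `model` defined earlier and assumes two NumPy arrays, `images` and `labels`, that we would have prepared beforehand (these names and the split ratio are illustrative):

```python
from sklearn.model_selection import train_test_split

# Hypothetical data prepared beforehand:
#   images: array of shape (N, 128, 128, 3) with pixel values
#   labels: one-hot array of shape (N, 2), e.g. [1, 0] for cat, [0, 1] for dog
X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=0.2)

# The optimiser uses backpropagation to adjust the weights towards a smaller loss
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Iterate over the training set in batches, several times (epochs)
model.fit(X_train, y_train, batch_size=32, epochs=10)

# Finally, check how well the network generalises to images it has never seen
test_loss, test_accuracy = model.evaluate(X_test, y_test)
```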

Why now?

Only in recent years has DL become very popular, despite the fact that most of the foundational work has been around for decades. Technological limitations and other challenges long held back its widespread use, and some recent breakthroughs have been key to the current adoption of the technology. Just to mention a few factors:

  1. Deep Learning algorithms are data-hungry. They only perform well when large labelled datasets are available.
    Businesses have finally started to give serious data collection strategies the importance they deserve, which is already paying off and will do so even more in the near future. Not using these techniques should no longer be an option. As of 2017, 53% of companies were already adopting big data strategies, and by 2019, 95% of businesses needed to manage big data.
  2. Deep Learning’s computational requirements are extremely demanding. The time needed to properly train a Neural Network was simply impractical in most cases, given the technology available.
    Now we have efficient distributed systems, GPU architectures, and cloud computing at reasonable prices. Every business can therefore rely on on-demand computational power, without the burden of setting up its own infrastructure and risking quick obsolescence, and is thus able to exploit the power of DL at lower cost.
  3. Algorithmic challenges. There were significant obstacles to getting the optimisation algorithms to work with more than 2 hidden layers.
    Breakthroughs such as backpropagation, convolution, and other techniques have drastically reduced the ‘brute-force’ computational requirements. Also, thanks to freely available online content and toolsets like TensorFlow, most people can finally experiment and learn, and create the most diverse applications out of these techniques.

Use Cases

Deep Learning can be used for Regression and Classification tasks, from small to large scale, although for small scale issues other Machine Learning techniques are more suitable. When larger datasets are involved, together with the necessary computational resources, Deep Learning is probably the most powerful Machine Learning technique. I will list here only a few use cases with common applications:

  • Recommender systems: mostly used in e-commerce applications, to predict ratings or user preferences (the Amazon recommendation engine, for example).
  • Speech synthesis/recognition: used in verbal/oral communication with machines, as an alternative or replacement to more traditional types of human/machine interactions (like Apple’s Siri assistant).
  • Text processing: applications can predict textual outputs based on previous inputs, as in search text completions (Google search bar, for example).
  • Image processing/recognition: used where heavy loads of images (including video) need to be processed, as in computer vision, satellite, medical imagery analysis, object detection, autonomous driving.
  • Game playing: systems that can learn from previous games, and compete against humans (DeepMind’s AlphaGo).
  • Robotics: advanced control systems for industrial automation, robots with special physical abilities that could replace human workers in hostile environments.

The good thing is that you can find most of those applications already at work in your phone!

In the case of games, there was a public challenge between a professional Go player and a team of experts who developed a Deep Learning application named AlphaGo. AlphaGo won the challenge, winning 4 games and losing just 1. It was initially trained on existing Go game datasets generated by communities of online players, with the input of some professional players. From a certain point on, AlphaGo was set to learn and improve by playing against itself. Expert players declared that AlphaGo came up with beautiful and creative moves, as they witnessed a machine making moves that no professional would have thought of until that moment (and Go has quite a long tradition). By analogy, in commercial applications deep learning techniques may generate unexpected business insights that no human could have foreseen or guessed through traditional analytical models or from their own experience.

Other impressive results from DL applications have recently been achieved in automated text generation with OpenAI’s new GPT-3 algorithms. This is another special kind of DL network that can work with unlabelled data, as it automatically detects patterns in very large textual datasets. These networks are able to generate text that may often appear as if it were written by humans. Remarkably, the entire English Wikipedia apparently makes up less than 1% of the data used to train GPT-3! You can see GPT-3 at work here:

Considerations

Despite the many practical applications, using these models is still, if not complex, a very tricky affair. It is possible to build a deep learning model with little prior DL knowledge, but in most cases we are likely to obtain misleading results. The handling of models running in any critical or sensitive environment should be left to people with the right technical expertise.

The quality of the data we feed in when training a DNN is of key importance. Many DL projects, despite having very sophisticated models, cannot go live simply because the real data does not meet the standards the model requires. As the saying goes: ‘garbage in, garbage out’. To make the most out of these analytical methods and architectures, it is critical to implement a strong data culture, establishing robust collection, usability, and compliance strategies, and embedding education and training mechanisms at the core of the business.

There are also ethical issues, arising from biased results generated by DL models and from the generation of false information and propaganda (deep fakes).

Also, there is still quite some mystery around the inner workings of DL, which may open the door to issues that are difficult to detect and avoid. In fact, it is possible to manipulate an image in ways a human would not even notice, yet cause a machine to misclassify it completely.

Acquiring a better understanding of the possibilities offered by these machine learning algorithms, and identifying when DL is really an option worth considering, will surely allow us to set the right goals and expectations.

Conclusion

We should not consider DL as being in any way related to human intelligence; we are still not even close to such complexity. Rather than a threat to our jobs, we should embrace this branch of Artificial Intelligence as another very powerful extension of our capabilities. Threats come from misuse… but that’s another story. Possibly the most obvious differentiator between humans and any other known form of life is our ability to build tools, and Deep Neural Networks are among the most promising tools at our disposal today.
