Neural Networks, in a Nutshell

Matthew Farah
Published in CodeX · 6 min read · Aug 18, 2021

In this installment of the series, we'll give an overview of neural networks: how they work and what their applications are in machine learning.

Image of brain made out of interconnected nodes

Before diving into the higher-level concepts, it's important to first cover the history, or rather, the inspiration behind neural networks, which will give us a much more intuitive grasp of how and why neural networks work in the first place.

Perhaps unsurprisingly given the name, neural networks are inspired by the way in which neurons in our very own brains function. Simply put, we can think of neurons as either firing or not firing, and when we have a series of interconnected neurons, whether a given neuron will fire or not becomes dependent, almost in a butterfly effect type of way, on all the neurons which precede it.

Where neural networks and neurology begin to diverge, however, is in what exactly a neuron is intended to represent. In the field of artificial intelligence, neurons (aka nodes) can be thought of as containing or holding data, such as a float, and the "synapses" connecting neurons as weights that dictate the strength of one neuron's activation of another.

To help illustrate this concept, let’s take a look at a depiction of what a simple neural network might look like.

Example of a simple neural network. (Credit: https://towardsdatascience.com/step-by-step-guide-to-building-your-own-neural-network-from-scratch-df64b1c5ab6e)

As you can see, each circle in the image is representative of one neuron, so the first layer of the network contains 3 neurons, the second layer (aka the hidden layer) contains 5 neurons, and the last layer contains 2 neurons, making the total number of neurons in this network 10. More on layers later.

The lines which connect neurons are indicative of the weights. Weights themselves are an assigned numerical value which represents the strength of the connection between any two nodes; the greater the weight, the stronger the connection, or correlation, between two neurons. Weights are especially important to consider because they directly influence the output of a node.

The reason why weights play a crucial role in what the output of a neuron will be is because the output of any given neuron is simply a weighted sum of all the inputs connected to it.

To see this in action, let’s see what the output of a neuron might be given some inputs and weights.

So from the image, the neuron whose output we're trying to find is connected to three neurons in the previous layer, with outputs of 2, 4, and 12 respectively. These three neurons are connected to the neuron in the next layer by weights which carry a strength of .5, .6, and .7, in that order. To find the output of this mystery neuron, we multiply the value contained in each neuron by the strength of its weight, and then calculate the sum. Therefore, the output of the mystery neuron will be equal to (2 x .5) + (4 x .6) + (12 x .7), which is 11.8.
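The calculation above is a one-liner in code. Here's a quick sketch using the same numbers from the example:

```python
# Weighted sum of a neuron's inputs, using the numbers from the example above.
inputs = [2, 4, 12]        # outputs of the three neurons in the previous layer
weights = [0.5, 0.6, 0.7]  # strength of each connection

# Multiply each input by its weight, then add up the products.
output = sum(x * w for x, w in zip(inputs, weights))
print(output)  # 11.8
```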

Layers, which we briefly touched on earlier, form the backbone or structure of the neural network. Layers can be thought of as columns of neurons which are connected by weights to the layers before and after them. Furthermore, the position of a layer in the neural network dictates both its name and purpose.

For example, the first layer in a neural network is known as the input layer, whose neurons represent each component of the data which the neural network will use to learn and make predictions from.

Speaking of predictions, the last layer in a neural network is known as the output layer, and if the neural network is classifying data, there will be one neuron in the output layer for each possible category of the input data. For example, if we are training a model to classify images of cars and images of trucks, we would need two neurons in our output layer to represent the two possible categories our data can fall under.

The remaining layers, situated between the input layer and output layer, are known as hidden layers. The number of hidden layers, as well as the number of neurons in each hidden layer, is arbitrarily decided, meaning that you, the programmer, can experiment with different combinations to see what best suits your model.

Hidden layers are useful because they help the model extract important features from the input data which are crucial in helping the model pick up on patterns, and so classify different inputs.

Before continuing, it's important to acknowledge that there are many different types of hidden layers, which determine the operations that occur between layers, and each may be beneficial depending on the context. One of the most commonly used types is the dense layer, which connects every neuron in one layer to every neuron in the next, and is precisely the type of layer we saw being used in our first example. A full discussion of the different types of layers sadly falls beyond the scope of this article, but stay tuned and let me know if that's something you might be interested in in the future.
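To make the idea of dense layers concrete, here's a minimal sketch of a forward pass through the 3-5-2 network from the first figure, using NumPy. Real networks also add a bias term and an activation function at each layer; those are omitted here to mirror the simplified weighted-sum description above, and the random weights are just placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# In a dense layer, every neuron in one layer connects to every neuron in the
# next, so the weights form a (previous_size x next_size) matrix.
w_hidden = rng.normal(size=(3, 5))  # input layer (3) -> hidden layer (5)
w_output = rng.normal(size=(5, 2))  # hidden layer (5) -> output layer (2)

x = np.array([2.0, 4.0, 12.0])      # one input sample

# Forward pass: each layer's output is a weighted sum of the previous layer's.
hidden = x @ w_hidden
output = hidden @ w_output
print(output.shape)  # (2,) -- one value per output neuron
```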

This may all sound well and good, but you might be wondering: "How in the world is a system so chaotic able to learn or produce anything intelligible at all!" Fantastic question! Neural networks learn by adapting these weights during the training process in order to map an input to its intended classification.

How do neural networks accomplish this? Well, most neural networks, namely supervised ones, are provided with labels for the data they're trained on, and at the end of each pass through the network, the model calculates how wrong its prediction was. How inaccurate the model was is called the model's loss, which can be calculated in many different ways. More on loss in a future post. Models then use optimizers, such as Stochastic Gradient Descent, which take the loss function and gradually adjust the weights to find the minimum of this function, reducing the overall error of the model.
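Here's a toy sketch of that loop for a single neuron, using NumPy. For simplicity it uses plain (full-batch) gradient descent rather than the stochastic variant, which does the same thing on random subsets of the data, and the "true" weights and learning rate are made-up values for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: the "true" relationship the neuron should learn.
true_w = np.array([0.5, 0.6, 0.7])
X = rng.normal(size=(100, 3))       # 100 input samples, 3 features each
y = X @ true_w                      # labels the network should reproduce

w = np.zeros(3)                     # start with arbitrary weights
lr = 0.05                           # learning rate: size of each adjustment

for _ in range(200):
    pred = X @ w                    # forward pass: weighted sums
    error = pred - y                # how wrong each prediction was
    loss = (error ** 2).mean()      # mean squared error, one way to measure loss
    grad = 2 * X.T @ error / len(X) # gradient of the loss w.r.t. the weights
    w -= lr * grad                  # step the weights downhill

print(np.round(w, 2))               # close to [0.5, 0.6, 0.7]
```

After 200 small adjustments, the learned weights land right on the true ones, which is exactly the "gradually adjust the weights to find the minimum" process described above.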

This may all sound confusing, so let's try to clarify with an analogy. Think of it like this: neural networks are essentially playing a game of Plinko, where the model's job, based on where the chip was placed, is to adjust the pegs so that the chip falls into its specific slot as often as possible.

Plinko board

As I'm sure you already know, neural networks have quickly become one of the hottest topics in computer science, due in large part to their ability to quickly pick up on patterns in data. This has had implications for almost every industry in the world, and with no signs of stopping, neural networks have cemented themselves as a cornerstone of machine learning, from image classifiers to autoencoders to GANs.

Before wrapping up, I would recommend that anyone who'd like to watch an animated video detailing these concepts check out 3blue1brown's fantastic video on the subject.

Thank you everyone for reading through this far, keep an eye out for future posts and feel free to email me at matthewtamerfarah@gmail.com!
