Introduction to Neural Networks
I’m a Junior Software Engineer passionate about learning and growing every day as an engineer, and as a person.
One topic that I have been very interested in for a while is Machine Learning. Specifically, neural networks. But it can be pretty daunting for those starting out, especially if your calculus is a bit rusty. The goal of this article is to help explain some of these concepts to myself, as well as to help anyone else who may be in the same position. So let’s dive in…
What are Neural Networks?
Neural Networks are a subset of Machine Learning, which, according to Wikipedia, “is the study of computer algorithms that improve automatically through experience”. I like to think of it as the field dedicated to finding ways to solve problems that are difficult, or even impossible, to solve using traditional programming methods.
In traditional programming, the programmer gives the computer an input and a set of instructions, or functions, that will result in some output. This works fine for many tasks, but it can become unwieldy for certain types of problems. One of these is image recognition, which requires the correct interpretation of massive amounts of data. These problems may seem intuitive to us as humans because our brains are so skilled at solving them, but they are incredibly difficult to express as code. They require a very high level of expertise and understanding of the exact problem we are trying to solve. And with especially difficult problems, like image recognition, it’s easy to get lost in the minutiae.
Neural Networks are a computing architecture that attempts to solve this problem by shifting the paradigm. Instead of giving the computer the input and the steps to find the output, why not give it an input and tell it what you expect the answer to be? You can do this over and over again, providing the input and the correct output, and letting the computer figure out the details of how it gets there. It does this over time through trial and error. This is similar to how humans learn, and there’s a reason for that. These systems are called neural networks because they are designed to mimic, on a very basic level, the biological network of neurons that facilitates human learning. Leveraging these systems helps us solve problems that may have previously seemed impossibly complex.
What makes a Neural Network?
What is a Neuron?
Each network is made up of several layers of ‘neurons’. These neurons are basically functions that receive some input and use it to produce some output, or make a ‘decision’. But for now, it may be more intuitive to think of each neuron as a little box that holds a number between 0 and 1, with 0 meaning it’s off and 1 meaning it’s on. Take note, though, that these don’t have to be either fully on OR off. If you are a visual person, like me, it might be more helpful to imagine each neuron as a dimmable light bulb instead. The closer it is to 1, the brighter it is, and the opposite is true as it gets closer to 0.
Dense Layers and Dense Neural Networks
When each neuron in a layer is connected to each neuron in the previous layer, this is known as a dense layer. A dense neural network is a network made up of dense, or fully connected, layers. A network always consists of an input layer, an output layer, and any number of hidden layers in between. Below is a very simple example of a dense neural network with an input layer, an output layer, and one hidden layer.
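If it helps to see this in code, here is a minimal sketch in Python (using NumPy) of the structure just described. The layer sizes and the use of random weights are arbitrary assumptions for illustration, not a canonical setup:

```python
import numpy as np

# A tiny dense network: 3 input neurons, one hidden layer of 4 neurons,
# and 2 output neurons. (These sizes are made up for illustration.)
layer_sizes = [3, 4, 2]

# In a dense (fully connected) layer, every neuron connects to every neuron
# in the previous layer, so the weights between two consecutive layers form
# a matrix of shape (neurons_in_previous_layer, neurons_in_this_layer).
weights = [np.random.randn(n_in, n_out)
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

for i, w in enumerate(weights):
    print(f"layer {i} -> layer {i + 1}: {w.size} connections, shape {w.shape}")
# layer 0 -> layer 1: 12 connections, shape (3, 4)
# layer 1 -> layer 2: 8 connections, shape (4, 2)
```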
‘Hidden layer’ may be a bit of a confusing term at first, since it doesn’t mean what you might assume it means. You can still access, or see, the data inside of it. The reason it is called a ‘hidden’ layer is that the state of that layer is not directly controlled by the programmer. Instead, it is adjusted and fine-tuned by the network itself. So you could say that we don’t really have any decision-making power here.
Now that we know the basic set-up of a neural network, let’s look at the way inputs, or data, are sent through it.
Forward Propagation
Weighted Sum of all Inputs
Remember that each neuron has a value between 0 and 1 that determines how on/off it is. Once the neuron decides on this number, it passes it on to the next layer of neurons. The neurons in the next layer will decide their on or off value based on the inputs that they get.
But there is a catch… remember how I said that in a dense layer, each neuron is connected to each neuron in the layer before? That means lots of connections. But not all connections are created equal. The connection between any two neurons has a ‘weight’. This weight tells the receiving neuron how important the information coming through that connection is.
A high weight means that information coming through that connection is very important, while a low weight means that it is not so important. The receiving neuron adds up the information coming from each of its connected neurons, multiplied by the weight of each corresponding connection. This is the weighted sum.
So this is what we have so far: lots of neurons like the one below that take the weighted sum of all inputs (X1*W1 + X2*W2 + … + Xn*Wn) and produce some output.
This process for getting the weighted sum of all inputs is expressed mathematically as the sum of Xi*Wi over every input i, that is: X1*W1 + X2*W2 + … + Xn*Wn.
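As a quick sketch in Python (the input values and weights here are completely made up):

```python
# Weighted sum of all inputs for a single neuron.
# inputs:  the outputs of the previous layer's neurons (values between 0 and 1)
# weights: one weight per incoming connection (made-up numbers)
inputs = [0.9, 0.1, 0.4]
weights = [0.8, -0.5, 0.2]

weighted_sum = sum(x * w for x, w in zip(inputs, weights))
print(weighted_sum)  # 0.9*0.8 + 0.1*(-0.5) + 0.4*0.2 = 0.75
```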
Activation Function
How does it produce this output? How does it decide whether it is on or off, or somewhere in between, based on its inputs? Well, it takes this weighted sum that it has received and passes it through a non-linear activation function. This function converts whatever we pass in into a number between 0 and 1. Large negative numbers result in an output that approaches 0, and large positive numbers result in an output that approaches 1. The sigmoid function, defined as sigmoid(x) = 1 / (1 + e^-x), is one of the most common examples of an activation function.
As I mentioned, the sigmoid function is a non-linear activation function. Non-linearity is important because, while linear functions might be fine for approximating linear relationships, stacking linear layers only ever produces another linear function; non-linearity gives us the ability to approximate arbitrarily complex functions, which is exactly our goal. I assume that this is called the activation function because the output of this function is what determines the activation of that neuron, but don’t quote me on that.
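Here is the sigmoid function as a minimal Python sketch, with a few sample inputs to show the squashing behavior:

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Large negative inputs approach 0; large positive inputs approach 1.
for z in [-10, -1, 0, 1, 10]:
    print(z, round(sigmoid(z), 4))
# -10 0.0
# -1  0.2689
#  0  0.5
#  1  0.7311
# 10  1.0
```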
The neuron below takes the weighted sum of its inputs and passes it through an activation function (let’s go with the sigmoid function) to produce an output between 0 and 1.
Bias
There’s just one more thing that I may have neglected to mention. Each neuron also has a ‘threshold’ number associated with it. This threshold is basically how hard you have to work to get that neuron to turn on; it tells you how much that neuron ‘wants’, for lack of a better word, to be on or off. This threshold number is called its bias, and it is added to the total weighted sum just before that total is passed through the sigmoid activation function.
We’ve now got a neuron that gets the weighted sum of all inputs, adds its bias, passes that total through an activation function, and outputs the result of that computation.
Expressed mathematically, all of that looks like this: output = sigmoid(X1*W1 + X2*W2 + … + Xn*Wn + B), where B is the neuron’s bias.
This is exactly what a neuron in a neural network is. It is a function that performs this calculation every time it receives an input, and sends the result as output to the next layer of neurons, each of which will do the exact same thing.
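Put together as code, a single neuron might look something like this minimal Python sketch (the input values, weights, and bias are all made up for illustration):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """One neuron: weighted sum of inputs, plus bias, through the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(weighted_sum + bias)

# Made-up example values:
print(neuron_output([0.9, 0.1, 0.4], [0.8, -0.5, 0.2], bias=-0.3))
# sigmoid(0.75 + (-0.3)) = sigmoid(0.45) ≈ 0.6107
```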
This is called forward propagation, and it is how the network passes data successively through each layer until it comes to a decision. But we haven’t yet discussed how the model can adapt itself to produce more and more accurate output. That will be covered in Part II of this series, where I will go over backpropagation.
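Before wrapping up, here is a rough end-to-end sketch of forward propagation in Python with NumPy. The layer sizes, random weights, and input values are all invented for illustration; a real network would learn its weights and biases during training:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, weights, biases):
    """Forward propagation: at each layer, take the weighted sums,
    add the biases, and apply the activation function."""
    activation = x
    for w, b in zip(weights, biases):
        activation = sigmoid(activation @ w + b)
    return activation

# A made-up network: 3 inputs -> 4 hidden neurons -> 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
biases = [rng.normal(size=4), rng.normal(size=2)]

output = forward(np.array([0.9, 0.1, 0.4]), weights, biases)
print(output)  # two values between 0 and 1, the network's 'decision'
```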
I’d like to note that, as I mentioned above, I wrote this to help explain the concepts to myself as well as to others. If you notice something off, please let me know! With that said, thanks for reading :)