The Magic of Feed-Forward. Deep Learning with PyTorch Part #3
It is one thing to know what an artificial neural network (ANN) is; it is another thing entirely to know how it actually works. If you ever want to utilize deep learning to build cool projects, it is essential that you understand ANNs. The concepts from ANNs can be applied to nearly every other type of neural network. When it comes to ANNs, there are 2 major concepts to understand: feed-forward and back-propagation. Both of these concepts require a lot of in-depth understanding, so I will split them up into different articles. In this article, we will get our hands dirty with the feed-forward concept. Fair warning: in order to understand feed-forward, you will need to understand basic matrix/vector operations. In case you need a refresher, here is some reference material: Basic Matrix Operations Review.
Feed-forward is a rather simple concept. Essentially, all you are doing is sending your data through your network. In other words, you are sending your data through the input layer to the hidden layers and, finally, to the output layer. Here is a great visualization that illustrates this concept:
As the illustration above shows, feed-forward is essentially the process of taking your inputs (your features) and putting them through your ANN to get your output(s). However, there is still one question that remains unanswered in this discussion: what happens at each node? Let’s answer this question next.
What Happens at Each Node?
The calculations that occur at each node are very, very important. Each node takes a weighted summation of its inputs & then puts the sum through an activation function. Confused? Don’t worry; let’s break it down.
Each input has a weight applied to it. This is because some inputs may be more/less important than other inputs. For example, if I was building a neural network to classify images of cats & dogs, I would be more concerned with the shape of the eyes, ears, etc. rather than with whether the image is indoors or outdoors. I hope you don’t determine something to be a cat or a dog solely based on location (indoors or outdoors). I imagine you determine something to be a cat or a dog based on features that define each animal, such as eyes or ears. As you can see, it is not justifiable to give each input the same weight. Some inputs definitely deserve higher weights than others.
In order to apply the weights to a given node’s inputs, we perform a weighted summation. If you are familiar with machine learning, there might be a light-bulb going off in your head. This is because this process is exactly the same as linear regression!
The image to the left shows a basic 1-variable linear equation: Y = b0 + b1*X. b0 represents the bias term, b1 represents the weight applied to the input X, and Y is your output. This is what would occur at a node with only 1 input and no activation function. In a real-world scenario, you would have many, many inputs to your node. Thus, we can expand this equation to become the following: Y = b + W1*X1 + W2*X2 + … + Wn*Xn, where Wn represents the nth weight for the nth input Xn. It is possible to condense this equation using linear algebra (note that the weights & inputs are usually stored in matrices). When we view the weights & inputs as matrices, we get the following equation: W^T*X + b. Note that ^T stands for transpose (I am assuming that you know what transpose means, but in the event that you don’t, please refer to the Resources below). Also, note that the “W^T * X” part may be rearranged at times depending on the shapes of the weight matrix & the input matrix. This calculation is performed at every node in the ANN.
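Here is a quick sketch of this weighted summation in NumPy. The numbers are toy values I made up purely for illustration:

```python
import numpy as np

# Toy numbers for illustration: a single node with 3 inputs
X = np.array([0.5, -1.0, 2.0])   # inputs (features)
W = np.array([0.8, 0.2, -0.5])   # one weight per input
b = 0.1                          # bias term

# Weighted summation: W^T * X + b
# (for 1-D arrays, W.T @ X is simply the dot product of W and X)
z = W.T @ X + b
print(z)  # 0.8*0.5 + 0.2*(-1.0) + (-0.5)*2.0 + 0.1 = -0.7
```

Notice that the matrix form computes exactly the same thing as writing out b + W1*X1 + W2*X2 + W3*X3 by hand; it is just more compact.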
Now that we have generated a concrete understanding about the weighted summation that occurs at each node, let’s develop a concrete understanding regarding activation functions.
An activation function is very, very simple to understand. It is essentially a function that takes in some input and provides some output depending on the specialty of the function. For example, a common activation function is a sigmoid function. The sigmoid function prides itself on squeezing its values between 0 & 1. Likewise, there are many activation functions that perform some kind of transformation on your input. With this in mind, I would like to now introduce the formal mathematical way of looking at what happens at a single node:
Output of a Node = y(W^T * X + b)
- y = activation function
- W = weights matrix
- X = inputs matrix
- b = bias term
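The formula above can be sketched in a few lines of NumPy. This is my own toy example, using the sigmoid as the activation function y:

```python
import numpy as np

def sigmoid(z):
    # Squeezes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([0.5, -1.0, 2.0])   # inputs matrix (here a vector)
W = np.array([0.8, 0.2, -0.5])   # weights matrix (here a vector)
b = 0.1                          # bias term

# Output of a Node = y(W^T * X + b), with y = sigmoid
output = sigmoid(W.T @ X + b)
print(output)  # a single value between 0 and 1
```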
Essentially, you are taking your weighted summation & passing it through an activation function. I just have 2 final comments I want to make on this subject:
- Strictly speaking, your ANN will still run without an activation function. However, without nonlinear activation functions, a stack of layers collapses into a single linear transformation, so your network can only learn linear relationships. Activation functions allow you to transform your nodes’ outputs, and these transformations allow your ANN to have better performance.
- The choice of activation function on each hidden layer is dependent on your problem.
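To see why activation functions matter, here is a small demonstration of my own (not from the article): two linear layers with no activation in between compute exactly the same thing as one combined linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" of weights with no activation function in between
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Layer-by-layer: W2 @ (W1 @ x + b1) + b2
two_layers = W2 @ (W1 @ x + b1) + b2

# The same computation collapsed into a single linear layer
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2
one_layer = W_combined @ x + b_combined

print(np.allclose(two_layers, one_layer))  # True
```

Inserting a nonlinear activation function between the two layers breaks this collapse, which is what lets deep networks model nonlinear patterns.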
Feed-forward is a critical concept to understand. With that being said, I have made a list here that summarizes some of the key points and concepts. Note, even though I have made a list of key concepts, I highly encourage you to actually go through the feed-forward section if you haven’t already.
- Feed-forward refers to data being sent through the ANN (data -> input layer -> hidden layers -> output layer)
- The feed-forward process is usually used in making predictions and in back-propagation (we will talk about this concept in a future article).
- At each node of the ANN, the node performs a weighted summation & puts that sum through an activation function. Mathematically, it looks like this: Output of Node = activation function(Weights^T * Inputs + bias).
- Activation functions transform the weighted summation so that the node’s output is appropriate for the context of the problem.
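Since this series uses PyTorch, here is a minimal sketch of a full feed-forward pass through a tiny ANN. The layer sizes are arbitrary choices of mine for illustration:

```python
import torch
import torch.nn as nn

# A tiny ANN: 4 input features -> hidden layer of 8 nodes -> 1 output
model = nn.Sequential(
    nn.Linear(4, 8),   # input layer -> hidden layer (W^T * X + b at each node)
    nn.Sigmoid(),      # activation function on the hidden layer
    nn.Linear(8, 1),   # hidden layer -> output layer
    nn.Sigmoid(),      # squeeze the final output into (0, 1)
)

# Feed-forward: send a batch of 2 samples through the network
X = torch.randn(2, 4)   # 2 samples, 4 features each
output = model(X)       # data -> input layer -> hidden layer -> output layer
print(output.shape)     # torch.Size([2, 1])
```

Calling `model(X)` is the feed-forward process in its entirety: each `nn.Linear` performs the weighted summation at its nodes, and each `nn.Sigmoid` applies the activation function.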
If you made it this far through the article, I thank you. It really means a lot to me to see people reading my content and learning something new. Let me know what your thoughts regarding this article are in the comments below.
About the Author
I am an undergraduate student @ Rutgers University-New Brunswick, who is pursuing the Computer Science & Cognitive Science majors. Furthermore, I am pursuing a minor in Business Administration and a certificate in Data Science. I have been applying machine learning for a little over a year, and recently I dipped my toes into deep learning. I am very much intrigued by the power of artificial intelligence and can’t wait to share my learnings with the community! Feel free to contact me via LinkedIn or email me at firstname.lastname@example.org.