Neural Networks Part - 2
In-depth explanation with some math.
This article is part of a series on neural networks.
In the previous part, we built a small neural network with 4 layers. It takes 3 inputs and gives 1 output.
We’ll also need some training data in the form of (x, y). Let’s consider a single training example with x = [3, 2, 4] and y = 8. You can read this example as: a house with 3 rooms, in a 2-year-old building, on the 4th floor, costs $8K. Obviously, when training an actual neural network you’ll need much more data.
In this part, we’re going to discuss the loss function.
Let’s start by imagining a scenario where you’re learning to shoot a basketball through a hoop. On your first try, you miss the hoop by shooting too far to the left. So on the next attempt, you aim a little to the right of your previous attempt. This time you hit the top of the backboard and miss again. So you know that next time you have to shoot at the same angle, only a little slower. This is a good analogy for the weights (and biases) of a neural network. You change the weights according to the error in your previous prediction, similar to how you change your shooting angle, distance, and speed according to your previous shot.
In the back of your mind, you’re calculating how far you missed the target by and adjusting your next shot accordingly.
We have to imitate this process to know how far our prediction is from the actual answer. This is the job of the loss function. As training progresses, our aim is to make the value of this loss function as small as possible.
Note: The loss function is also sometimes referred to as the cost function.
Choose a loss function.
The choice of a loss function depends on the type of problem. Our house price problem, for example, is a regression problem: we’re trying to predict a real-valued number. The most common loss function used in regression problems is the squared error loss. It is defined as the square of the difference between the predicted output and the actual output.
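Written as a formula, the definition looks like this. The symbols are a notational assumption, not something fixed in the previous part: ŷ stands for the network's predicted output and y for the actual output.

```latex
L(\hat{y}, y) = (\hat{y} - y)^2
```

For our training example, y = 8. If the network hypothetically predicted 6, the loss would be (6 - 8)^2 = 4.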
We subtract because we want to know the distance of our prediction from the actual output. We square because:
- We want a positive number as a distance.
- Squaring will penalize large distances more than small distances.
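To make the two points above concrete, here is a minimal Python sketch of the squared error for a single example. The function name `squared_error` and the prediction values are made up for illustration; in practice the prediction would come from the network's forward pass. Only the target y = 8 comes from the training example above.

```python
def squared_error(predicted, actual):
    """Squared error loss for a single training example."""
    return (predicted - actual) ** 2


# Target from the article's training example: y = 8 (price in $K).
actual = 8.0

# Hypothetical predictions, chosen only to illustrate the behaviour of the loss.
for predicted in (7.0, 9.0, 5.0, 11.0):
    loss = squared_error(predicted, actual)
    print(f"predicted={predicted}, error={predicted - actual}, loss={loss}")
```

Predictions of 7 and 9 are both off by 1 (in opposite directions) and both give a loss of 1, so the sign of the error disappears. Predictions of 5 and 11 are off by 3 and give a loss of 9, so an error three times larger is penalized nine times as much.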
Many other, more complicated loss functions are available, depending on the use case. The detailed list can be found here.
In the next part, we’ll look at activation functions.