To Learn Deep Learning | Day 1

The basics of neural networks

quoniammm
Feb 23, 2017 · 4 min read

Neural networks are a powerful class of machine learning algorithms used in many state-of-the-art systems, such as facial recognition, voice recognition, chess engines, and self-driving cars. We call them neural networks because their nodes resemble neurons in the brain. These nodes take in input data, process that information, and finally produce an output in the form of a decision!

These individual nodes are called perceptrons or neurons, and they are the basic unit of a neural network. Each one looks at input data and decides how to categorize that data. Let’s zoom in even further and look at how a single perceptron processes input data.

Weights

When we initialize a neural network, we don’t know what information will be most important in making a decision. It’s up to the network to learn for itself which data matters most and to adjust how it considers that data. It does this with something called weights. A higher weight means the network considers that input more important than other inputs; a lower weight means the data is considered less important. The weights start out as random values, and we use a learning algorithm (gradient descent, for example) to change them. This is called training the neural network.
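
As a rough sketch of what that looks like in code (assuming NumPy; the input values and sizes here are made up for illustration):

```python
import numpy as np

# Two hypothetical input values for one perceptron (made-up numbers).
inputs = np.array([0.7, 0.3])

# Weights start out as small random values; training will adjust them.
np.random.seed(42)  # only so this sketch is reproducible
weights = np.random.normal(0.0, 0.1, size=inputs.shape)
```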

Summing the Input Data

In the next step, the weighted input data is summed up to produce a single value that will help determine the final output. (Sometimes we also need to add a bias term to shift that sum.)
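
Continuing the sketch above (the bias value is an assumption for illustration), the summation step is just a dot product plus the bias:

```python
bias = -0.1  # assumed value; the bias shifts the weighted sum up or down

# Multiply each input by its weight, add the products together, add the bias.
h = np.dot(weights, inputs) + bias
```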

Calculating the Output with an Activation Function

Finally, the result of the perceptron’s summation is turned into an output signal!
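
As a minimal sketch of that last step, using the sigmoid activation discussed below (one common choice, not the only one):

```python
def sigmoid(h):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-h))

output = sigmoid(h)  # the perceptron's output signal
```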

“Learn”

Then the neural network starts to learn! Initially, the weights and bias are assigned random values, and then they are updated using a learning algorithm like gradient descent. The weights and biases change so that the next training example is more accurately categorized, and patterns in the data are “learned” by the neural network.

The cool part about this architecture, and what makes neural networks possible, is the activation function f(h). The sigmoid function is often used (with β = 1).
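
The formula appeared as an image in the original post; reconstructed in LaTeX, the sigmoid in its general form and with β = 1 is:

f(h) = \frac{1}{1 + e^{-\beta h}}, \qquad \beta = 1 \;\Rightarrow\; f(h) = \frac{1}{1 + e^{-h}}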

How to learn weights

We want the network to make predictions as close as possible to the real values. To measure this, we need a metric of how wrong the predictions are: the error. A common metric is the mean of the squared errors (MSE), where you take a sum over all output units j and another sum over all data points μ.
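
The MSE formula itself was shown as an image; a standard reconstruction consistent with the surrounding text (m data points, with the factor of 1/2 being a common convention that cancels when differentiating) is:

E = \frac{1}{2m} \sum_{\mu} \sum_{j} \left( y_j^{\mu} - \hat{y}_j^{\mu} \right)^2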

We want the network’s prediction error to be as small as possible, and the weights are the knobs we can use to make that happen. Our goal is to find weights w_ij that minimize the MSE. To do this with a neural network, you’d typically use gradient descent.

With gradient descent, we take multiple small steps towards our goal. In this case, we want to change the weights in steps that reduce the error. Picture the error as a mountain: we want to get to the bottom. Since the fastest way down a mountain is in the steepest direction, the steps should be taken in the direction that decreases the error the most. We can find this direction by calculating the gradient of the MSE.

For simplicity, we look at the situation where there is a single output unit. Now the MSE is:
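
Reconstructing the missing formula under the same conventions as above (with one output unit, the sum over j drops out):

E = \frac{1}{2m} \sum_{\mu} \left( y^{\mu} - \hat{y}^{\mu} \right)^2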

How do we update the weights? We use the following formula.
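
The formula was shown as an image in the original; the standard gradient descent update rule it describes is:

\Delta w_i = -\alpha \frac{\partial E}{\partial w_i}, \qquad w_i \leftarrow w_i + \Delta w_i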

Alpha (α) is called the learning rate. We then update the weights repeatedly until the MSE stops decreasing, i.e., until it reaches (or gets close to) a minimum.
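
Putting the pieces together, here is a minimal gradient descent sketch for a single sigmoid unit (a toy illustration under the conventions above; the data and hyperparameters are made up, and none of this code is from the original post):

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

# Toy dataset: 3 examples with 2 features each, plus made-up targets.
X = np.array([[0.1, 0.3],
              [0.4, 0.8],
              [0.9, 0.5]])
y = np.array([0.2, 0.6, 0.9])

np.random.seed(0)
weights = np.random.normal(0.0, 0.1, size=X.shape[1])
alpha = 0.5  # learning rate

for epoch in range(1000):
    delta_w = np.zeros_like(weights)
    for x_mu, y_mu in zip(X, y):
        h = np.dot(weights, x_mu)      # weighted sum
        y_hat = sigmoid(h)             # prediction
        error = y_mu - y_hat
        # dE/dw_i = -(y - y_hat) * f'(h) * x_i, with f'(h) = y_hat * (1 - y_hat),
        # so the step in the negative-gradient direction is:
        delta_w += error * y_hat * (1 - y_hat) * x_mu
    weights += alpha * delta_w / len(X)  # average the step over the data
```

Each pass over the data nudges the weights downhill on the error surface; with a suitable learning rate, the MSE shrinks toward a minimum.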
