What is a Neural Network?

Kaushik Mani
DataDrivenInvestor

--

Recently, the terms Artificial Intelligence, Machine Learning, Deep Learning, and Neural Networks have been spreading like wildfire. So, what do they mean?

Wikipedia defines “Artificial Intelligence” as intelligence demonstrated by machines. In other words, it is the study of the ability of devices to interpret external data, learn from that data, and use what they have learned to achieve their goals.

“Machine Learning” is a subset of Artificial Intelligence in which machines learn to complete a task without being explicitly programmed to do so. In traditional programming, you pass an input and a set of rules to the machine, and the machine gives you an output. In Machine Learning, you pass example inputs together with their expected outputs, and the machine learns the rules that map one to the other, so that you can apply those rules to new inputs automatically in the future.
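
To make the contrast concrete, here is a minimal sketch using a toy problem of my own choosing (converting Celsius to Fahrenheit), which is not a neural network at all. The first function is traditional programming: the rule is written by hand. The second part is machine learning in miniature: only example inputs and outputs are supplied, and a straight-line rule is estimated from them with ordinary least squares. The function names and the sample data are assumptions made for illustration.

```python
# Traditional programming: the rule is written by hand.
def fahrenheit_by_rule(celsius):
    return celsius * 9 / 5 + 32


# Machine learning in miniature: only example inputs and outputs are given,
# and the rule (a slope and an intercept) is estimated from them.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative training examples (inputs and their known outputs).
celsius_examples = [-40.0, 0.0, 20.0, 37.0, 100.0]
fahrenheit_examples = [-40.0, 32.0, 68.0, 98.6, 212.0]

slope, intercept = fit_line(celsius_examples, fahrenheit_examples)
print(f"learned rule: F = {slope:.2f} * C + {intercept:.2f}")
print("rule-based answer for 25 C:", fahrenheit_by_rule(25))
print("learned answer for 25 C:   ", round(slope * 25 + intercept, 1))
```

The learned slope and intercept come out very close to the hand-written 9/5 and 32, even though the program was never told the formula.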

“Deep Learning” is the subset of Machine Learning concerned with Neural Networks, which are algorithms inspired by the structure and function of the brain. We can think about it in terms of handwriting: if you had ten people write the same word, it would look very different for each person, ranging from cursive to print and from sloppy to neat. The human brain has no problem understanding that it is the same word every time, but how would a normal computer system know that? In other words, how could we make a machine act intelligently like the human brain? That brings us to Neural Networks. Before we try to learn what a Neural Network is, let us look at its most basic building block, the neuron.

What is a Neuron?

Just as in the human brain, the basic building block of a Neural Network is a neuron. It works much like a biological neuron: it takes in some inputs and fires an output. Each neuron is a small computing unit that takes a set of real-valued numbers as input, performs some computation on them, and produces a single output value.

To understand how a neuron works, let us first understand the meaning of a few terms.

Weight: Every input (x) to a neuron has an associated weight (w), which reflects its importance relative to the other inputs.

In its simplest form, a neuron outputs 1 if the weighted sum of its inputs is greater than a specific threshold, and 0 otherwise. This mathematical model of a neuron is known as the Perceptron.
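
The threshold rule described above can be sketched in a few lines. The weights and threshold below are hypothetical values I picked so that the unit behaves like a logical AND. They are not learned, and the function name is my own.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


# Hypothetical weights and threshold: the sum only exceeds 1.5 when both inputs are 1.
and_weights = [1.0, 1.0]
and_threshold = 1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], and_weights, and_threshold))
```

Only the input pair (1, 1) produces an output of 1. Every other pair stays at or below the threshold and produces 0.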

Every neural unit takes a weighted sum of its inputs, with an additional term in the sum called the bias.

Bias: The bias is a constant added to the weighted sum of inputs. It shifts the neuron's output so that the model can better fit the given data.

It is easier to represent the weighted sum of inputs using vector notation, so we define the weighted sum z in terms of a weight vector w, an input vector x, and a bias value b: z = w · x + b.

The output (y) of the neuron is a function f of the weighted sum z, that is, y = f(z). The function f is non-linear and is called the Activation Function.
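
As a rough sketch, the two steps (the weighted sum z = w · x + b, then the output y = f(z)) might look like this in code. The numbers are illustrative only, and tanh is used purely as a stand-in for f. The helper names are assumptions of mine.

```python
import math


def weighted_sum(weights, inputs, bias):
    """z = w . x + b"""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def neuron_output(weights, inputs, bias, activation):
    """y = f(z), where f is the activation function passed in."""
    return activation(weighted_sum(weights, inputs, bias))


# Illustrative values only.
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
b = 0.5

z = weighted_sum(w, x, b)              # 0.5 - 2.0 + 6.0 + 0.5 = 5.0
y = neuron_output(w, x, b, math.tanh)  # tanh stands in for f here

print("z =", z)
print("y =", y)
```

Passing the activation in as an argument keeps the weighted sum separate from the choice of f, which makes it easy to swap in a different activation later.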

Activation Function: The purpose of the activation function is to introduce non-linearity into the output of a neuron. It takes a single number and performs a fixed mathematical operation on it. Several activation functions are used in practice:

1. Sigmoid: It takes a real-valued input and maps it into the range (0, 1).

2. Tanh: It is very similar to the sigmoid function, but it maps its input into the range (-1, 1).

3. ReLU: The Rectified Linear Unit (ReLU) is the most commonly used activation function. Its value is equal to x when x is positive, and 0 otherwise.
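
The three functions listed above are short enough to write out directly. This is a minimal sketch using only Python's standard math module. The simple sigmoid formula here is fine for moderate inputs but is not numerically hardened for very large negative values.

```python
import math


def sigmoid(z):
    # Maps any real number into (0, 1).
    # Note: math.exp(-z) would overflow for very large negative z.
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    # Maps any real number into (-1, 1).
    return math.tanh(z)


def relu(z):
    # Equal to z when z is positive, and 0 otherwise.
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):.3f}  relu={relu(z):.1f}")
```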

Every activation function introduces some property that makes it more useful than a plain linear weighted sum of inputs. For example, ReLU is linear for positive z, so for very high values of z the values of y stay varied instead of flattening out as they do with sigmoid or tanh. Sigmoid and tanh squash inputs of large magnitude towards the ends of their output range, so outliers have a limited effect. In practice, ReLU has often been found to perform better than sigmoid or tanh activations.
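
A quick way to see this difference is to feed two large inputs into each function and compare how far apart the outputs end up. The inputs 10 and 20 are arbitrary choices of mine for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


# Two large inputs that differ by 10.
z_small, z_large = 10.0, 20.0

sigmoid_gap = sigmoid(z_large) - sigmoid(z_small)   # tiny: both outputs sit just below 1
tanh_gap = math.tanh(z_large) - math.tanh(z_small)  # tiny: both outputs sit just below 1
relu_gap = relu(z_large) - relu(z_small)            # the full difference of 10 is preserved

print(f"sigmoid gap: {sigmoid_gap:.6f}")
print(f"tanh gap:    {tanh_gap:.6f}")
print(f"relu gap:    {relu_gap:.1f}")
```

With sigmoid and tanh the two outputs become almost indistinguishable, while ReLU keeps the two inputs exactly as far apart as they started.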

Neural Networks

A neural network is composed of layers, each of which is a collection of neurons, with connections between the different layers. These layers transform data by first calculating the weighted sum of inputs and then passing it through the activation functions assigned to the neurons.

The leftmost layer in a Neural Network is called the input layer, and the rightmost layer is called the output layer. The layers between the input and the output are called the hidden layers. Every Neural Network has one input layer and one output layer. However, the number of hidden layers differs between networks depending on the complexity of the problem. Also, each hidden layer can have its own activation function.
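
Putting the pieces together, here is a small sketch of data flowing from an input layer through one hidden layer to an output layer. The network shape (2 inputs, 3 hidden neurons with ReLU, 1 output neuron with sigmoid) and all weights and biases are hypothetical values chosen by hand, not learned ones.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases, activation):
    """Each row of `weights` holds one neuron's weights; returns one output per neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


# Hypothetical network: 2 inputs -> 3 hidden neurons (ReLU) -> 1 output neuron (sigmoid).
hidden_weights = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]
hidden_biases = [0.0, -1.0, 0.5]
output_weights = [[1.0, -1.0, 0.5]]
output_biases = [0.0]

x = [1.0, 2.0]  # values fed to the input layer
hidden = layer_forward(x, hidden_weights, hidden_biases, relu)
prediction = layer_forward(hidden, output_weights, output_biases, sigmoid)

print("hidden layer output:", hidden)
print("network output:     ", prediction)
```

Each layer uses its own activation function here, and adding another hidden layer would simply mean one more call to layer_forward between the two shown.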

Any neural network with two or more hidden layers is called a Deep Neural Network. A neural network makes accurate predictions by learning the weights for each of the neurons at every layer. The algorithm through which it learns is called “Backpropagation”, which I will cover in a later post.

With this, I hope you have a basic understanding of what a Neural Network is, its basic structure, and the jargon associated with it. This area is advancing at a rapid pace, and it would be good to start learning about it now, because the hype is real!
