Over the past few years, neural networks have come to dominate classification problems. Deep neural networks have achieved highly effective results in computer vision, natural language processing, and speech processing.
"But can deep learning, and neural networks in particular, be deployed in place of fundamental machine learning models?" "Are artificial neural networks capable of solving problems pertaining to linear and logistic regression analysis?"
This article addresses these questions and includes an illustration of implementing Boolean logic using neural networks. So let's get started.
Neural Networks: Revisiting the Basics
Inspired by the working of the human nervous system, the idea of artificial neural networks came into existence in the late 1950s. Drawing on the earlier work of Warren McCulloch and Walter Pitts, Frank Rosenblatt put forward the idea of the fundamental perceptron.
The perceptron is the basic unit of all forms of neural networks (feed-forward neural networks, CNNs, RNNs, GANs, and so on). Let us look at its computational working.
The basic perceptron takes multiple binary inputs and, after a simple mathematical computation, produces a single binary output.
Here the inputs are X1, X2, X3. For computational purposes, Rosenblatt assigned a WEIGHT to each input. These weights are real numbers expressing the importance of the corresponding input to the output. He also proposed giving the perceptron a THRESHOLD. Thus the perceptron combines its inputs into a WEIGHTED SUM ( ∑ WiXi ) and fires an output only when this computed value crosses the threshold. The following image should ease the understanding.
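The thresholded weighted sum described above can be sketched in a few lines. The weights and threshold below are illustrative assumptions, not values from the article:

```python
def perceptron(inputs, weights, threshold):
    """Fire (output 1) only when the weighted sum crosses the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# Example: three binary inputs X1, X2, X3 with assumed weights 0.5, 0.3, 0.2
output = perceptron([1, 0, 1], [0.5, 0.3, 0.2], threshold=0.6)
print(output)  # weighted sum = 0.7 > 0.6, so the perceptron fires: 1
```

Changing the weights or the threshold changes which input patterns make the perceptron fire, which is exactly what learning later automates.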
The function f() shown above is called the activation function. We will cover activation functions in detail in the next article, but for now let us build a brief intuition with a common example.
Sigmoid Activation (often deployed for binary as well as multi-class classification)
Sigmoid takes a real-valued input (an integer or a floating-point number) and produces an output that lies between 0 and 1. It is mainly implemented at the output layer of a binary or multi-class classification network.
Thus, taking the output from the neurons of the penultimate layer as the input X to the last layer, the network's output is the functional value f(X).
When f() is the sigmoid function, 0 < f(X) < 1 (the sigmoid approaches but never reaches 0 or 1).
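A minimal sketch of the sigmoid function, showing how it squashes any real input into the open interval (0, 1):

```python
import math

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5
print(sigmoid(10))   # very close to 1
print(sigmoid(-10))  # very close to 0
```

Large positive inputs are pushed toward 1 and large negative inputs toward 0, which is what makes sigmoid useful as a soft binary decision.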
Our discussion now centers on the consequences of not having an activation function in the last layer, or on the very final neuron. Thus:
What happens when we do not use an activation function at the output layer?
The final neuron then passes its input straight through as its output. That is, without the activation function, no functional value of the input is computed; instead, the input itself appears at the output. A simple network has been provided to illustrate this.
Thus regression (linear regression, to be specific), which aims at computing a weighted equation of all the features, can be realized very well with a neural network. The idea is simply to code the final layer without any activation such as sigmoid or softmax.
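To make this concrete: a single neuron with no output activation is exactly a linear regression model, y = w*x + b. The sketch below trains such a neuron with plain stochastic gradient descent; the data, learning rate, and epoch count are illustrative assumptions:

```python
def fit_linear_neuron(xs, ys, lr=0.01, epochs=2000):
    """Train a single neuron with identity (no) output activation."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w * x + b       # no activation: the weighted sum IS the output
            err = pred - y
            w -= lr * err * x      # gradient step on squared error
            b -= lr * err
    return w, b

# Points lying exactly on y = 2x + 1
w, b = fit_linear_neuron([0, 1, 2, 3], [1, 3, 5, 7])
print(w, b)  # both should converge close to 2 and 1
```

Had we wrapped `pred` in a sigmoid, the same neuron would instead behave like logistic regression, which is the other half of the question posed at the start of the article.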
We will now see how Boolean logic, in the form of digital logic gates, can be implemented using this concept. First, a brief word on what digital logic gates are: they are basic circuits that take digital signals as their input and, based on a particular circuitry, produce a signal at the output.
In this case, however, we do need to assume that the sigmoid activation function is used at the final layer. The activation is written as g(z), where z is the input from the penultimate layer (basically the weighted sum of the concerned inputs).
Problem Statement: We are to implement an OR gate using a neural network. Let us consider the following network with predefined weights.
We have a neural net without any hidden layer. The bias unit is associated with a negative weight. X1 and X2 are two Boolean variables that can take the values 0 and 1. As mentioned earlier, g(z) is the functional value of z, the activation function being sigmoid.
Now let us take a look at the output, considering binary values of X1 and X2 only. As the graph of the sigmoid shows, for large positive inputs the output will be close to 1, and for small (essentially negative) inputs the output will be close to 0.
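The computation above can be sketched directly. The weights here (bias −10, input weights +20) are an assumed textbook choice, since the article's actual values appear only in the image; any weights that make the weighted sum strongly negative for (0, 0) and strongly positive otherwise will work:

```python
import math

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def or_gate(x1, x2):
    # Assumed weights: bias = -10, w1 = w2 = +20, i.e. g(-10 + 20*x1 + 20*x2)
    return round(g(-10 + 20 * x1 + 20 * x2))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, or_gate(x1, x2))  # 0 only for (0, 0), else 1
```

Only the input (0, 0) produces a negative weighted sum (−10), pushing the sigmoid toward 0; every other combination yields +10 or +30, pushing it toward 1.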
This matches the truth table of an OR gate. AND, XOR, and XNOR gates can be computed similarly; for XOR and XNOR, however, a hidden layer has to be implemented.
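To see why XOR needs a hidden layer, note that XOR(x1, x2) = OR(x1, x2) AND NOT(AND(x1, x2)): the hidden layer computes OR and NAND, and the output neuron ANDs them. The weights below are assumed for illustration, following the same ±20 / ∓10 pattern as the OR gate:

```python
import math

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def xor_gate(x1, x2):
    h_or   = g(-10 + 20 * x1 + 20 * x2)   # hidden unit 1: OR
    h_nand = g(30 - 20 * x1 - 20 * x2)    # hidden unit 2: NOT AND
    return round(g(-30 + 20 * h_or + 20 * h_nand))  # output: AND of hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_gate(x1, x2))  # 1 only when exactly one input is 1
```

No single-layer network can do this: XOR is not linearly separable, so no one weighted sum and threshold can separate {(0,1), (1,0)} from {(0,0), (1,1)}.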