# A Visual Introduction to Neural Networks

Neural networks come in a variety of flavors and types, and they are currently the state of the art for classification problems. Convolutional Neural Nets, Recurrent Neural Nets, and, more recently, Generative Adversarial Networks have all proven to be of great use. Approaches like logistic regression, SVMs, and decision trees are also used for classification; however, transitioning from these into the world of neural networks is often full of abstractions.
The intent of this post is to simplify those abstractions. Through experiments and graphs, it depicts how neural networks work. Some of the questions one can expect it to answer are:

1. What can logistic regression do?
2. What can logistic regression not do?
3. How do we transition from logistic regression to neural nets?
4. How do decision boundaries change with neural nets?
5. How is the convergence of a neural net affected by the choice of activation function and other hyper-parameters?

Do read the captions for a better understanding.

We have seen from the figure above that logistic regression can classify these points by color (red and blue).
Let’s take a different problem. The problem in the figure below is nonlinear, meaning we can’t simply use a line (a hyperplane in general) to classify the points. We can also see from the figure that logistic regression can’t solve this problem, as it tries to separate the classes with a line.
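As a concrete illustration, here is a minimal sketch of this failure using scikit-learn. The concentric-circles dataset and its parameters are assumptions standing in for the figure’s data, not the post’s exact setup:

```python
# Illustrative sketch (assumed data): logistic regression on a
# concentric-circles dataset, a nonlinear problem no single line can solve.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression

# Two classes: an inner circle and an outer ring around it.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

clf = LogisticRegression().fit(X, y)
acc = clf.score(X, y)
print(f"Logistic regression accuracy: {acc:.2f}")  # stays near chance (~0.5)
```

Because the decision boundary is a line, no placement of it can separate a ring from the circle it encloses, so the accuracy stays near chance.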

Logistic regression can be seen as a single-layer neural network (with one neuron) and ‘sigmoid’ as the activation function. Let’s see if a neural net with one neuron and a ‘sigmoid’ activation can solve this problem. This is done using the MLPClassifier from Python’s scikit-learn library, and the implementation can be found here
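A minimal sketch of that single-neuron setup, assuming the same circles dataset as a stand-in for the figure’s data:

```python
# Sketch: a neural net with ONE hidden neuron and a logistic (sigmoid)
# activation, fit on the nonlinear circles problem. The dataset and
# hyper-parameters here are illustrative assumptions.
from sklearn.datasets import make_circles
from sklearn.neural_network import MLPClassifier

X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(1,), activation='logistic',
                    max_iter=2000, random_state=0).fit(X, y)
acc = net.score(X, y)
print(f"1-neuron net accuracy: {acc:.2f}")  # still roughly chance level
```

A single neuron applies a monotone function to one linear combination of the inputs, so its decision boundary is still a line, and the accuracy stays close to logistic regression’s.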
The following three figures show a single-neuron neural net trying to solve the problem.

We observe that a single-neuron neural net, as expected, gives a linear decision boundary and, irrespective of the configuration (activation function, learning rate, etc.), is not able to solve a nonlinear problem.
Let’s see what happens when we use two neurons.

We observe that we now get two boundaries, and the classification error is reduced (as judged by the number of points in the right category). Points in the yellow region can be interpreted as blue and points in the other regions as red. This solution still makes some errors, but fewer than in the single-neuron case. We can also see that the two decision boundaries, one from each neuron, are combined here to make a decision.
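A sketch of this comparison, again assuming the circles dataset (and ‘relu’ hidden units) as illustrative stand-ins:

```python
# Sketch: one hidden neuron vs. two. With two neurons, two linear
# boundaries are combined nonlinearly, which lowers (but does not
# remove) the error. Dataset and settings are assumptions.
from sklearn.datasets import make_circles
from sklearn.neural_network import MLPClassifier

X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

one = MLPClassifier(hidden_layer_sizes=(1,), activation='relu',
                    max_iter=2000, random_state=0).fit(X, y)
two = MLPClassifier(hidden_layer_sizes=(2,), activation='relu',
                    max_iter=2000, random_state=0).fit(X, y)
print(f"1 neuron: {one.score(X, y):.2f}, 2 neurons: {two.score(X, y):.2f}")
```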
Let’s see what happens as we increase the number of neurons to 2, 3, 4, 6, 7 and 30 respectively. (The code to plot the decision boundaries can be found here.)

We observe that, as we increase the number of neurons, the model is able to classify the points more accurately. The decision boundary becomes complex because it is a nonlinear combination (via the activation functions) of the individual decision boundaries. At an abstract level, this can be viewed as multiple classifiers combining in a nonlinear manner to form a nonlinear decision surface. We can conclude that when the data is nonlinear, a layer of multiple neurons with a nonlinear activation function can classify it. This sample problem is quite small; for higher-dimensional, more complex problems, more complex architectures come into play.
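The sweep over neuron counts can be sketched as follows (same assumed circles dataset; plotting is omitted here in favor of accuracies):

```python
# Sketch: widening the single hidden layer through the post's
# 2, 3, 4, 6, 7, 30 sweep. With enough relu neurons the nonlinear
# problem becomes separable. The dataset is an illustrative assumption.
from sklearn.datasets import make_circles
from sklearn.neural_network import MLPClassifier

X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

scores = {}
for n in (2, 3, 4, 6, 7, 30):
    net = MLPClassifier(hidden_layer_sizes=(n,), activation='relu',
                        max_iter=5000, random_state=0).fit(X, y)
    scores[n] = net.score(X, y)
print(scores)  # accuracy broadly improves as neurons are added
```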
In the figure above, a nonlinear activation function such as ‘relu’ was used; it introduces the nonlinearity by combining the individual planes in a nonlinear manner. Let us see what happens when we use a linear activation function.

Linear activation functions, when combined through “Wx+b” (which is itself a linear function), ultimately give a linear decision surface again. Hence a neural net must have a nonlinear activation; otherwise there is no point in increasing layers and neurons.
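This is easy to verify: MLPClassifier supports an ‘identity’ activation, and even a wide, two-layer net then collapses to a linear classifier (dataset again an assumption):

```python
# Sketch: a wide two-layer net with the 'identity' (linear) activation.
# A composition of linear maps is still linear, so the decision
# boundary stays a line and the circles problem remains unsolved.
from sklearn.datasets import make_circles
from sklearn.neural_network import MLPClassifier

X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

linear_net = MLPClassifier(hidden_layer_sizes=(30, 30), activation='identity',
                           max_iter=2000, random_state=0).fit(X, y)
acc = linear_net.score(X, y)
print(f"Linear-activation net accuracy: {acc:.2f}")  # near chance again
```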
Let us have a look at how a model converges on a 3-class classification problem with one layer of 6 neurons. The three classes here are dark blue, light blue and red, and the decision regions found are yellow, blue and purple respectively.
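A sketch of that 3-class setup; the blob dataset below is an assumption standing in for the post’s three colored clusters:

```python
# Sketch: a 3-class problem solved by one hidden layer of 6 neurons,
# as in the figure. Data (three Gaussian blobs) is an assumption.
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

X, y = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(6,), activation='relu',
                    max_iter=2000, random_state=0).fit(X, y)
acc = net.score(X, y)
print(f"3-class accuracy: {acc:.2f}")
```

MLPClassifier handles the multi-class case directly: its output layer has one unit per class with a softmax over them.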

Let’s now take another problem to learn a little more about how neural nets function.

The problem and outcomes are simulated in TensorFlow Playground, which is a very good tool for visualizing how neural nets work.

Let’s now compare the above with a case where we have multiple layers.

The figure above shows that the problem was solvable by simply increasing the number of neurons in one layer, but it was solved faster using multiple layers. Multiple layers also enable the model to form higher-level features (taking the first-level features as input and processing on top of them). A good example of this behavior is found in CNNs for image classification, where the early layers find basic shapes such as lines and curves, while the later layers find properties built on top of these, such as a face or a hand.
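The wide-vs-deep comparison can be sketched in scikit-learn as well (the post itself uses TensorFlow Playground; the moons dataset and layer sizes below are assumptions):

```python
# Sketch: one wide hidden layer vs. two smaller layers on a two-moons
# dataset. Both can solve the problem; n_iter_ reports how many
# training iterations each run took. Data and sizes are assumptions.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.1, random_state=0)

wide = MLPClassifier(hidden_layer_sizes=(16,), activation='relu',
                     max_iter=5000, random_state=0).fit(X, y)
deep = MLPClassifier(hidden_layer_sizes=(8, 8), activation='relu',
                     max_iter=5000, random_state=0).fit(X, y)
print(f"wide: {wide.score(X, y):.2f} ({wide.n_iter_} iters), "
      f"deep: {deep.score(X, y):.2f} ({deep.n_iter_} iters)")
```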
Let’s learn something more with another experiment. The figure above uses ‘sigmoid’ as the activation function. It is well known that sigmoid suffers from the vanishing-gradient problem: as more layers come into the picture, the gradients required to update the weights of the earlier layers gradually tend to zero. ‘Relu’ is generally recommended as an activation function to handle this problem. Let’s repeat the experiment above, but using ‘relu’ this time, to verify this.

It is clearly evident that using relu solves the problem to near perfection, and in half the time; the losses are close to zero. Also, one layer with 8 relu neurons would not have fetched the same result, since multiple layers exploit the higher-level features they extract.

The code for such experimentation can be found here.

Below is a gif animation showing how the decision boundary converges.

This blog covered a visual approach to neural nets. This is my first attempt at writing a blog, so stay tuned for upcoming posts on other topics and applications related to machine learning. Feel free to reach out to me with questions/feedback at shikharcic23@gmail.com or
