A Visual Introduction to Neural Networks

Neural networks come in many flavors and are currently the state of the art for many classification problems. Convolutional Neural Nets, Recurrent Neural Nets and, more recently, Generative Adversarial Networks have also proven to be of great use. Approaches like logistic regression, SVMs and decision trees are also used for classification. However, moving from these into the world of neural networks often means wading through a lot of abstractions.
The intent of this blog is to simplify these abstractions. It uses experiments and graphs to show how neural networks work. Some of the questions you can expect it to answer are:

1. What can logistic regression do?
2. What can logistic regression not do?
3. How do we transition from logistic regression to neural nets?
4. How do decision boundaries change with neural nets?
5. How is the convergence of a neural net affected by the activation function and other hyper-parameters?

Do read the captions for a better understanding.

Sample classification problem to solve. We need a decision plane to separate the red and blue points.
Logistic regression can solve this problem and gives the decision plane shown. The code for the solution is available here. The process of obtaining the line is explained in the code.

The figure above shows that logistic regression can classify these points by color (red and blue).
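As a minimal sketch of how such a line can be obtained, the snippet below fits scikit-learn's LogisticRegression and reads the boundary off the learned weights. The data is an assumption: two synthetic clusters from make_blobs stand in for the red and blue points, since the post's own dataset is not shown here.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Stand-in data (assumption): two well-separated clusters playing the
# role of the red and blue points.
X, y = make_blobs(n_samples=200, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)

clf = LogisticRegression().fit(X, y)

# The decision boundary is the set of points where w1*x1 + w2*x2 + b = 0,
# which can be rearranged into the line x2 = slope * x1 + offset.
(w1, w2), b = clf.coef_[0], clf.intercept_[0]
slope, offset = -w1 / w2, -b / w2

train_accuracy = clf.score(X, y)
print(f"boundary: x2 = {slope:.2f} * x1 + {offset:.2f}")
print(f"training accuracy: {train_accuracy:.2f}")
```

The rearrangement into slope and offset only works when w2 is non-zero, which holds for this stand-in data but not for a perfectly vertical boundary.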
Let’s take a different problem. The problem in the figure below is nonlinear, meaning we can’t use a line (a hyperplane in general) to separate the points. The figure also shows that logistic regression can’t solve this problem, since it tries to separate the points using a line.

A nonlinear classification problem
Logistic regression is unable to solve the problem
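The failure is easy to reproduce. The sketch below assumes concentric circles from make_circles as a stand-in for the nonlinear problem in the figure (the post's actual data may differ) and fits the same logistic regression to it.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression

# Stand-in nonlinear data (assumption): one class forms a ring around the other.
X, y = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=0)

clf = LogisticRegression().fit(X, y)
linear_accuracy = clf.score(X, y)

# A single straight line cannot separate a ring from its centre, so the
# accuracy stays far from 1.0 (close to guessing on this data).
print(f"logistic regression accuracy on circles: {linear_accuracy:.2f}")
```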

Logistic regression can be seen as a single-layer neural network (with 1 neuron) with ‘sigmoid’ as the activation function. Let’s see if a neural net with one neuron and ‘sigmoid’ activation can solve this problem. This is done using MLPClassifier from Python’s Scikit-learn library, and the implementation can be found here.
The following three figures depict a single-neuron neural net trying to solve the problem.

from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(1,), max_iter=500000, activation='logistic', learning_rate_init=0.01)
mlp = MLPClassifier(hidden_layer_sizes=(1,), max_iter=500000, activation='identity', learning_rate_init=0.01)
mlp = MLPClassifier(hidden_layer_sizes=(1,), max_iter=500000, activation='relu', learning_rate_init=0.01)

We observe that a single-neuron neural net gives, as expected, a linear decision boundary, which cannot solve a nonlinear problem irrespective of the configuration (activation function, learning rate etc.).
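The three single-neuron configurations can be compared in one loop. This is a sketch under assumptions: make_circles stands in for the post's data, max_iter is lowered to keep the run short, and random_state is fixed for repeatability.

```python
import warnings

from sklearn.datasets import make_circles
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Stand-in nonlinear data (assumption).
X, y = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=0)

single_neuron_scores = {}
for activation in ("logistic", "identity", "relu"):
    mlp = MLPClassifier(hidden_layer_sizes=(1,), activation=activation,
                        learning_rate_init=0.01, max_iter=2000,
                        random_state=0)
    with warnings.catch_warnings():
        # Non-convergence is expected here and not the point of the demo.
        warnings.simplefilter("ignore", ConvergenceWarning)
        mlp.fit(X, y)
    single_neuron_scores[activation] = mlp.score(X, y)

# With one hidden neuron the output depends on a single linear projection
# of the input, so every activation is limited to a straight-line boundary.
for activation, score in single_neuron_scores.items():
    print(f"{activation:>8}: training accuracy {score:.2f}")
```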
Let’s see what happens when we use two neurons.

Nonlinear problem with 1 layer and 2 neurons

We observe that we now get two boundaries and the classification error is reduced (as judged by the number of points in the right category). Points in the yellow region are interpreted as blue, and points in the other regions as red. This solution still has some errors, but fewer than with a single neuron. We can also see that the decision boundaries of the two neurons are combined here to make a decision.
Let’s see what happens as we increase the number of neurons further; the figures below show 3, 6, 8 and 30 neurons. (The code to plot the decision boundaries can be found here.)
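One common way to draw such decision regions is to predict the class on a dense grid of points and color the grid by the prediction. The sketch below shows only the grid computation; decision_grid is a hypothetical helper name, not the post's own plotting code, and make_circles is again an assumed stand-in dataset.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.neural_network import MLPClassifier


def decision_grid(clf, X, steps=200, margin=0.5):
    """Predict the class at every point of a grid covering the data."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, steps),
                         np.linspace(y_min, y_max, steps))
    grid_points = np.c_[xx.ravel(), yy.ravel()]
    zz = clf.predict(grid_points).reshape(xx.shape)
    return xx, yy, zz


X, y = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(6,), learning_rate_init=0.01,
                    max_iter=2000, random_state=0).fit(X, y)

xx, yy, zz = decision_grid(clf, X, steps=100)
# With matplotlib available, the regions and points could then be drawn as:
#   plt.contourf(xx, yy, zz, alpha=0.3)
#   plt.scatter(X[:, 0], X[:, 1], c=y)
```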

Decision plane with 3 neurons: lower error than the previous case.
Decision plane with 6 neurons. Perfect classification with 100% accuracy. The model has now learnt to combine 6 decisions to form a final decision. Points in the yellow region are classified as blue and the rest are classified as red.
Decision plane with 8 neurons
Decision plane with 30 neurons

We observe that, as we increase the number of neurons, the model classifies the points more accurately. The decision boundary is complex because it is a nonlinear combination (via activation functions) of the individual decision boundaries. On an abstract level, it can be viewed as multiple classifiers combining in a nonlinear manner to produce the nonlinear decision plane. We can conclude that when data is nonlinear, a layer of multiple neurons with a nonlinear activation function can classify it. This sample problem is quite small. For higher-dimensional and more complex problems, more complex architectures come into play.
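The effect of the neuron count can be checked numerically as well as visually. This sketch sweeps a few hidden-layer sizes on an assumed stand-in dataset (make_circles) and records the training accuracy of each; the exact numbers depend on the data and the random seed, and the improvement is not guaranteed to be strictly monotonic.

```python
import warnings

from sklearn.datasets import make_circles
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Stand-in nonlinear data (assumption).
X, y = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=0)

neuron_scores = {}
for n_neurons in (1, 2, 3, 6, 30):
    mlp = MLPClassifier(hidden_layer_sizes=(n_neurons,), activation="relu",
                        learning_rate_init=0.01, max_iter=5000,
                        random_state=0)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)
        mlp.fit(X, y)
    neuron_scores[n_neurons] = mlp.score(X, y)

for n_neurons, score in neuron_scores.items():
    print(f"{n_neurons:>2} neurons: training accuracy {score:.2f}")
```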
In the figures above, a nonlinear activation function such as ‘relu’ was used. It introduced the nonlinearity by combining the various planes in a nonlinear manner. Let us see what happens when we use a linear activation function.

30 neurons with linear activation function

When linear activation functions are combined using “Wx+b”, which is itself a linear function, the result is again a linear decision plane. Hence a neural net must have a nonlinear activation; otherwise there is no point in adding layers and neurons.
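This collapse can be verified with a few lines of NumPy. The sketch below uses random weights (an illustration, not a trained model) for a 30-neuron hidden layer with identity activation followed by an output layer, and shows that the pair is equivalent to a single “Wx+b” layer.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(5, 2))        # 5 sample points with 2 features each

# Hidden layer with 30 neurons and identity (linear) activation.
W1, b1 = rng.normal(size=(2, 30)), rng.normal(size=30)
# Output layer with a single unit.
W2, b2 = rng.normal(size=(30, 1)), rng.normal(size=1)

# Forward pass through both layers.
two_layer = (x @ W1 + b1) @ W2 + b2

# The same mapping written as one layer: W = W1 W2, b = b1 W2 + b2.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

collapsed = np.allclose(two_layer, one_layer)
print("two linear layers equal one linear layer:", collapsed)
```

However many linear layers are stacked, the same algebra folds them into one weight matrix and one bias, which is why the 30-neuron model above still draws a straight line.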
Let us look at how a model converges on a 3-class classification problem with 1 layer of 6 neurons. The 3 classes here are dark blue, light blue and red, and the decision regions found are yellow, blue and purple respectively.

Decision boundary of a nonlinear 3-class classification problem
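A 3-class setup with 1 layer of 6 neurons can be sketched with the same MLPClassifier. The data here is an assumption: three synthetic clusters from make_blobs stand in for the dark blue, light blue and red points, and are simpler than the nonlinear problem in the figure.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

# Stand-in 3-class data (assumption).
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(6,), learning_rate_init=0.01,
                    max_iter=2000, random_state=0).fit(X, y)

# For multi-class targets the output layer uses softmax, so each row of
# predict_proba holds one probability per class and sums to 1.
proba = mlp.predict_proba(X[:5])
print("output activation:", mlp.out_activation_)
print("class probabilities for 5 points:")
print(np.round(proba, 3))
```

The predicted class for a point is the one with the highest probability, which is what the colored decision regions visualize.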

Let’s now take another problem to learn a little more about how neural nets function.

The problem and outcomes are simulated in TensorFlow Playground, which is a very good tool for visualizing how neural nets work.

A more complex nonlinear problem. We need to classify blue and orange points.
1 neuron. As expected, only a linear boundary is available and classification is poor.
Using 3 neurons. Now the boundary is curved. Classification is somewhat better but far from perfect. An average loss of 0.473 is noted. Classification is judged by checking that blue points have a blue background and orange points have an orange background.
5 neurons used; test and train losses are 0.396 and 0.259, so we get a better classification. The model was allowed to run for 1 minute.
Using 5 neurons, classification improves further
8 neurons. A better and faster classification. The losses obtained after 1 minute are depicted

Let’s now compare the above with a case where we have multiple layers.

We observe that by using a 3-layer net with 4, 4 and 2 neurons in the respective layers, we get a better and faster classification.

The above shows that the problem was solvable by simply increasing the neurons in one layer, but it was solved faster when using multiple layers. Multiple layers also enable the model to form higher-level features (taking first-level features as input and processing on top of those). A good example of this behavior is found in CNNs for image classification, where the early layers find basic shapes such as lines and curves, while later layers find properties built on top of these, such as a face or a hand.
Let’s learn something more with another experiment. The figure shown above uses ‘sigmoid’ as the activation function. Sigmoid is well known to suffer from the vanishing gradient problem: as more layers come into the picture, the gradient required to update the weights gradually tends to zero. ‘Relu’ is generally recommended as an activation function to handle this problem. Let’s repeat the above experiment using ‘relu’ this time to verify this.
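A rough numerical intuition for the vanishing gradient: backpropagation multiplies one activation derivative per layer, and the sigmoid derivative never exceeds 0.25. The sketch below looks only at these activation factors and deliberately ignores the weight matrices (a simplifying assumption), so it shows an upper bound rather than an actual gradient.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    # Derivative of relu: 1 for positive inputs, 0 otherwise.
    return (np.asarray(z) > 0).astype(float)


# The sigmoid derivative peaks at z = 0 with value 0.25.
max_sigmoid_grad = float(sigmoid_grad(0.0))

depths = (1, 3, 10)
# Best-case product of activation derivatives across `d` sigmoid layers.
sigmoid_bound = {d: max_sigmoid_grad ** d for d in depths}
# For an active relu unit the derivative is 1, so the product stays at 1.
relu_factor = {d: float(relu_grad(1.0)) ** d for d in depths}

for d in depths:
    print(f"{d:>2} layers: sigmoid <= {sigmoid_bound[d]:.2e}, "
          f"relu (active units) = {relu_factor[d]:.0f}")
```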

Nonlinear problem decision boundary using 3 layers (4,4,2 neurons respectively) and ‘relu’ as activation

Using relu clearly solves the problem to near perfection and in half the time. The losses are close to zero. Also, using 1 layer with 8 relu neurons would not have produced the same result, as multiple layers exploit the higher-level features that they extract.
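A similar comparison can be tried outside TensorFlow Playground. The sketch below is not the Playground experiment itself: it assumes scikit-learn's MLPClassifier with the same 4, 4, 2 layout and make_circles as stand-in data, and simply reports the final loss and the number of epochs for each activation. Results depend on the data and the random seed, so they may not match the figures above.

```python
import warnings

from sklearn.datasets import make_circles
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Stand-in nonlinear data (assumption).
X, y = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=0)

results = {}
for activation in ("logistic", "relu"):
    mlp = MLPClassifier(hidden_layer_sizes=(4, 4, 2), activation=activation,
                        learning_rate_init=0.01, max_iter=3000,
                        random_state=0)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)
        mlp.fit(X, y)
    results[activation] = {
        "loss": mlp.loss_,        # final training loss
        "epochs": mlp.n_iter_,    # epochs run before stopping
        "accuracy": mlp.score(X, y),
    }

for activation, r in results.items():
    print(f"{activation:>8}: loss {r['loss']:.3f}, "
          f"epochs {r['epochs']}, accuracy {r['accuracy']:.2f}")
```

Trying several values of random_state gives a fairer picture, since a narrow last layer of 2 neurons makes the outcome sensitive to initialization.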

The code for such experimentation can be found here.

Here is a GIF animation showing how the decision boundary converges.

Animation depicting boundary formation with 3 layers and ‘relu’ activation

This blog covers a visual approach to neural nets. This is my first attempt at writing a blog. Stay tuned for upcoming posts on other topics and applications related to machine learning. Feel free to reach out to me with questions/feedback at shikharcic23@gmail.com or