Circles and Arrows

Week 8: AI6 Ilorin

The Scenic Route
ai6-ilorin
5 min read · Mar 11, 2020


Concept: miniature men/women ardently adjusting weights, inside the hidden layers of a universe called Perceptron.

Linear and Logistic Regressions have always felt familiar. Whether it’s a line of best fit, or a scrawny-looking curve, I’ve seen it somewhere before in O-level Math. And so, it was slightly unsettling, as well as fascinating, seeing a neural network for the first time.

The model, inspired by biological neurons in the human brain, is arguably one of the most unattractive things I’ve ever seen. But fascinating, still. The neural network is personal proof that no matter how much I study to show myself approved in class, there will always be a million and one things I do not know or understand. And that’s okay.

I feel like Samwell Tarly lost in the Citadel for the first time.

A complicated highway system of nerves, arranged in patterns, connects your brain to the rest of your body, so communication can occur in split seconds. A neural network, a deep learning method, mirrors this mechanism, hence its name.

What is a Neural Network?

Logistic Regression, though a good classification algorithm, is a linear model and cannot capture the more complex, realistic non-linear relationships with respect to the data’s input features.

Regression then evolved, birthing a complex system that can model intricate structures in a dataset: the Neural Network.

A neural network is a network of “circles” and “arrows” that utilizes an iterative learning process, starting from random parameters and repeatedly refining them to produce desired outputs.

Neural networks are needed for scenarios where the relationship between the variables (independent and dependent) is non-linear and complex. In normal Regression models, a line, or two lines, or anything remotely similar can split our data in ways that clearly define the relationship between our x (inputs) and our y (output). Neural networks shine, however, in the context of Classification, where they find patterns far too complex to spell out to a machine by hand (e.g. in trying to detect whether an image shows a woman or a cat).

A Simple Neural Network

The image above is an example of a 2-layered neural network (by convention, the input layer isn’t counted). Not that unattractive.

Each circle is called a neuron or node, and the neurons are arranged vertically in columns called layers. From the image, there are visibly 3 layers: the input layer (red), the hidden layer (blue) and the output layer (green).

The input layer accepts the input features to be fed into the neural network. The hidden layer is the one between the input and output layers, where all of the computation takes place. And the output layer produces results learned by the neural network.

The Perceptron is a mathematical model of a biological neuron. It is the fundamental unit of a neural network.

The arrows from each circle are channels from one node to another, and each arrow carries a numerical value, or weight, just as each node carries a value. The sum of the products of each node’s value and its corresponding arrow’s weight, computed layer by layer beginning from the input layer, determines the output of the last layer.

This is forward propagation, the basic idea: the neural network receives an input x and predicts an output y by repeatedly multiplying each node’s value by its arrow’s weight and summing everything up, until the prediction arrives at the output layer. The result at the output layer is then compared with the real result to get the error.
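
A minimal sketch of that idea in NumPy. The sizes (3 inputs, 4 hidden nodes, 1 output) and the random weights are illustrative assumptions, standing in for a real trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=3)          # input layer: the "circles" we feed in
W1 = rng.normal(size=(4, 3))    # "arrows" from input layer to hidden layer
W2 = rng.normal(size=(1, 4))    # "arrows" from hidden layer to output layer

hidden = W1 @ x                 # each hidden node: sum of inputs times weights
y_pred = W2 @ hidden            # output node: sum of hidden values times weights

y_true = 1.0                    # the real result
error = y_true - y_pred         # compare the prediction with reality
print(y_pred, error)
```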

Non-linearity In Neural Networks

Non-linearity in neural networks refers to the fact that the output at any unit cannot be reproduced from a linear function of the input.

The dot product of the weight vector (arrow) and the input vector (circle) is a linear operation, and stacking operations that multiply node by weight and sum it all up to produce the output value is still just a composition of linear operations, thereby making the whole system nothing more than a linear regression.
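
You can watch this collapse happen. With arbitrary random matrices, two stacked linear layers behave exactly like a single linear layer whose weights are their product:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))

# Two stacked linear layers...
two_layers = W2 @ (W1 @ x)
# ...equal one linear layer with weights W2 @ W1.
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True: the extra layer adds no power
```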

But a neural network is employed for its complexity and its ‘ability’ to find patterns and relations in non-linear contexts, yet the equations so far are all linear…

Enter Activation Function

Activation functions introduce a non-linear property to a neural network. They are applied to the dot product of the nodes and the weights.

Activation functions such as the Sigmoid function are the very source of non-linearity, which is basically what distinguishes the neural network from a mere stack of linear transformations.
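
Here is the earlier forward pass again, this time with a sigmoid squashing each weighted sum. The sizes and weights remain illustrative assumptions, not a trained model:

```python
import numpy as np

def sigmoid(z):
    """Squashes any real number into (0, 1): a non-linear curve."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
x = rng.normal(size=3)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))

# Same forward pass as before, but each weighted sum now passes
# through the sigmoid, breaking the chain of purely linear operations.
hidden = sigmoid(W1 @ x)
y_pred = sigmoid(W2 @ hidden)
print(y_pred)
```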

After a forward propagation (feeding the net some input features and getting an output), the resulting answer is evaluated. The aim is to check whether, and by what measure, the predicted output is similar to the actual output:

In image processing, does the neural network see a dog when you show it a man?

Evaluation of the predicted output against the expected output is achieved using a cost function (a metric that tells how well the model is doing). Based on the feedback from this machine learning critic, the model realizes its ‘error’ and seeks redress by adjusting its parameters to get as close as possible to the expected output, using the backpropagation algorithm.

Backpropagation, or backward propagation, is an algorithm used to train a neural network by way of the chain rule. What it does is minimize the cost function by adjusting the parameters.
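
A toy sketch of the whole loop (forward pass, cost, chain rule, parameter update) for a single sigmoid neuron with a squared-error cost. The input values and learning rate here are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
x = np.array([0.5, -1.2, 2.0])    # made-up input features
y_true = 1.0                      # the expected output
w = rng.normal(size=3)            # random starting weights
b = 0.0
lr = 0.5                          # learning rate

for step in range(100):
    # Forward propagation
    z = w @ x + b
    y_pred = sigmoid(z)
    loss = (y_pred - y_true) ** 2  # cost: squared error

    # Backward propagation: the chain rule, link by link
    dL_dy = 2 * (y_pred - y_true)        # d(loss)/d(prediction)
    dy_dz = y_pred * (1 - y_pred)        # derivative of the sigmoid
    dL_dz = dL_dy * dy_dz
    dL_dw = dL_dz * x                    # because dz/dw = x
    dL_db = dL_dz

    # Adjust parameters to minimize the cost
    w -= lr * dL_dw
    b -= lr * dL_db

print(f"final loss: {loss:.6f}")  # shrinks toward 0 as training proceeds
```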

Overfitting/Underfitting In Neural Networks

One of the most common problems gurus encounter while training neural networks is overfitting. It occurs primarily when there are too many parameters in a model; the extra complexity lets the model memorize the training data instead of learning patterns that generalize.

The aim is to reduce the complexity of the model by Regularization:

Dropout: by randomly switching off nodes in the hidden layers during training, the model’s capacity reduces, hence reducing overfitting.
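
A rough sketch of dropout as a random mask over a layer’s activations. This is the common ‘inverted dropout’ variant, and the values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def dropout(activations, p_drop=0.5):
    """Randomly zero out nodes during training.

    Survivors are scaled by 1/(1 - p_drop) so the expected activation
    stays the same, letting us skip dropout entirely at test time.
    """
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

hidden = np.array([0.8, 0.1, 0.5, 0.9])
print(dropout(hidden))  # roughly half the nodes silenced at random
```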

Early stopping, another regularization technique, involves training the model with an iterative method; when the gap between the training error and the test error starts to grow, training stops early, cutting the iterative process short. Early stopping tells us how many iterations can be run before the model begins to overfit.
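
A skeleton of that loop. Here train_step and val_loss are hypothetical stand-ins for a real training step and validation metric:

```python
def train_with_early_stopping(train_step, val_loss, max_epochs=1000, patience=5):
    best, bad_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()
        loss = val_loss()
        if loss < best:
            best, bad_epochs = loss, 0     # still improving: keep going
        else:
            bad_epochs += 1                # validation error growing
            if bad_epochs >= patience:     # the gap kept widening: stop early
                print(f"stopped at epoch {epoch}")
                break
    return best

# Toy demo: a "validation loss" that improves, then worsens.
losses = iter([1.0, 0.8, 0.7, 0.72, 0.75, 0.8, 0.85, 0.9, 1.0, 1.1])
print(train_with_early_stopping(lambda: None, lambda: next(losses), max_epochs=10))
```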

Data Augmentation

Increasing the size of the dataset (for images) by augmenting it through flipping, scaling or rotation can also reduce overfitting.
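
A tiny illustration using NumPy’s built-in flips and rotations on a stand-in ‘image’:

```python
import numpy as np

rng = np.random.default_rng(5)
image = rng.integers(0, 256, size=(4, 4))   # a tiny stand-in "image"

# Each transform yields a new training example from the same picture.
augmented = [
    np.fliplr(image),        # horizontal flip
    np.flipud(image),        # vertical flip
    np.rot90(image),         # 90-degree rotation
    np.rot90(image, k=2),    # 180-degree rotation
]
print(f"1 original image -> {1 + len(augmented)} training examples")
```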

Underfitting occurs when the model is not complex enough. Increasing the model’s capacity by adding more neurons and more hidden layers will address underfitting.

Conclusion

Week 8 was all about the fresh neural network, different from all the ‘cheap’ graphs we were used to. The discussion spanned from the definition of a neural network to its many applications, like speech/character recognition, language generation, text classification and so on.

I bet it would be a wonderful experience working long-term on this in the future.
