Artificial neural networks are among the most widely used methods in machine learning. One of the most interesting things about a neural network is the way it learns about the data it’s been trained on: it starts by learning simple patterns in the data and then proceeds to learn more complex attributes.

I decided to write this article after taking a class on neural networks and reading lots of articles about them. Even though I understood the structure of a neural network, and the process involved in adjusting the weights to make proper predictions, it still wasn’t clear to me why it worked the way it did. **I wanted to be able to explain why and how the first layers in a network are able to discover simple attributes from a data set, while layers closer to the output layer can learn more complex attributes (*which are combinations of attributes learnt from previous layers*).** …

In machine learning, the basic routine is as follows: a data set is fed to a machine learning program, the program derives a function that models that data set, and the program can then start making predictions based on its derived function.
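To make that routine concrete, here is a minimal sketch in Python. It assumes NumPy is installed, and everything in it is made up for illustration: the toy data set, the choice of a straight-line fit as the "program", and the helper name `predict`. It is only one simple instance of the feed, derive, predict loop, not how a neural network does it.

```python
import numpy as np

# Toy data set (invented for this example): inputs x and observed outputs y,
# generated from the rule y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Steps 1 and 2: feed the data set to the program, which derives a function
# that models it. Here the "program" is a least-squares straight-line fit.
slope, intercept = np.polyfit(x, y, deg=1)


def predict(new_x):
    """Apply the derived function to an input the program has not seen."""
    return slope * new_x + intercept


# Step 3: make a prediction based on the derived function.
prediction = predict(10.0)
print(prediction)  # approximately 21, since the data follows y = 2x + 1
```

The fitted `slope` and `intercept` are the derived function here; a neural network plays the same role with many more adjustable numbers (its weights).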

It turns out that one of the major goals of the machine learning program, while coming up with that function, is to optimize it so that it models the given data set as well as possible.
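One way to picture what "as well as possible" means is to score candidate functions by how far their outputs fall from the data. The sketch below uses mean squared error as that score, which is my choice for illustration (the text above does not name a specific measure), and the data and the two candidate functions are hypothetical.

```python
def mean_squared_error(f, xs, ys):
    """Average of the squared gaps between f's outputs and the observed ys."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data set (invented for this example), following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def candidate_a(x):
    # Hypothetical candidate that matches the data exactly.
    return 2.0 * x + 1.0


def candidate_b(x):
    # Hypothetical candidate that is close, but not right.
    return 1.0 * x + 2.0


error_a = mean_squared_error(candidate_a, xs, ys)
error_b = mean_squared_error(candidate_b, xs, ys)

print(error_a)  # 0.0
print(error_b)  # 1.5
```

Optimizing the function then amounts to searching for the candidate with the lowest error; in this toy case `candidate_a` wins.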

And there are several known methods that can be used during the optimization process. …