Mathematics and Vectorization behind Neural Network

Akhilesh Kapse · Published in Analytics Vidhya · May 12, 2020

In this era of high-level, specialized libraries and frameworks such as Keras, TensorFlow, or PyTorch, we do not need to constantly worry about the size of our weight matrices or remember the formula for the derivative of whichever activation function we decided to use. This post aims to discuss what a neural network is and how we represent it in a machine learning model. Subsequent posts will cover more advanced topics such as training and optimizing a model, but I’ve found it’s helpful to first have a solid understanding of what it is we’re actually building, and a comfort with the matrix representation we’ll use.

A computational model of a neuron

As a first step, we will look at how a single neuron works as a logistic function, and then move deeper to compute more complex functions using a multi-layer (deep) network.

In logistic regression, we composed a linear model $z(x)$ with the logistic function $g(z)$ to form our predictor. This linear model was a combination of feature inputs $x_i$ and weights $w_i$.

$$z(x) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + b = w^T x + b$$

Let’s try to visualize it.

The first layer (which here is also the final layer) contains a node for each value in our input feature vector. These values are scaled by their corresponding weight, $w_i$, and added together along with a bias term, $b$. The bias term allows us to build linear models that aren’t fixed at the origin.

After computing $z(x)$, the result is passed to a non-linear activation function with some threshold value $\theta$. If the linear combination of inputs and weights, $z(x)$, is higher than the threshold, the neuron fires; if it is less than the threshold, it doesn’t.
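As a minimal sketch of this single-neuron computation in NumPy (the numbers and a threshold of 0.5 are illustrative assumptions, not values from the article):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only: 4 features, one neuron.
x = np.array([0.5, -1.2, 3.0, 0.7])   # input feature vector
w = np.array([0.1, 0.4, -0.2, 0.3])   # weights
b = 0.05                              # bias term

z = w @ x + b                         # linear model z(x) = w^T x + b
a = sigmoid(z)                        # non-linear activation g(z)
fires = a >= 0.5                      # assumed threshold: does the neuron fire?
print(z, a, fires)
```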

Loss function

The basic source of information on the progress of the learning process is the value of the loss function. Generally speaking, the loss function is designed to show how far we are from the ‘ideal’ solution. In our case we use binary cross-entropy, but depending on the problem we are dealing with, different functions can be applied.
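Here is a small sketch of binary cross-entropy in NumPy; the labels and predictions are made-up placeholders, just to exercise the function:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)]."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) +
                    (1 - y_true) * np.log(1 - y_pred))

# Hypothetical labels and predicted probabilities.
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])
print(binary_crossentropy(y_true, y_pred))  # smaller means closer to 'ideal'
```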

The main aim of backpropagation is to adjust the weights and biases of the network so that the model’s cost function reaches a local (or, ideally, global) optimum. This is done with the help of the computational graph method, as shown below.

Let’s see how the whole process of updating the weights and biases of a neural network model works.
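A sketch of one such update loop for the logistic unit above, assuming gradient descent and using the standard chain-rule result that $\partial L/\partial z = a - y$ for a sigmoid output with binary cross-entropy (the data and learning rate are placeholders of my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy batch: 3 features, 4 examples (placeholder data).
X = np.random.randn(3, 4)            # shape (features, examples)
y = np.array([[1, 0, 1, 0]])         # shape (1, examples)
w = np.zeros((3, 1))
b = 0.0
lr = 0.1                             # assumed learning rate

for _ in range(100):
    z = w.T @ X + b                  # forward: linear model
    a = sigmoid(z)                   # forward: activation
    dz = a - y                       # backward: dL/dz for sigmoid + BCE
    dw = X @ dz.T / X.shape[1]       # backward: dL/dw, averaged over examples
    db = dz.mean()                   # backward: dL/db
    w -= lr * dw                     # update weights
    b -= lr * db                     # update bias
```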

Building a network of neurons

The previous model is only capable of binary classification; however, recall that we can perform multi-class classification by building a collection of logistic regression models. Let’s extend our “network” to represent this.

Here, we’ve built three distinct logistic regression models, each with its own set of parameters. Take a moment to make sure you understand this matrix representation. (A working knowledge of matrix multiplication helps here.) It’s rather convenient that we can leverage matrix operations, as they allow us to perform these calculations quickly and efficiently.

The above example displays the case of multi-class classification on a single example, but we can also extend our input matrix to classify a collection of examples. This is not just useful but necessary for our optimization algorithm (covered in a later post) to learn efficiently from all of the examples when finding the best parameters.
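A sketch of both cases in NumPy, assuming 4 features and 3 classes (sizes and data are illustrative). Stacking one weight vector per class as the rows of $W$ lets the same matrix product handle a single example or a whole batch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_features, n_classes, m = 4, 3, 5          # assumed sizes

W = np.random.randn(n_classes, n_features)  # one row of weights per class
b = np.zeros((n_classes, 1))                # one bias per class

# Single example: x has shape (4, 1) -> output has shape (3, 1).
x = np.random.randn(n_features, 1)
print(sigmoid(W @ x + b).shape)             # (3, 1): one score per class

# Batch of m examples stacked as columns: X has shape (4, 5).
X = np.random.randn(n_features, m)
print(sigmoid(W @ X + b).shape)             # (3, 5): all examples at once
```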

Matrix representation

Let $n^{[l]}$ represent the number of units in layer $l$. For a given layer, we’ll have a weights matrix $W^{[l]}$ of shape $(n^{[l]}, n^{[l-1]})$ and a bias vector of shape $(n^{[l]}, 1)$.

The activations of a given layer will be a matrix of shape $(n^{[l]}, m)$, where $m$ represents the number of observations being fed through the network.
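A quick sketch that checks these shapes for a hypothetical two-layer network (the layer sizes, tanh activation, and initialization scale are assumptions for illustration):

```python
import numpy as np

# Hypothetical sizes: n[0]=4 inputs, n[1]=5 hidden units, n[2]=1 output.
layer_sizes = [4, 5, 1]
m = 8                                  # number of observations

params = {}
for l in range(1, len(layer_sizes)):
    # W[l] has shape (n[l], n[l-1]); b[l] has shape (n[l], 1).
    params[f"W{l}"] = np.random.randn(layer_sizes[l], layer_sizes[l - 1]) * 0.01
    params[f"b{l}"] = np.zeros((layer_sizes[l], 1))

A = np.random.randn(layer_sizes[0], m)  # layer-0 activations: the inputs
for l in range(1, len(layer_sizes)):
    Z = params[f"W{l}"] @ A + params[f"b{l}"]  # shape (n[l], m)
    A = np.tanh(Z)                             # any activation would do here
    print(f"layer {l}: activations have shape {A.shape}")  # (n[l], m)
```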

Feeling like you’ve got a grasp?

Thank you for reading.
