Deep Learning

Neural Networks Part 1: Logistic Regression

Single neuron of a neural network

Rakesh Malviya
Walmart Global Tech Blog

--

Required learning: linear regression basics (link)

We start from the basic unit of a neural network: the single activation neuron. A neural network with a single neuron is the same as logistic regression. Therefore, a neural network can be considered a networked set of logistic regression units.

Note: The above is true only for a neural network that uses the sigmoid activation function, since logistic regression uses the sigmoid function. Don't worry, this will become clear in subsequent posts.

Establishing Notation For Future Use

Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation.
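
For example, a training set of m examples can be written as follows (the feature dimension n is an assumed symbol, used only for illustration):

\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}, \qquad x^{(i)} \in \mathbb{R}^{n}, \qquad y^{(i)} \in \{0, 1\}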

Fig: Single neuron (created using Inkscape)

Note: We could use a better loss function for logistic regression (such as cross-entropy), but we are using the least-squares error for simplicity.
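
As a concrete sketch, with the sigmoid output the prediction and loss are (the 1/2m scaling is an assumption, chosen to match the averaged update used in the code below):

\hat{y}^{(i)} = \sigma(w^{\top} x^{(i)} + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad J = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^{2}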

Derivatives
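
Using the sigmoid derivative \sigma'(z) = \sigma(z)(1 - \sigma(z)) and the chain rule on the loss above, the gradients (the same quantities accumulated in the code snippet below) are:

\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})\, \hat{y}^{(i)} (1 - \hat{y}^{(i)}), \qquad \frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})\, \hat{y}^{(i)} (1 - \hat{y}^{(i)})\, x_j^{(i)}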

Gradient Descent
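
With learning rate \alpha (the symbol is an assumption), each full pass over the training data applies the averaged updates:

w_j := w_j - \alpha \frac{\partial J}{\partial w_j}, \qquad b := b - \alpha \frac{\partial J}{\partial b}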

Note: J is our loss function, and j indexes the weights.

  1. Summing the individual gradients over the training examples makes the gradient update smoother
  2. Without averaging, the effective learning rate depends on the training data size m or the batch size
  3. With averaging, the gradient magnitude is independent of the batch size. This allows fair comparison across different batch sizes or training data sizes m.

Training Steps:

  1. Initialize the weights W and bias b (e.g., with zeros or small random values)
  2. Compute the predictions YP with a forward pass over the training data
  3. Accumulate the gradient of J with respect to the bias and each weight W_j
  4. Update the bias and weights using the averaged gradients and the learning rate
  5. Repeat steps 2 to 4 until the loss converges

Code snippet of above steps:

import numpy as np

# YP holds the sigmoid outputs of the forward pass,
# e.g. YP = sigmoid(X_train @ W + bias)

# Accumulate gradient with respect to bias and weights
grad_bias = 0.0
grad_w = np.zeros(len(W))
for i in range(X_train.shape[0]):
    grad_bias += (YP[i] - y_train[i]) * YP[i] * (1 - YP[i])  # dJ/db
    for j in range(len(W)):
        # dJ/dW_j
        grad_w[j] += (YP[i] - y_train[i]) * YP[i] * (1 - YP[i]) * X_train[i][j]

# Update bias and weights (gradient averaged over the m training examples)
bias = bias - grad_bias * lr / X_train.shape[0]
W = W - grad_w * lr / X_train.shape[0]

Stochastic Gradient Descent (SGD)

When the training data size m is large, we choose a batch size m′ < m and divide the training data into batches of size m′. We then update the weights and bias for each batch as follows:
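
Below is a minimal sketch of one epoch of mini-batch updates, reusing the least-squares gradients derived above; the sgd_epoch helper and the sequential (unshuffled) batching are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_epoch(X_train, y_train, W, bias, lr, m_prime):
    """One epoch of mini-batch SGD with batch size m_prime."""
    m = X_train.shape[0]
    for start in range(0, m, m_prime):
        Xb = X_train[start:start + m_prime]
        yb = y_train[start:start + m_prime]
        YP = sigmoid(Xb @ W + bias)        # forward pass on the batch
        err = (YP - yb) * YP * (1 - YP)    # per-example dJ/dz
        # average the batch gradients, exactly as in the full-batch update
        W = W - lr * (Xb.T @ err) / Xb.shape[0]
        bias = bias - lr * err.sum() / Xb.shape[0]
    return W, bias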

Advantages Of SGD

  1. Much faster than full-batch gradient descent, since the weights are updated after every batch rather than after a full pass over the data
  2. A better choice when the whole training data cannot fit into the RAM (available memory) of the system, since only one batch needs to be in memory at a time

Code

Here is the Python implementation of logistic regression.
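
A minimal self-contained sketch of the end-to-end training loop, assuming the least-squares loss and sigmoid activation used throughout; the train_logistic_regression helper and the toy AND dataset are illustrative, not part of the original listing.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X_train, y_train, lr=1.0, epochs=5000):
    """Full-batch gradient descent on the least-squares logistic loss."""
    m, n = X_train.shape
    W = np.zeros(n)
    bias = 0.0
    for _ in range(epochs):
        YP = sigmoid(X_train @ W + bias)      # forward pass
        err = (YP - y_train) * YP * (1 - YP)  # per-example dJ/dz
        W = W - lr * (X_train.T @ err) / m    # averaged gradient steps
        bias = bias - lr * err.sum() / m
    return W, bias

# Usage on a toy linearly separable problem (logical AND)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])
W, b = train_logistic_regression(X, y)
print(sigmoid(X @ W + b))  # probabilities should move toward y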

References:

  1. CS229 lecture notes: http://cs229.stanford.edu/notes
