Deep Learning

Neural Networks Part 1: Logistic Regression

Single neuron of a neural network

Rakesh Malviya
Walmart Global Tech Blog

--

Required learning: linear regression basics (link)

We start from the basic unit of a neural network: the single activation neuron. A neural network with a single neuron is the same as logistic regression. Therefore, a neural network can be considered a networked set of logistic regression units.

Note: The above is true only for a neural network that uses the sigmoid activation function, since logistic regression uses the sigmoid function. Don't worry, this will become clear in subsequent posts.

Establishing Notation For Future Use

Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation.
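
For example, a training set of m examples can be written as follows (the feature dimension n is an assumed symbol, used only for illustration):

\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}, \qquad x^{(i)} \in \mathbb{R}^{n}, \qquad y^{(i)} \in \{0, 1\}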

Fig: Single neuron (created using Inkscape)

Note: We could use a better loss function for logistic regression (such as cross-entropy), but we are using the least-squares error for simplicity.
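
As a concrete sketch, with the sigmoid output the prediction and loss are (the 1/2m scaling is an assumption, chosen to match the averaged update used in the code below):

\hat{y}^{(i)} = \sigma(w^{\top} x^{(i)} + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad J = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^{2}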

Derivatives
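
Using the sigmoid derivative \sigma'(z) = \sigma(z)(1 - \sigma(z)) and the chain rule on the loss above, the gradients (the same quantities accumulated in the code snippet below) are:

\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})\, \hat{y}^{(i)} (1 - \hat{y}^{(i)}), \qquad \frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})\, \hat{y}^{(i)} (1 - \hat{y}^{(i)})\, x_j^{(i)}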

Gradient Descent
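
With learning rate \alpha (the symbol is an assumption), each full pass over the training data applies the averaged updates:

w_j := w_j - \alpha \frac{\partial J}{\partial w_j}, \qquad b := b - \alpha \frac{\partial J}{\partial b}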

Note: J is our loss function, and j indexes the weights.

  1. Summing the individual gradients over the training examples makes the gradient update smoother
  2. Without averaging, the effective learning rate depends on the training data size m or the batch size
  3. With averaging, the gradient magnitude is independent of the batch size. This allows fair comparison across different batch sizes or training data sizes m.

Training Steps:

  1. Initialize the weights W and bias b (e.g., with zeros or small random values)
  2. Compute the predictions YP with a forward pass over the training data
  3. Accumulate the gradient of J with respect to the bias and each weight W_j
  4. Update the bias and weights using the averaged gradients and the learning rate
  5. Repeat steps 2 to 4 until the loss converges

Code snippet of above steps:

import numpy as np

# YP holds the sigmoid outputs of the forward pass,
# e.g. YP = sigmoid(X_train @ W + bias)

# Accumulate gradient with respect to bias and weights
grad_bias = 0.0
grad_w = np.zeros(len(W))
for i in range(X_train.shape[0]):
    grad_bias += (YP[i] - y_train[i]) * YP[i] * (1 - YP[i])  # dJ/db
    for j in range(len(W)):
        # dJ/dW_j
        grad_w[j] += (YP[i] - y_train[i]) * YP[i] * (1 - YP[i]) * X_train[i][j]

# Update bias and weights (gradient averaged over the m training examples)
bias = bias - grad_bias * lr / X_train.shape[0]
W = W - grad_w * lr / X_train.shape[0]

Stochastic Gradient Descent (SGD)

When the training data size m is large, we choose a batch size m′ < m and divide the training data into batches of size m′. We then update the weights and bias for each batch as follows:
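
Below is a minimal sketch of one epoch of mini-batch updates, reusing the least-squares gradients derived above; the sgd_epoch helper and the sequential (unshuffled) batching are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_epoch(X_train, y_train, W, bias, lr, m_prime):
    """One epoch of mini-batch SGD with batch size m_prime."""
    m = X_train.shape[0]
    for start in range(0, m, m_prime):
        Xb = X_train[start:start + m_prime]
        yb = y_train[start:start + m_prime]
        YP = sigmoid(Xb @ W + bias)        # forward pass on the batch
        err = (YP - yb) * YP * (1 - YP)    # per-example dJ/dz
        # average the batch gradients, exactly as in the full-batch update
        W = W - lr * (Xb.T @ err) / Xb.shape[0]
        bias = bias - lr * err.sum() / Xb.shape[0]
    return W, bias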

Advantages Of SGD

  1. Much faster than full-batch gradient descent, since the weights are updated after every batch rather than after a full pass over the data
  2. A better choice when the whole training data cannot fit into the RAM (available memory) of the system, since only one batch needs to be in memory at a time

Code

Here is the Python implementation of logistic regression.
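
A minimal self-contained sketch of the end-to-end training loop, assuming the least-squares loss and sigmoid activation used throughout; the train_logistic_regression helper and the toy AND dataset are illustrative, not part of the original listing.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X_train, y_train, lr=1.0, epochs=5000):
    """Full-batch gradient descent on the least-squares logistic loss."""
    m, n = X_train.shape
    W = np.zeros(n)
    bias = 0.0
    for _ in range(epochs):
        YP = sigmoid(X_train @ W + bias)      # forward pass
        err = (YP - y_train) * YP * (1 - YP)  # per-example dJ/dz
        W = W - lr * (X_train.T @ err) / m    # averaged gradient steps
        bias = bias - lr * err.sum() / m
    return W, bias

# Usage on a toy linearly separable problem (logical AND)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])
W, b = train_logistic_regression(X, y)
print(sigmoid(X @ W + b))  # probabilities should move toward y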

References:

  1. CS229 lecture notes: http://cs229.stanford.edu/notes
