Deep Learning
Neural Networks Part 1: Logistic Regression
A single neuron of a neural network
Required learning: linear regression basics (link)
We start from the basic unit of a neural network: the single activation neuron. A neural network with a single neuron is the same as logistic regression, so a neural network can be viewed as a networked set of logistic regression units.
Note: The above holds for a neural network that uses only the Sigmoid activation function, since logistic regression uses the Sigmoid function. Don't worry, this will become clear in subsequent blogs.
Establish Notations For Future Use
Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation.
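As a sketch of the notation, assuming the standard logistic regression setup used by the code later in this post: the training set is $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, where $x^{(i)} \in \mathbb{R}^n$ is a feature vector and $y^{(i)} \in \{0, 1\}$ is its label. The single neuron computes

$$\hat{y}^{(i)} = \sigma\big(w^\top x^{(i)} + b\big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

where $w$ is the weight vector and $b$ the bias.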
Note: We could use a better loss function for logistic regression (such as cross-entropy), but we use least-squares error for simplicity.
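With this notation, the least-squares loss over $m$ training examples is (a sketch; the factor of $\tfrac{1}{2}$ is a common convention that simplifies the derivative):

$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big)^2$$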
Derivatives
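The key derivative is that of the Sigmoid, $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$. Combined with the chain rule, it gives the gradients used in the code below (a sketch consistent with that code):

$$\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big)\,\hat{y}^{(i)}\big(1 - \hat{y}^{(i)}\big)\,x_j^{(i)}, \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \big(\hat{y}^{(i)} - y^{(i)}\big)\,\hat{y}^{(i)}\big(1 - \hat{y}^{(i)}\big)$$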
Gradient Descent
Note: J is our loss function and j indexes the weights
- Summing the individual gradients over the training examples makes the gradient update smoother
- Without averaging, the effective step size depends on the training-data size m or the batch size
- With averaging, the gradient magnitude is independent of the batch size, which allows comparison across different batch sizes or training-data sizes m (the averaged update is written out below)
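Written out with learning rate $\alpha$, the averaged gradient descent update is (a sketch matching the gradients above):

$$w_j := w_j - \alpha\,\frac{\partial J}{\partial w_j}, \qquad b := b - \alpha\,\frac{\partial J}{\partial b}$$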
Training Steps:
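In outline (as implemented by the snippet below):
1. Initialize the weights W and the bias (for example, to zeros).
2. Forward pass: compute the predictions YP with the Sigmoid activation.
3. Accumulate the gradients of J with respect to the bias and each weight W_j over the training examples.
4. Update the bias and weights with the averaged gradients, scaled by the learning rate lr.
5. Repeat until the loss stops improving.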
Code snippet of the above steps:
import numpy as np

# Forward pass: predictions for every training example (Sigmoid activation)
YP = 1.0 / (1.0 + np.exp(-(X_train.dot(W) + bias)))

# Accumulate gradient with respect to bias and weights
grad_bias = 0.0
grad_w = np.zeros(len(W))
for i in range(X_train.shape[0]):
    grad_bias += (YP[i] - y_train[i]) * YP[i] * (1 - YP[i])  # dJ/db
    for j in range(len(W)):
        # dJ/dW_j
        grad_w[j] += (YP[i] - y_train[i]) * YP[i] * (1 - YP[i]) * X_train[i][j]

# Update bias and weights with the averaged gradients
bias = bias - grad_bias * lr / X_train.shape[0]
W = W - grad_w * lr / X_train.shape[0]
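The same update can be written without the Python loops. A minimal vectorized sketch using NumPy, assuming the same X_train, y_train, W, bias, lr, and YP as above:

# Elementwise gradient term for every example: shape (m,)
delta = (YP - y_train) * YP * (1 - YP)
grad_bias = delta.sum()
grad_w = X_train.T.dot(delta)  # shape (n,), one entry per weight

# Average over the batch and update parameters
m = X_train.shape[0]
bias = bias - lr * grad_bias / m
W = W - lr * grad_w / m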
Stochastic Gradient Descent (SGD)
When the training-data size m is large, we choose a batch size m′ < m and divide the training data into batches of size m′. We then update the weights and bias for each batch as follows:
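For a batch $B$ of size $m'$, the update is the same as before but averaged over the batch (a sketch):

$$w_j := w_j - \frac{\alpha}{m'} \sum_{i \in B} \big(\hat{y}^{(i)} - y^{(i)}\big)\,\hat{y}^{(i)}\big(1 - \hat{y}^{(i)}\big)\,x_j^{(i)}, \qquad b := b - \frac{\alpha}{m'} \sum_{i \in B} \big(\hat{y}^{(i)} - y^{(i)}\big)\,\hat{y}^{(i)}\big(1 - \hat{y}^{(i)}\big)$$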
Advantages Of SGD
- Much faster than full-batch gradient descent
- A better choice when the whole training set cannot fit into the RAM (available memory) of the system
Code
Here is the Python implementation of logistic regression.
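Below is a minimal self-contained sketch of such an implementation, assuming the least-squares loss and the training steps described above; names like train_logistic_regression are illustrative, not from the original post.

import numpy as np

def sigmoid(z):
    # Sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X_train, y_train, lr=0.1, epochs=1000):
    # Initialize weights and bias to zero
    W = np.zeros(X_train.shape[1])
    bias = 0.0
    m = X_train.shape[0]
    for _ in range(epochs):
        # Forward pass: predictions for all training examples
        YP = sigmoid(X_train.dot(W) + bias)
        # Gradients of the least-squares loss through the Sigmoid
        delta = (YP - y_train) * YP * (1 - YP)
        grad_w = X_train.T.dot(delta)
        grad_bias = delta.sum()
        # Averaged gradient descent update
        W = W - lr * grad_w / m
        bias = bias - lr * grad_bias / m
    return W, bias

def predict(X, W, bias):
    # Class 1 when the Sigmoid output exceeds 0.5
    return (sigmoid(X.dot(W) + bias) >= 0.5).astype(int)

# Toy usage example (illustrative)
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y_train = np.array([0, 0, 0, 1])
W, bias = train_logistic_regression(X_train, y_train, lr=0.5, epochs=5000)
print(predict(X_train, W, bias))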