Logistic Regression: All You Wanna Know

Aditri Srivastava · Published in Analytics Vidhya · Jan 15, 2021 · 3 min read

Complete implementation from scratch

Logistic Regression is a statistical, supervised learning model used to model the probability of a certain class, so that each example can be assigned a label of 0 or 1. It is the go-to method for binary classification.

For instance, you may want to predict whether a person has diabetes or not, estimate survival from a dataset, or decide whether a mail is spam (1) or not (0).

So it is basically a classification technique. Here we will look at binary classification.

We use the Sigmoid function in Logistic Regression because we want our output to lie between 0 and 1.
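For reference, the sigmoid function is

σ(z) = 1 / (1 + e^(−z))

which squashes any real number into the (0, 1) range.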

Hypothesis Function

Combining both will give us →
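hθ(x) = σ(θᵀx) = 1 / (1 + e^(−θᵀx))

This is exactly what hypothesis() computes in the code below: the sigmoid applied to the linear term X·θ.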

This function always gives a value between 0 and 1, which tells us the probability of that point belonging to the positive class. Points close to the decision boundary get probabilities near 0.5, so predictions there are less confident. We set a decision threshold: if the value is greater than or equal to 0.5 we predict class 1, and if it is less than 0.5 we predict class 0.

Log Loss (Binary Cross Entropy)

Loss In Logistic Regression
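Written out, the binary cross-entropy loss over m examples is

J(θ) = −(1/m) · Σᵢ [ yᵢ · log(hθ(xᵢ)) + (1 − yᵢ) · log(1 − hθ(xᵢ)) ]

which is what the error() function below computes.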

Here the true label is multiplied by the log of the predicted probability. The first term of the formula is active when the label is 1: the label multiplies the model's confidence that the point is positive. Similarly, the second term is active when the label is 0: (1 − y) multiplies the confidence that the point is negative.

We have to minimize this loss function. For this we will use gradient descent, just as we did in Linear Regression.

Linear Regression Blog Link

Gradient Descent

Notice that we have not taken the negative sign while calculating the derivative: the gradient is computed as Xᵀ(y − h) rather than Xᵀ(h − y), so the update step adds lr·grad to theta instead of subtracting it. Apart from this sign convention (and the sigmoid hypothesis in place of the linear one), the gradient descent formula has the same form as in Linear Regression.
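Concretely, the update implemented below is

grad = (1/m) · Xᵀ(y − h),  where h = σ(Xθ)
θ ← θ + lr · grad

which is equivalent to the standard rule θ ← θ − lr · (1/m) · Xᵀ(h − y).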

Now we are all set to implement this →

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hypothesis(X, theta):
    # X - data matrix with bias column, shape (m, n+1)
    # theta - parameter vector, shape (n+1, 1)
    return sigmoid(np.dot(X, theta))

def error(X, y, theta):
    """
    Binary cross-entropy loss.
    parameters:
        X - (m, n+1)
        y - (m, 1)
        theta - (n+1, 1)
    returns scalar value of loss
    """
    hi = hypothesis(X, theta)
    e = -1 * np.mean(y * np.log(hi) + (1 - y) * np.log(1 - hi))
    return e

def gradient(X, y, theta):
    """
    parameters:
        X - (m, n+1)
        y - (m, 1)
        theta - (n+1, 1)
    returns gradient vector, shape (n+1, 1)
    """
    hi = hypothesis(X, theta)
    # X^T (y - h) is the negative of the usual gradient, which is
    # why gradient_descent below adds it to theta
    grad = np.dot(X.T, (y - hi))
    return grad / X.shape[0]

def gradient_descent(X, y, lr=0.1, max_itr=500):
    n = X.shape[1]
    theta = np.zeros((n, 1))
    error_list = []
    for i in range(max_itr):
        err = error(X, y, theta)
        error_list.append(err)
        grad = gradient(X, y, theta)
        # '+' because grad already carries the minus sign
        theta = theta + lr * grad
    return theta, error_list

def predict(X, theta):
    h = hypothesis(X, theta)
    output = np.zeros(h.shape)
    output[h >= 0.5] = 1
    output = output.astype('int')
    return output

Now just apply logistic regression to any binary classification problem, like diabetes prediction, or import the breast_cancer dataset from sklearn and try it out, as in the sketch below!
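Here is a minimal sketch on sklearn's breast_cancer dataset, assuming the functions above are already defined (the feature standardization and the bias column of ones are my additions so that gradient descent converges well):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target.reshape(-1, 1)

# Standardize features so gradient descent converges smoothly
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Prepend a column of ones for the bias term -> shape (m, n+1)
X = np.hstack([np.ones((X.shape[0], 1)), X])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

theta, error_list = gradient_descent(X_train, y_train, lr=0.1, max_itr=500)
preds = predict(X_test, theta)
print("Test accuracy:", (preds == y_test).mean())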
