Fundamentals of Logistic Regression

Om Rastogi
Published in Analytics Vidhya · May 12, 2020

Here we’ll learn binary classification using logistic regression.

This is the first part of a series on Logistic Regression:

  1. Fundamentals of Logistic Regression
  2. Coding Logistic Regression in Python From Scratch

Logistic regression is a supervised learning algorithm borrowed from statistics. The name “Logistic” comes from the logistic function, also called the sigmoid function.

The sigmoid function: σ(z) = 1 / (1 + e^(−z))

This function clearly ranges between 0 and 1, a fact that by itself gives an intuition of its application to binary classification.
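As a quick sketch in Python (NumPy assumed; the helper name `sigmoid` is mine), the function looks like this:

```python
import numpy as np

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5 — the midpoint of the curve
print(sigmoid(10.0))   # ≈ 0.99995, saturates toward 1
print(sigmoid(-10.0))  # ≈ 0.00005, saturates toward 0
```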

Hypothesis

In machine learning, a hypothesis is an equation used to predict the output from a given input. In logistic regression, the hypothesis gives the probability that Y = 1. It is not expected to be accurate at first; we apply a learning algorithm to improve it.

The hypothesis equation: H = σ(Z), where Z = W·X + b

Where,
W → weight matrix
b → bias (intercept)
X → input matrix
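A minimal sketch of the hypothesis in Python (assuming, as in Andrew Ng’s course, that X stores one sample per column; the function name `hypothesis` is mine):

```python
import numpy as np

def hypothesis(W, b, X):
    """Compute H = sigmoid(Z) with Z = W.X + b.

    W: (n_features,) weight vector
    b: scalar bias
    X: (n_features, m) matrix holding m samples as columns
    Returns an (m,) vector of probabilities that Y = 1.
    """
    Z = np.dot(W, X) + b
    return 1.0 / (1.0 + np.exp(-Z))
```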

Decision Boundary

Z is then passed through the sigmoid.

The surface where Z = 0 is the decision boundary of the hypothesis, since σ(0) = 0.5.

This figure shows how the regions are divided by the decision boundary

Predict y = 1 for Z ≥ 0,
and y = 0 for Z < 0.
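The two rules agree because σ(0) = 0.5. A tiny check with example values:

```python
import numpy as np

Z = np.array([-2.0, 0.0, 3.0])
H = 1.0 / (1.0 + np.exp(-Z))   # sigmoid of each Z
print(H)                       # ≈ [0.119 0.5   0.953]
print((Z >= 0).astype(int))    # [0 1 1]
print((H >= 0.5).astype(int))  # [0 1 1] — identical decisions
```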

Cost Function

A cost function measures the performance of a machine learning model on given data. It quantifies the error between predicted and expected values and presents it as a single real number.

J(W, b) = −(1/m) · Σ [ Y·log(H) + (1 − Y)·log(1 − H) ]

Where,
Y → output (true label)
H → hypothesis output
m → number of samples

The cost function score is like a performance report for the model, which we then improve through optimization.
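As a sketch, the cross-entropy cost above can be written directly in NumPy (the clipping constant is my addition, to avoid log(0)):

```python
import numpy as np

def cost(H, Y):
    """Average cross-entropy between predictions H and labels Y.

    H: (m,) predicted probabilities, Y: (m,) labels in {0, 1}.
    """
    m = Y.shape[0]
    H = np.clip(H, 1e-15, 1 - 1e-15)  # guard against log(0)
    return -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
```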

Gradient Descent Optimization Algorithm

The cost vs. weight curve illustrates the idea: as we move toward the ideal weight the cost decreases, and once we pass it the cost increases again. Naturally, we want our hypothesis to reach that minimum.
The idea behind the gradient descent algorithm is to change the weights in proportion to the gradient: where the curve is steep, the change is significant; where the curve is gradual, the change is gradual.

The gradient descent update equations are:

W := W − α · ∂J/∂W
b := b − α · ∂J/∂b

The values of these gradients are found by taking derivatives of the cost function:

∂J/∂W = (1/m) · X · (H − Y)ᵀ
∂J/∂b = (1/m) · Σ (H − Y)

Where α → learning rate.

This procedure has to be repeated for a number of iterations, as the descent takes place in small steps.
The learning rate also matters for training time and accuracy: set it too low and training slows down; set it too high and the descent may overshoot and never reach the minimum.
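Putting the pieces together, a minimal gradient descent loop might look like this (a sketch; the layout of X with one sample per column and the function name `train` are my assumptions):

```python
import numpy as np

def train(X, Y, alpha=0.1, iterations=1000):
    """Fit logistic regression weights by gradient descent.

    X: (n_features, m) inputs, Y: (m,) labels in {0, 1}.
    alpha: learning rate. Returns the learned W and b.
    """
    n, m = X.shape
    W = np.zeros(n)
    b = 0.0
    for _ in range(iterations):
        Z = np.dot(W, X) + b
        H = 1.0 / (1.0 + np.exp(-Z))  # hypothesis output, shape (m,)
        dW = np.dot(X, H - Y) / m     # gradient of the cost w.r.t. W
        db = np.sum(H - Y) / m        # gradient of the cost w.r.t. b
        W -= alpha * dW               # step opposite the gradient
        b -= alpha * db
    return W, b
```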

Prediction

Once the model is built and the hypothesis is optimized, we need to predict the outputs for the test samples.
We know that the hypothesis gives the probability that Y = 1, so it is safe to say:
H ≥ 0.5 → P = 1
H < 0.5 → P = 0
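In code, prediction is just the hypothesis followed by a 0.5 threshold (a sketch; `predict` pairs with the `train` sketch above, e.g. `W, b = train(X_train, Y_train)` then `predict(W, b, X_test)`):

```python
import numpy as np

def predict(W, b, X):
    """Return hard 0/1 predictions for the samples in X."""
    Z = np.dot(W, X) + b
    H = 1.0 / (1.0 + np.exp(-Z))   # probability that Y = 1
    return (H >= 0.5).astype(int)  # threshold at 0.5 (i.e. Z >= 0)
```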

I hope you enjoyed this and learned the concepts of logistic regression. I’d like to express my gratitude to Andrew Ng, from whom I first learned these concepts.

If you have stayed till the end — Do Clap. It’ll keep me motivated to write more. Thank You.
