Chapter 2.0: Logistic Regression with Math.

Madhu Sanjeevi ( Mady )
Deep Math Machine learning.ai
5 min read · Sep 26, 2017

In the previous story we talked about Linear Regression for solving regression problems in machine learning. In this story we will talk about Logistic Regression for classification problems.

You may be wondering why the name says regression if it is a classification algorithm. Well, it uses regression internally to perform the classification.

Classification: separating the data of one class from another.

One Vs All method (Right)

In this story we talk about binary classification, where the target variable is either 0 or 1.

The goal is to find that green straight line (the one which best separates the data).

So we use regression to draw that line. Makes sense, right?

Let's take a random dataset and see how it works.

(x1,x2) plot(Left), (x1,y) plot(Right)

If we observe the right picture, we have our independent variable (X) and dependent variable (y), so this is the graph we should consider for the classification problem.

Given X (a set of x values), we need to predict whether y is 0 or 1 (Yes/No).

If we apply linear regression to the above data, we get something like this:

Given X = 6 we can say y is 0.7 (close to 1). That's cool, but wait: what if I give a negative X value, or a much larger one??? The output is this:

We only accept values between 0 and 1 to make a decision (Yes/No); anything outside that range is unusable.
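To see the problem concretely, here is a tiny sketch (the line's coefficients are made up for illustration; any fitted straight line has the same issue):

```python
# a fitted regression line, say y = 0.05 + 0.11 * x (made-up coefficients)
def line(x):
    return 0.05 + 0.11 * x

print(line(6))   # 0.71  -> fine, close to 1
print(line(-5))  # -0.5  -> below 0, meaningless as a Yes/No score
print(line(12))  # 1.37  -> above 1, also meaningless
```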

So how do we proceed further?

There is an awesome function called the sigmoid (or logistic) function, which we use to get values between 0 and 1:

sigmoid(z) = 1 / (1 + e^(-z))

This function squashes any value and gives back a value between 0 and 1.

How??? and what is ‘e’ ???

e here is Euler's number, roughly 2.71828, the base of the natural exponential function.

This is how the value always stays between 0 and 1: e^(-z) is always positive, so the denominator 1 + e^(-z) is always greater than 1 and the output never reaches 1; and as z becomes very negative, e^(-z) blows up and pushes the output towards 0.
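As a quick sketch in Python (only the standard library is needed):

```python
import math

def sigmoid(z):
    # 1 / (1 + e^(-z)) squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(-10))  # ~0.000045 (close to 0)
print(sigmoid(0))    # 0.5 (exactly in the middle)
print(sigmoid(10))   # ~0.999955 (close to 1)
```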

So far we know that we first apply the linear equation, then apply the sigmoid function to the result, and we get a value between 0 and 1.

The hypothesis for Linear regression is h(X) = θ0+θ1*X

The hypothesis for this algorithm is the logistic function applied to that linear equation:

h(X) = 1 / (1 + e^(-(θ0+θ1*X)))
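In code, the hypothesis is just the linear equation fed through the sigmoid above (the θ values here are made-up placeholders):

```python
def hypothesis(x, theta0, theta1):
    # logistic hypothesis: sigmoid applied to the linear equation
    return sigmoid(theta0 + theta1 * x)

# with made-up parameters theta0 = -4, theta1 = 1:
print(hypothesis(6, -4, 1))  # sigmoid(2) ~ 0.88, leans towards class 1
print(hypothesis(2, -4, 1))  # sigmoid(-2) ~ 0.12, leans towards class 0
```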

How does it work??

1. First we calculate the logit function. What the heck is that??

logit = θ0+θ1*X (hypothesis of linear regression)

2. We apply the above sigmoid function (logistic function) to the logit.

3. We calculate the error with the cost function (maximum log-likelihood).

The cost function for linear regression is

J(θ0,θ1) = (1/2m) * Σ (h(x_i) - y_i)^2

Here it does not work: with the sigmoid inside h(x), this squared-error cost gives a non-convex function for J(θ0,θ1), so we are not guaranteed to reach the best (global) minimum.

Instead, we take the log of the hypothesis to calculate the cost function.

If that does not make sense yet, let me make sense of it for you.

Usually, what is the error?? (predicted - actual)**2, right??

so if predicted = 1 and actual = 1,
error = 0
so if predicted = 1 and actual = 0,
error = 1
so if predicted = 0 and actual = 1,
error = 1
so if predicted = 0 and actual = 0,
error = 0
Note: predicted can also be 0.5 and so on...
So the error always lands between 0 and 1, which is not useful: even a totally confident wrong prediction is penalized by at most 1.
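A quick illustration of that cap (hypothetical predicted/actual pairs):

```python
# squared error never exceeds 1 when both values live in [0, 1]
for predicted, actual in [(0.99, 0), (0.5, 0), (0.01, 0)]:
    print(predicted, actual, round((predicted - actual) ** 2, 4))
# 0.99 0 0.9801  <- almost-certain wrong answer, error still below 1
# 0.5  0 0.25
# 0.01 0 0.0001
```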

Just take a look at this picture and observe something...

0%-100% = 0–1

From the left picture:

If actual y = 1 and predicted = 0, the cost goes to infinity; if actual y = 1 and predicted = 1, the cost goes to its minimum.

If actual y = 0 and predicted = 1, the cost goes to infinity; if actual y = 0 and predicted = 0, the cost goes to its minimum.

From the right picture:

If we apply log to the hypothesis (the predicted value), we get cost values that behave exactly this way, which is what we need to estimate the overall error.
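A quick numeric check of the y = 1 curve (costs rounded; -log(1 - predicted) behaves symmetrically for y = 0):

```python
import math

# cost when actual y = 1 is -log(predicted)
for p in [0.99, 0.5, 0.01]:
    print(p, round(-math.log(p), 2))
# 0.99 0.01  <- confident and right: tiny cost
# 0.5  0.69
# 0.01 4.61  <- confident and wrong: cost heads towards infinity
```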

Here is the final picture:

Cost(h(x), y) = -log(h(x)) if y = 1
Cost(h(x), y) = -log(1 - h(x)) if y = 0

which combines into a single cost function over all m training examples:

J(θ0,θ1) = -(1/m) * Σ [ y_i * log(h(x_i)) + (1 - y_i) * log(1 - h(x_i)) ]

That's it: based on the actual y value, we pick the matching cost function.
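Here is a minimal NumPy sketch of that combined cost (the toy X and y arrays are made up for illustration):

```python
import numpy as np

def cost(theta0, theta1, X, y):
    # cross-entropy cost: -log(h) where y = 1, -log(1 - h) where y = 0
    h = 1.0 / (1.0 + np.exp(-(theta0 + theta1 * X)))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

X = np.array([1.0, 2.0, 5.0, 6.0])
y = np.array([0, 0, 1, 1])
print(cost(-4.0, 1.0, X, y))  # lower is better; ~0.15 here
```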

4. The next step is to apply gradient descent to update the θ values in our hypothesis (I already covered gradient descent; check this link).
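Continuing the sketch above, one plain gradient-descent loop might look like this (the learning rate and iteration count are arbitrary picks; the gradient formulas are the standard ones for this cost):

```python
def gradient_descent(X, y, lr=0.1, iters=1000):
    # start from zeros and repeatedly step downhill on the cost surface
    theta0, theta1 = 0.0, 0.0
    m = len(X)
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-(theta0 + theta1 * X)))
        # partial derivatives of the cost w.r.t. theta0 and theta1
        theta0 -= lr * np.sum(h - y) / m
        theta1 -= lr * np.sum((h - y) * X) / m
    return theta0, theta1

theta0, theta1 = gradient_descent(X, y)
print(theta0, theta1, cost(theta0, theta1, X, y))
```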

That's it, we are done!

We have logistic regression ready; we can now predict new data with the model we just built.

Predicting new data, remember?? We give new X values and get predicted y values. How does it work??

Bam!!!!

We get the probability score(s).
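Continuing the sketch, prediction is just the hypothesis plus a 0.5 threshold:

```python
def predict(x, theta0, theta1):
    # probability that y = 1, then a Yes/No decision at 0.5
    prob = 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))
    return prob, int(prob >= 0.5)

print(predict(6.0, theta0, theta1))  # high probability -> class 1
print(predict(1.5, theta0, theta1))  # low probability -> class 0
```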

So that's it for this story. In the next story I will code this algorithm from scratch, and also with TensorFlow and scikit-learn.

See ya!
