TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Logistic Regression

3 min read · Sep 14, 2017


“It is more important to have beauty in one’s equations than to have them fit experiment…” — Paul Dirac

Logistic regression is a method for classification. It is used in many scenarios where we want to sort inputs into predefined classes: for example, tagging email as spam or not spam, or predicting a customer’s age group from the data of a commerce portal.
In linear regression the output domain is a continuous range, i.e. an infinite set, while in logistic regression the output y we want to predict takes only a small number of discrete values, i.e. a finite set. For simplicity, let’s consider binary classification, where y can take only two values: 1 (positive) and 0 (negative).
Just as in linear regression, we need to start with a hypothesis. Since the output is bounded between 0 and 1, it doesn’t make sense to have a hypothesis that produces values outside this range.
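A hypothesis with this bounded behaviour is usually built from the logistic (sigmoid) function, f(z) = 1 / (1 + e^(-z)), applied to the linear combination of theta and x. The sketch below assumes that standard choice; the function names `sigmoid` and `hypothesis` are illustrative.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0 here, so exp(z) cannot overflow
    return ez / (1.0 + ez)

def hypothesis(theta: list[float], x: list[float]) -> float:
    """h_theta(x) = sigmoid(theta . x), read as the probability that y = 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

# The output stays strictly between 0 and 1 across the range (-10, 10).
for z in (-10, -2, 0, 2, 10):
    print(f"f({z:>3}) = {sigmoid(z):.4f}")
```

Whatever value theta takes, the output never leaves the interval (0, 1), which is exactly the property we asked of the hypothesis.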

Plot of f(x) for x in (-10, 10)

Given the above set of logistic regression models (why a set? because theta is variable), we need to find the coefficients theta of the model that best explains the training set. To do that, we start with a set of probabilistic assumptions parameterised by theta and then find theta via maximum likelihood.
Let’s start with the Bernoulli distribution: the probability distribution of a random variable that takes the value 1 with probability p and the value 0 with probability q = 1 - p.
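As a sketch of the standard derivation, assume p is given by the hypothesis, p = h_θ(x), with h_θ(x) = 1 / (1 + e^(-θᵀx)), and that the m training examples are independent. The symbols m, x^(i), and y^(i) are notation introduced here for the training set.

```latex
\begin{align*}
P(y \mid x; \theta) &= h_\theta(x)^{y}\,\bigl(1 - h_\theta(x)\bigr)^{1-y},
  \qquad y \in \{0, 1\} \\
L(\theta) &= \prod_{i=1}^{m} h_\theta\bigl(x^{(i)}\bigr)^{y^{(i)}}
  \bigl(1 - h_\theta(x^{(i)})\bigr)^{1-y^{(i)}} \\
\ell(\theta) = \log L(\theta) &= \sum_{i=1}^{m}
  \Bigl[\, y^{(i)} \log h_\theta\bigl(x^{(i)}\bigr)
  + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr] \\
\frac{\partial \ell(\theta)}{\partial \theta_j} &= \sum_{i=1}^{m}
  \bigl( y^{(i)} - h_\theta(x^{(i)}) \bigr)\, x_j^{(i)}
\end{align*}
```

The first line is just the Bernoulli distribution written in one expression: it reduces to h_θ(x) when y = 1 and to 1 - h_θ(x) when y = 0.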


In linear regression we find the coefficients by setting the derivative of the log-likelihood to zero. Here we take the derivative of the log-likelihood in the same way, but the resulting equation, Eq. (3), has no closed-form solution. (Remember that x and theta are both vectors in the equation and h is a nonlinear function.)
We can still find the coefficients with an iterative algorithm called gradient ascent: we start with some initial coefficients and keep updating theta until the likelihood function converges.
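The iteration can be sketched in a few lines of plain Python. This is a minimal version under stated assumptions: the hypothesis is the sigmoid of theta · x, the gradient of the log-likelihood is the sum over samples of (y - h) · x, and the step size `alpha`, the tolerance `tol`, and the toy data are hypothetical choices made for illustration.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(theta, x):
    """h_theta(x) = sigmoid(theta . x)."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def log_likelihood(theta, X, y):
    """Sum over samples of y*log(h) + (1 - y)*log(1 - h)."""
    return sum(
        yi * math.log(predict(theta, xi))
        + (1 - yi) * math.log(1.0 - predict(theta, xi))
        for xi, yi in zip(X, y)
    )

def gradient(theta, X, y):
    """Gradient of the log-likelihood: sum_i (y_i - h_i) * x_i."""
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = yi - predict(theta, xi)
        for j, v in enumerate(xi):
            grad[j] += err * v
    return grad

def gradient_ascent(X, y, alpha=0.1, tol=1e-9, max_iter=100_000):
    """Repeat theta := theta + alpha * gradient until the log-likelihood settles."""
    theta = [0.0] * len(X[0])
    prev = log_likelihood(theta, X, y)
    for _ in range(max_iter):
        theta = [t + alpha * g for t, g in zip(theta, gradient(theta, X, y))]
        cur = log_likelihood(theta, X, y)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return theta

# Hypothetical toy data: each row is [1 (intercept term), feature]. The labels
# overlap, so the classes are not perfectly separable and the maximum is finite.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]

theta = gradient_ascent(X, y)
print("theta:", [round(t, 3) for t in theta])
print("log-likelihood:", round(log_likelihood(theta, X, y), 4))
```

The step size matters: if `alpha` is too large the updates overshoot and the likelihood can oscillate, while a very small `alpha` converges slowly. On perfectly separable data the likelihood has no finite maximum, so theta would keep growing until `max_iter` is reached.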


Example
Let’s take the Wikipedia example.
Suppose we wish to answer the following question:

A group of 20 students spend between 0 and 6 hours studying for an exam. How does the number of hours spent studying affect the probability that the student will pass the exam?

Plot of derived model for the range (0, 6)
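To make the example concrete, here is a minimal sketch that assumes the data table and the fitted coefficients (intercept -4.0777, hours coefficient 1.5046) as given in the Wikipedia article on logistic regression; they are taken from there, not derived here. It checks that the gradient of the log-likelihood is close to zero at those coefficients, which is what we expect at the maximum, and then evaluates the probability of passing. The names `hours`, `passed`, and `pass_probability` are illustrative.

```python
import math

# Hours studied and outcome (1 = pass, 0 = fail) for the 20 students,
# as tabulated in the Wikipedia example (assumed here, not re-collected).
hours = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
         2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 1, 1, 1, 1, 1]

# Coefficients reported for this data set: intercept and hours coefficient.
theta0, theta1 = -4.0777, 1.5046

def pass_probability(h: float) -> float:
    """P(pass | h hours of study) = sigmoid(theta0 + theta1 * h)."""
    return 1.0 / (1.0 + math.exp(-(theta0 + theta1 * h)))

# At the maximum-likelihood estimate both gradient components,
# sum(y - p) and sum((y - p) * hours), should be approximately zero.
grad0 = sum(y - pass_probability(x) for x, y in zip(hours, passed))
grad1 = sum((y - pass_probability(x)) * x for x, y in zip(hours, passed))
print(f"gradient at reported coefficients: ({grad0:.4f}, {grad1:.4f})")

for h in (1, 2, 3, 4, 5):
    print(f"{h} hours -> P(pass) = {pass_probability(h):.2f}")
```

With these coefficients, two hours of study gives a pass probability of roughly 0.26 and five hours roughly 0.97, so each additional hour shifts the odds substantially in the student’s favour.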
