Cost Function in Logistic Regression

Brijesh Singh · Nucleusbox · Jun 25, 2020

Picture credit: Nucleusbox

This article was originally published on my blog, which you can also follow.

We spent a good amount of time understanding the decision boundary in the previous post, Logistic Regression for Machine Learning using Python, and saw how working with probabilities gets around the problem of the sharp curve. Check it out if you haven't already.

In the logistic regression model, the output of the classifier lies between 0 and 1.

So, to establish the hypothesis, we introduced the sigmoid function, also called the logistic function.
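As a quick aside, here is a minimal NumPy sketch of that function (the helper name sigmoid is my own choice, just for illustration):

import numpy as np

def sigmoid(z):
    # Squash any real-valued input into the (0, 1) range so the
    # output can be read as a probability.
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # 0.5, right on the decision boundary
print(sigmoid(5.0))   # close to 1
print(sigmoid(-5.0))  # close to 0

Whatever value the linear part θᵀx produces, this squashing is what keeps the classifier's output between 0 and 1.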

So let’s fit the parameter θ of the logistic regression model.

Likelihood Function

So let’s say we have a dataset X with m data points. Logistic regression says that the probability of the outcome can be modeled as below.
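Writing h_θ(x) for the sigmoid hypothesis σ(θᵀx), the standard form of this model for a single data point (x_i, y_i) is:

P(y_i = 1 \mid x_i; \theta) = h_\theta(x_i)
P(y_i = 0 \mid x_i; \theta) = 1 - h_\theta(x_i)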

Based on the probability rule: if the probability of the success event is P, then the probability of the failure event is (1 − P). That is what y_i indicates above.

This can be combined into a single form, as below.
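In standard notation, that combined form is:

P(y_i \mid x_i; \theta) = h_\theta(x_i)^{y_i} \, (1 - h_\theta(x_i))^{1 - y_i}

Plugging in y_i = 1 recovers the success case and y_i = 0 the failure case.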

In words: what is the probability of the label y_i for a given x_i, that is, P(y | x)?

The likelihood of the entire dataset X is the product over the individual data points. Think of a coin toss with outcomes H or T: if the probability of H is P, then the probability of T is (1 − P).

So the likelihood of these events taken together is their product.
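For our dataset, taking the product over all m data points gives the likelihood in standard form:

L(\theta) = \prod_{i=1}^{m} h_\theta(x_i)^{y_i} \, (1 - h_\theta(x_i))^{1 - y_i}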

Now the principle of maximum likelihood says we need to find the parameters θ that maximize this likelihood. Recall the odds and log-odds.

So, as we can see, after taking the log of the odds we end up with a linear equation.
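Concretely, with P = h_θ(x), the log-odds are linear in the features:

\log \frac{P}{1 - P} = \theta^T x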

So, in order to get the parameter θ of the hypothesis, we can either maximize the likelihood or minimize the cost function.

Now we can take the log of the likelihood equation above, which turns the product into a sum and makes it much easier to work with.
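In standard form, that log-likelihood is:

\ell(\theta) = \sum_{i=1}^{m} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log\left(1 - h_\theta(x_i)\right) \right]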

MLE stands for Maximum Likelihood Estimation: choosing the θ that maximizes this log-likelihood.

Cost Function

I would recommend first checking out the post The Intuition Behind Cost Function.
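For reference, the cost minimized in logistic regression is the negative of the average log-likelihood above, often called the cross-entropy or log loss. A minimal NumPy sketch (the function names are my own, for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # Cross-entropy cost: the negative average of the log-likelihood derived above.
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

Minimizing this cost over θ is the same as maximizing the log-likelihood.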


Footnotes:

Gradient descent is an optimization algorithm used to find the values of the parameters that minimize the cost. To solve for the gradient, we iterate through our data points using the current parameter values and compute the partial derivatives of the cost with respect to each parameter.
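As a rough sketch of how those updates could look for the cross-entropy cost above (the learning rate and iteration count here are arbitrary values picked for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    # Start with all parameters at zero and repeatedly step in the
    # direction that lowers the cross-entropy cost.
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / m   # partial derivatives of the cost w.r.t. theta
        theta -= lr * grad
    return theta

Here X is the feature matrix (add a column of ones if you want an intercept term) and y is the vector of 0/1 labels.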

OK, that’s it, we are done now. If you have any questions or suggestions, please feel free to reach out to me. I’ll come up with more Machine Learning topics soon.

