My Machine Learning Diary: Day 15
This is day 15 of my machine learning diary (MLD) series.
Today I learned about the cost function for logistic regression.
Cost Function
Unlike linear regression, we can't simply use squared error as our cost function, because our hypothesis has changed. If we plug the logistic function into the squared-error cost, the resulting surface is non-convex, so gradient descent is not guaranteed to find the global minimum; it could get stuck in any of many local minima.
Likelihood Function (a.k.a. Log Loss Function)
The likelihood function works really nicely for measuring the performance of a probability estimate (a value between 0 and 1). We will use it as our cost function because our hypothesis outputs the probability that y = 1. Using the likelihood function, our cost function is defined as follows:
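Written out (the original shows this as an image), the standard piecewise log-loss cost for a single example, with $h_\theta(x)$ the sigmoid hypothesis, is:

```latex
\mathrm{Cost}(h_\theta(x), y) =
\begin{cases}
  -\log\big(h_\theta(x)\big) & \text{if } y = 1 \\[4pt]
  -\log\big(1 - h_\theta(x)\big) & \text{if } y = 0
\end{cases}
```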
This graph makes sense. As the hypothesis approaches 1, the cost gets smaller and smaller, which is good because y = 1. As the hypothesis gets closer to 0, the cost approaches infinity.
This graph makes sense too: when y = 0 it behaves the opposite way, with the cost approaching infinity as the hypothesis approaches 1 and shrinking toward 0 as the hypothesis approaches 0.
Cost Function in One Line
For a particular sample, the cost function can be rewritten in a single line as follows:
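For reference, the one-line form (shown as an image in the original) combines both cases, since one of the two terms vanishes depending on whether $y$ is 1 or 0:

```latex
\mathrm{Cost}(h_\theta(x), y) = -y \log\big(h_\theta(x)\big) - (1 - y)\log\big(1 - h_\theta(x)\big)
```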
In general,
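Averaging this cost over all $m$ training examples gives the full cost function (again, the original shows this as an image):

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]
```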
Gradient Descent
To perform gradient descent, we need to take the derivative of the cost function we got above. The derivation is fairly involved, so I will not dig deeper into how to do it today. The result is very similar to that of linear regression.
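For reference, the resulting update rule (which has the same form as linear regression's) is:

```latex
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \Big( h_\theta(x^{(i)}) - y^{(i)} \Big) x_j^{(i)}
```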
In fact, it is exactly the same notation. However, we must note that the hypothesis is now different: $h_\theta(x)$ is the sigmoid function, not a linear one. So the gradient descent update actually looks like this:
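To make this concrete, here is a minimal sketch of the update in plain Python. The function names, toy dataset, learning rate, and step count are all made up for illustration; they are not from any particular library.

```python
import math

def sigmoid(z):
    # The logistic hypothesis: squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(theta, xs, ys, alpha):
    # One gradient-descent step: theta_j -= alpha * (1/m) * sum((h - y) * x_j)
    m = len(xs)
    grads = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for j, xj in enumerate(x):
            grads[j] += (h - y) * xj
    return [t - alpha * g / m for t, g in zip(theta, grads)]

# Toy data: the first feature (always 1.0) is the bias term;
# the label is 1 when the second feature is large.
xs = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
ys = [0, 0, 1, 1]

theta = [0.0, 0.0]
for _ in range(1000):
    theta = gradient_step(theta, xs, ys, alpha=0.5)
```

After enough steps, the learned theta should separate the two classes: the predicted probability is below 0.5 for the small-feature examples and above 0.5 for the large-feature ones.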
It was pretty hard, and I hope I can develop a deeper understanding in the future.