Logistic Regression Part-II
— Dedicated to people who want to start with Machine Learning —
In Part-I we discussed the equation of logistic regression; in this part we shall discuss the effect of outliers on logistic regression.
What is an outlier?
There are many definitions of an outlier, but simply put: if a point lies far away from all the clusters (groups) of data, it is known as an outlier point. See figure 1 to get an idea of an outlier point.
Now let us see what happens if there is an outlier in the data.
Assume our model comes across two different candidate separating lines (figure 2 & figure 3).
Note: The outlier point belongs to blue points.
Type 1 line(hyperplane):-
As we saw in Part-I, this line classifies all the points correctly except the outlier point.
Let the distance from each ordinary data point to the separating line be 1 unit, and let the distance of the outlier point from the line be 35 units.
yi*wT.xi for each point will be as follows:
yi*wT.xi for red points → +1(1)+1(1)+1(1)+1(1)+1(1) = +5
yi*wT.xi for blue points → +1(1)+1(1)+1(1)+1(1)+1(1)-1(35) = -30 (the -1(35) term is the blue outlier, which sits on the wrong side of the line)
Sum over all points: +5 - 30 = -25
The error made here is only 1 (since only one point, the outlier, is misclassified).
Type 2 line(hyperplane):-
Let the distance from each ordinary data point to the separating line be 35 units and the distance of the outlier point from the line be 1 unit.
Values of yi*wT.xi will be as follows:
yi*wT.xi for red points → +1(35)+1(35)+1(35)+1(35)+1(35) = +175
yi*wT.xi for blue points → -1(35)-1(35)-1(35)-1(35)-1(35)-1(1) = -176 (the -1(1) term is the outlier, now misclassified at distance 1)
Sum over all points: +175 - 176 = -1
The error made here is 6 (since 6 points, the five blue points and the outlier, were misclassified).
So even though the type 2 line misclassifies 6 points while the type 1 line misclassifies only 1, logistic regression with this objective would consider the type 2 line the better classifier, because the sum of yi*wT.xi for type 2 (-1) is higher than that of the type 1 line (-25).
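The comparison above can be checked with a small sketch, using the distances assumed in this example (names like `type1` and `errors` are illustrative, not from the original):

```python
# Each entry is a signed distance y_i * (w^T x_i):
# positive = correctly classified, negative = misclassified.

# Type 1 line: 10 ordinary points correct at distance 1,
# the blue outlier misclassified at distance 35.
type1 = [+1] * 10 + [-35]

# Type 2 line: 5 red points correct at distance 35,
# 5 blue points misclassified at 35, the outlier misclassified at 1.
type2 = [+35] * 5 + [-35] * 5 + [-1]

def errors(signed_distances):
    # count misclassified points (negative signed distance)
    return sum(1 for d in signed_distances if d < 0)

print(sum(type1), errors(type1))  # -25, 1 error
print(sum(type2), errors(type2))  # -1, 6 errors
```

Maximizing the raw sum prefers `type2` (-1 > -25) even though it makes six times as many errors, which is exactly the problem the next section addresses.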
So the equation we defined in Part-I is affected by outlier points. How do we get rid of this outlier effect?
Here we come across a concept called squashing: squashing the distances. Are all distances squashed? Let's see.
In squashing, if a distance is small, little to no squashing is applied; if a distance is large, it is squashed into a small bounded value. The function that does this is called the sigmoid function, and its equation is:
σ(x) = 1/(1 + e^(-x))
The graph of this squashing is shown in figure 3: given any value between -inf and +inf, the sigmoid function squashes it into the range 0 to 1, as shown in the graph.
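A minimal sketch of the squashing behaviour, using the same distances as before (the `sigmoid` helper is written here for illustration):

```python
import math

def sigmoid(x):
    # squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Small distances keep meaningful, distinct values;
# large distances saturate toward 0 or 1.
print(sigmoid(1))    # ~0.731
print(sigmoid(35))   # ~1.0
print(sigmoid(-35))  # ~0.0
```

Note how a distance of 35 contributes barely more than a distance of, say, 5 after squashing; that bounded contribution is what tames outliers.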
We shall apply this sigmoid function to our logistic regression term, i.e. yi*wT.xi.
Our original equation:
w* = argmax(w) Σ yi*wT.xi
After applying the sigmoid function:
w* = argmax(w) Σ σ(yi*wT.xi) = argmax(w) Σ 1/(1 + exp(-yi*wT.xi))
Whatever value yi*wT.xi takes, if it is large it is squashed into a small bounded value, so the impact of outliers decreases. So our final equation is:
w* = argmax(w) Σ 1/(1 + exp(-yi*wT.xi))
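To see the squashed objective in action, here is a sketch that re-scores the two candidate lines from earlier under the assumed distances (the function name `squashed_objective` is illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def squashed_objective(signed_distances):
    # sum of sigmoid(y_i * w^T x_i) instead of the raw signed distances
    return sum(sigmoid(d) for d in signed_distances)

type1 = [+1] * 10 + [-35]           # correct on everything but the outlier
type2 = [+35] * 5 + [-35] * 5 + [-1]  # 6 misclassified points

print(squashed_objective(type1))  # ~7.31
print(squashed_objective(type2))  # ~5.27
```

With the raw sums, type 2 won (-1 vs -25); after squashing, the type 1 line scores higher, so the objective now prefers the line that actually classifies the data better.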
And with that, we have completed deriving the equation for logistic regression.