Logistic Regression Part-I
—Dedicated to people who want to start with Machine Learning —
There are many ways to understand Logistic Regression. In this article, we shall learn it in a graphical (geometric intuition) way. Even though the name itself speaks of a regression problem, logistic regression is mainly used for binary classification problems. Yes, Logistic Regression is a simple classification technique. In simple words, we can say that it finds the best line that separates two classes.
If you observe figure 1 keenly, the points in it belong to two classes, i.e. a blue class and a red class. As humans we can draw a line that properly separates the two classes, as shown in figure 3, but a machine can't do that directly, so Logistic Regression helps the machine learn the best line that can separate these two classes.
There are many possible lines, as shown in figure 2. What is the best line among those possibilities? The best line is the one that classifies the points with the minimum possible error; whatever best line you find, it will make the fewest mistakes. So which line separates (classifies) these points with minimum error? The line in the center of figure 2. Have a look at figure 3: there is a single line separating both classes.
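Before getting into the geometry, here is a minimal sketch of this idea in code. It is not from the original article: the toy data, variable names, and the use of scikit-learn's LogisticRegression are illustrative assumptions, chosen just to show that fitting the model amounts to learning the coefficients of a separating line.

```python
# A minimal sketch: fit logistic regression to made-up 2D points and
# read off the learned separating line (data is NOT from the figures).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
blue = rng.normal(loc=[1, 1], scale=0.5, size=(20, 2))  # blue cluster
red = rng.normal(loc=[4, 4], scale=0.5, size=(20, 2))   # red cluster

X = np.vstack([blue, red])
y = np.array([-1] * 20 + [+1] * 20)  # -1 = blue, +1 = red

model = LogisticRegression().fit(X, y)
w1, w2 = model.coef_[0]
w0 = model.intercept_[0]

# The learned separating line: w1*x1 + w2*x2 + w0 = 0
print(f"learned line: {w1:.2f}*x1 + {w2:.2f}*x2 + {w0:.2f} = 0")
```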
Equation of the line:- w1x1 + w2x2 + w0 = 0
In 3D, the line becomes a plane.
Equation of the plane:- w1x1 + w2x2 + w3x3 + w0 = 0
If the plane is assumed to pass through the origin, then the intercept w0 is zero,
i.e. the equation of the plane is w1x1 + w2x2 + w3x3 = 0. In shorthand we can write this as wTx = 0, where w and x are vectors.
When this line concept is extended to n dimensions it is known as a hyperplane. Each such plane has a normal vector w that is perpendicular to the plane, as shown in figure 4.
Note:-
1. When a point in the direction of w (as shown in figure 4) is multiplied (dot product) with w, we get a positive value.
2. When a point in the opposite direction of w is multiplied with w, we get a negative value.
i.e. w * red point = positive value, w * blue point = negative value
Now we know that if the product wTx is positive, the point is considered to belong to the red points, and if the product wTx is negative, the point is considered to belong to the blue points.
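Here is a tiny sketch of this sign rule. The weights and points below are made-up values, not taken from figure 4, and the plane is assumed to pass through the origin so the sign of wTx alone decides the side.

```python
import numpy as np

# Assume a line through the origin with normal w (made-up values).
w = np.array([1.0, 1.0])

red_point = np.array([2.0, 3.0])     # lies in the direction of w
blue_point = np.array([-2.0, -1.0])  # lies opposite to w

def predicted_class(w, x):
    # wTx > 0 -> same side as w -> "red"; wTx < 0 -> opposite side -> "blue"
    return "red" if np.dot(w, x) > 0 else "blue"

print(predicted_class(w, red_point))   # red  (w.x = 5  > 0)
print(predicted_class(w, blue_point))  # blue (w.x = -3 < 0)
```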
How can we tell whether this line is classifying the points correctly?
Look at the blue (misclassified) point: it is in the direction of w, so when it is multiplied with w we get a positive value, which makes us assume the point belongs to the group of red points. Is there any trick to find out this mistake made by the line? Yes, there is: consider a variable yi which is +1 for all red points and -1 for all blue points, as shown below.
Now, to know whether the line is classifying a point correctly or not, we can multiply wTx with yi, i.e. compute yi wTx.
How can this yi wTx tell us about the mistakes made by wTx?
Let's walk through the four possible cases:
- Consider a red point: wTx will be positive and yi will also be positive, so yi wTx will be positive.
- Consider a blue point: wTx will be negative and yi will also be negative, so yi wTx will be positive, because the product of two negative values is positive.
- Consider a red point that is classified as blue: wTx will be negative (since the point falls on the blue side), but yi will be positive because the point is originally red, so yi wTx will be negative, because the product of a positive and a negative value is negative.
- Consider a blue point that is classified as red: wTx will be positive (since the point falls on the red side), but yi will be negative because the point is originally blue, so yi wTx will again be negative.
So if a point is correctly classified then yi wTx > 0, and if a point is misclassified then yi wTx < 0.
So we must find a line (i.e. weights w) such that the sum of yi wTx over all the points is maximum.
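The four cases above can be checked in a few lines of code. This is a sketch under assumed values: the weights and point coordinates are made up, and the sum printed at the end is the quantity we want to maximize.

```python
import numpy as np

w = np.array([1.0, 1.0])  # normal of a candidate line through the origin

# (point, yi) pairs: yi = +1 for red, -1 for blue (made-up coordinates)
points = [
    (np.array([2.0, 2.0]), +1),    # red on the red side   -> yi*wTx > 0
    (np.array([-2.0, -2.0]), -1),  # blue on the blue side -> yi*wTx > 0
    (np.array([-1.0, -1.0]), +1),  # red on the blue side  -> yi*wTx < 0
    (np.array([1.0, 1.0]), -1),    # blue on the red side  -> yi*wTx < 0
]

total = 0.0
for x, yi in points:
    score = yi * np.dot(w, x)
    print(f"yi*wTx = {score:+.1f} ->", "correct" if score > 0 else "misclassified")
    total += score

# The best line is the one that maximizes this sum over all points.
print("sum of yi*wTx:", total)
```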
So far we have obtained an equation for finding the best weights of the line that separates the points with minimum error.
Are we done with the above equation?
Let us test the above equation with a simple example. Assume that we pass the 10 data points shown in figure 5 to the above equation, and that it comes across two different candidate lines (figure 6 & figure 7).
Type 1 line (hyperplane):-
Let us assume there are 10 points, 5 in each class (group), as shown in figure 5, and a line separating these 10 points into the red and blue classes.
Here, let the distance from each data point to the separating line be 1 unit; that is, the distance from any red point to the line is 1 and the distance from any blue point to the line is 1.
Since all points are correctly classified and lie at a distance of 1 unit, yi*wTxi for each point is as follows:
yi*wTxi for red points → (+1)(+1) + (+1)(+1) + (+1)(+1) + (+1)(+1) + (+1)(+1) = +5
yi*wTxi for blue points → (-1)(-1) + (-1)(-1) + (-1)(-1) + (-1)(-1) + (-1)(-1) = +5
Sum over all 10 points = +10
Type 2 line (hyperplane):-
Consider a line separating the blue and red points, with normal w, as shown in figure 7.
Here too, let the distance from each data point to the separating line be 1 unit.
As you can see, all points except one are classified properly: a single blue point is misclassified (it falls on the red side of the line).
For each red and blue point, yi*wTxi is as follows:
yi*wTxi for the 5 red points → (+1)(+1) + (+1)(+1) + (+1)(+1) + (+1)(+1) + (+1)(+1) = +5
yi*wTxi for the misclassified blue point (on the red side) → (-1)(+1) = -1
yi*wTxi for the 4 remaining blue points → (-1)(-1) + (-1)(-1) + (-1)(-1) + (-1)(-1) = +4
Sum over all 10 points = +5 - 1 + 4 = +8
So finally, logistic regression selects the Type 1 line to separate the points, because its sum of yi*wTxi is maximum (+10 versus +8), which means its error is minimum (in this case, no error) compared with the Type 2 line.
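The comparison above is simple arithmetic, and the sketch below just recomputes the two sums from the signed values described for figures 5-7, using the article's assumption that every point sits 1 unit from the line.

```python
# Each entry is (yi, wTxi) for one point, with |wTxi| = 1 by assumption.

# Type 1: 5 red (yi=+1, wTx=+1) and 5 blue (yi=-1, wTx=-1), all correct.
type1 = [(+1, +1)] * 5 + [(-1, -1)] * 5

# Type 2: 5 red correct, 4 blue correct, and 1 blue point (yi=-1)
# that falls on the red side of the line (wTx=+1).
type2 = [(+1, +1)] * 5 + [(-1, -1)] * 4 + [(-1, +1)]

sum1 = sum(yi * wtx for yi, wtx in type1)
sum2 = sum(yi * wtx for yi, wtx in type2)

print("Type 1 sum of yi*wTx:", sum1)  # 10
print("Type 2 sum of yi*wTx:", sum2)  # 8 -> Type 1 is preferred
```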
Yes, we are done with the equation 🤔 but wait… what about outliers?
The effect of outliers on yi*wTxi is explained in Logistic Regression Part-II.