Introduction to Logistic Regression and its Mathematical Implementation

Arpit Pathak
ML_with_Arpit_Pathak
7 min read · Jun 10, 2020

Hello readers, this blog covers one of the most important algorithms in machine learning, Logistic Regression: the need for it and its mathematical implementation. The mathematical part also covers how linear regression is converted into logistic regression.

Logistic Regression

Logistic Regression is a machine learning algorithm that performs binary classification using the concepts of regression analysis. It is used to predict whether a certain class or event occurs, based on its probability.

Let us consider the example of student results, where students are classified into two classes, Pass (P) or Fail (F), based on the marks they scored in the exam, as shown in the diagram.

Now, if we encode Pass as 1 and Fail as 0 and plot this data, the graph will look as shown in the figure.

Now, let us try to use linear regression to create the best fit line on this graph, using the equation y = b + mx.

The best fit line “A” is the line we get from the above data. Now suppose two more new points (the green points) are added to our data. If we recompute the best fit line, it shifts to line “B”. Let us consider the problem this creates.

Now, if we consider case A and analyze the graph, students with marks ≥ 50 pass and the rest fail, or, in terms of the prediction line’s output:

Pass = Result ≥ 0.45

But when case B occurs and we predict the result of the student with 55 marks, the prediction line classifies them as Fail, which proves our model’s prediction wrong. Hence, our best fit linear regression line is not robust to future data points when making predictions.

Another problem is that if a student’s marks are 100, the prediction line gives an output greater than 1, and if the marks are around 15, it gives a value less than 0, i.e. negative. So how can we map such outputs onto our classes, where Pass is 1 and Fail is 0?
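This out-of-range behavior is easy to see numerically. Here is a minimal sketch using NumPy; the marks and labels below are made-up values for illustration, not the data from the figure:

```python
# Fit a least-squares line to hypothetical pass/fail data and show that its
# predictions escape the [0, 1] range of the labels.
import numpy as np

marks = np.array([20, 30, 40, 55, 60, 70, 85])   # hypothetical exam marks
result = np.array([0, 0, 0, 1, 1, 1, 1])         # 0 = Fail, 1 = Pass

# np.polyfit returns the slope m and intercept b of the best fit line y = m*x + b
m, b = np.polyfit(marks, result, deg=1)

print(m * 100 + b)   # prediction for 100 marks: greater than 1
print(m * 15 + b)    # prediction for 15 marks: negative
```

A line that can output 1.5 or -0.2 cannot be read directly as a class label, which is exactly the gap the sigmoid closes below.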

Here comes the role of logistic regression analysis, which tweaks the regression line so that the output always remains between 0 and 1, as follows —

This curve represents the sigmoid function, which converts the linear regression best fit line into an S-shaped curve that keeps all the data points between 0 and 1.

The output of this prediction curve always lies between 0 and 1, which in reality is the probability of the input belonging to a class.

Now, if a student scores 45 marks in the exam, the predicted result would be 0.5, which is greater than 0.45, and hence the student would be given the result “Pass”.

So this is what logistic regression does to make predictions: it finds the probability of an event falling into each class and then assigns the event to the class with the higher probability. Let us now build a mathematical understanding of this algorithm.

Mathematical Implementation of Logistic Regression

Let us now understand the mathematical implementation of classification through logistic regression.

Consider the following graphical representation of a number of data points in some space. Let the “green” points be “+1” and the “red” points be “-1”. These are the “y” or output values of the points.

Now, according to linear regression, we have to create a best fit line that linearly separates all these points. The equation of this line is y = mx + b, where “m” is the slope and “b” is the y-intercept. Since our line passes through the origin (0, 0), b = 0, so the equation becomes y = mx.

Now, since the points lie in some space, to separate them linearly we draw a plane instead of a line, whose equation is given in the diagram above. Linear regression says the best fit plane should be placed so as to optimize the sum of the distances of all points from the plane. Let us see how this distance is calculated.

According to linear algebra, the distance of a point x from the plane w·x + b = 0 is d = (w·x + b) / ||w||. In our case b = 0, and if we take w as a unit vector so that ||w|| = 1, the distance simplifies to d = w·x.

Now, if the point lies above the plane, the distance is always positive, and if it lies below the plane, the distance is negative.
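As a quick sketch, the signed distance can be computed with NumPy. The plane normal w and the two points below are made-up values; w is chosen as a unit vector so that ||w|| = 1, matching the simplification above:

```python
# Signed distance of a point x from the plane w.x + b = 0:
# positive above the plane, negative below it.
import numpy as np

def signed_distance(w, x, b=0.0):
    return (np.dot(w, x) + b) / np.linalg.norm(w)

w = np.array([0.6, 0.8])   # unit vector: ||w|| = 1

print(signed_distance(w, np.array([2.0, 1.0])))    # above the plane: positive (≈ 2.0)
print(signed_distance(w, np.array([-1.0, -2.0])))  # below the plane: negative (≈ -2.2)
```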

Now, from the above diagram, if we take point “A” as a green point (y = +1) and point “B” as a red point (y = -1), then multiplying the distance by the y value easily classifies the two points as follows —

Green point A: y = +1, distance = +ve, so (y × distance) = +ve value

Red point B: y = -1, distance = -ve, so (y × distance) = +ve value

So this equation shows that the points are correctly classified. Now let us look at the diagram below of some wrongly classified points.

Here, if we calculate y × distance for point A: y = -1, distance = +ve, so y × distance = -ve value.

And if we do the same for point B: y = +1, distance = -ve, so y × distance = -ve value.

Hence, this formula easily tells which points are correctly classified by the plane and which are not. So the cost function can be written as —

J = Σᵢ yᵢ(w·xᵢ), where i = 1 to n runs over the data points, “y” is their actual output, and “wx” is their distance from the plane. The product (y × wx) validates whether the predicted value (wx) is correct or not. For a better-fitting plane, the sum of all these terms should be as large as possible, i.e. max(J).
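Both the sign check and the cost J can be computed in a few lines of NumPy. The plane normal w and the three points below are made-up values; the third point is deliberately placed on the wrong side to show a negative term:

```python
# y * (w.x) is positive for correctly classified points and negative for
# misclassified ones; the cost J sums these terms over all points.
import numpy as np

w = np.array([1.0, 0.0])           # unit normal of the plane x1 = 0
X = np.array([[ 2.0,  1.0],        # green point, above the plane
              [-1.5,  0.5],        # red point, below the plane
              [-0.5,  2.0]])       # green point, misclassified (below the plane)
y = np.array([+1, -1, +1])

margins = y * (X @ w)              # y * distance for each point
print(margins)                     # [ 2.   1.5 -0.5]

J = margins.sum()                  # a better plane makes this sum larger
print(J)                           # 3.0
```

Each misclassified point drags J down, which is why maximizing J pushes the plane toward classifying more points correctly.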

Now, to convert this linear regression cost function to binary classification, i.e. to squash the “J” values into the binary range between “0” and “1”, we use the sigmoid function, defined as σ(z) = 1 / (1 + e⁻ᶻ).

This can also be explained with the classification metric called the confusion matrix.

Here ,

TP (True Positive): the prediction (1 = +ve = wx) is correct (True) according to the real value (1 = +1 = y).

TN (True Negative): the prediction (0 = -ve = wx) is correct (True) according to the real value (0 = -1 = y).

FP (False Positive): the prediction (1 = +ve = wx) is wrong (False) according to the real value (0 = -1 = y).

FN (False Negative): the prediction (0 = -ve = wx) is wrong (False) according to the real value (1 = +1 = y).

The best logistic regression model has the maximum number of correctly classified (TP + TN) outputs and the minimum number of wrongly classified (FP + FN) outputs. The accuracy is measured as Accuracy = (TP + TN) / (TP + TN + FP + FN).
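The four counts and the accuracy are straightforward to compute by hand. The labels and predictions below are made-up values for illustration:

```python
# Count TP, TN, FP, FN from true labels and predictions, then compute accuracy.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn)   # 3 3 1 1
print(accuracy)          # 0.75
```

Note that the denominator is just the total number of predictions, so accuracy is simply the fraction of outputs the model got right.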

That is all for this blog on the mathematical implementation of Logistic Regression. I hope it was informative. Thank you for reading.
