Logistic Regression for Machine Learning

Sathira Basnayake
Published in Geek Culture
5 min read · Aug 24, 2021

Hi, I’m Sathira Basnayake, and in this article we will look into the basics of logistic regression for machine learning. As we know, machine learning has three major categories: supervised learning, unsupervised learning, and reinforcement learning. Within supervised learning, classification and regression are the most widely used techniques. Logistic regression comes into play with classification.

To understand logistic regression properly, it helps to have some idea about linear regression first. So if you are not familiar with that concept, I suggest you read up on it and come back. For the sake of simplicity, we will take a one-attribute linear regression example. The most commonly used example is house price prediction, so let’s assume a data set with the house area (X) and the house price (Y). In this case, price is the target variable. As you have an idea about linear regression, you would know how to find the most suitable line for prediction.

The function of the line would be something like “y_hat = wx + b”. You know how to find the best w and b values using gradient descent or another method, with MSE (Mean Squared Error) as the loss function. But what if the target variable takes class values instead of continuous ones?
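As a quick refresher, the linear case above can be sketched in a few lines. This is a minimal illustration, not production code; the data values are made up, and the learning rate and epoch count are arbitrary choices for this toy example.

```python
import numpy as np

# Minimal sketch: one-feature linear regression y_hat = w*x + b,
# trained by minimizing MSE with plain gradient descent.
def fit_linear(x, y, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    m = len(x)
    for _ in range(epochs):
        y_hat = w * x + b
        # Gradients of MSE = (1/m) * sum((y_hat - y)^2)
        dw = (2.0 / m) * np.sum((y_hat - y) * x)
        db = (2.0 / m) * np.sum(y_hat - y)
        w -= lr * dw
        b -= lr * db
    return w, b

# Toy "area vs. price" data lying exactly on y = 3x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x + 1.0
w, b = fit_linear(x, y)
```

After training, w and b should land close to the true slope and intercept of the toy data.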

In such a scenario we use logistic regression. It is essentially a classification algorithm that performs binary classification using independent variables. If you know linear regression, you may wonder why we can’t just fit the line as it is and then divide the outcomes according to a threshold. That can be done, but it is not very effective. So what we do instead is take the same “y_hat = wx + b” kind of formula and feed it into the sigmoid function.

The sigmoid function is defined as sigmoid(z) = 1 / (1 + e^(−z)). We compute z using the “wx + b” kind of function and feed it to the sigmoid. By doing that we always get a value between 0 and 1. So by using a threshold like 0.5, we can predict that values above 0.5 belong to class A and values below 0.5 belong to class B. (We use 0.5 most of the time, but the threshold has to be moved in certain cases.)

Now the next question is how we optimize the parameters: what do we use as the cost function? We cannot use MSE here, because with the sigmoid in the model, the resulting cost function is not convex. So if we tried to use MSE on logistic regression, gradient descent could get trapped in a local minimum.

So we have to find a convex cost function.

The cost function we use in logistic regression is the cross-entropy (log-loss) cost: J(w, b) = −(1/m) Σ [y log(y_hat) + (1 − y) log(1 − y_hat)]. This function is convex, and if you are interested you can dig deeper into the mathematical side of the subject. An equivalent form of the same cost function treats the two classes separately: the per-example cost is −log(y_hat) when y = 1 and −log(1 − y_hat) when y = 0.
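The cross-entropy cost can be computed directly from its definition. This is a bare-bones sketch; the epsilon clipping is a small practical addition of mine to avoid taking log(0), not part of the formula itself.

```python
import math

# Binary cross-entropy (log-loss):
# J = -(1/m) * sum(y*log(y_hat) + (1-y)*log(1-y_hat))
def cross_entropy(y_true, y_pred, eps=1e-12):
    m = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / m
```

Notice how the cost behaves: a confident correct prediction (y = 1, y_hat = 0.99) costs almost nothing, while a confident wrong one (y = 1, y_hat = 0.01) is penalized heavily.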

(Andrew Ng’s Machine Learning course)

Now that we know the cost function, the next step is to find the best w and b values that minimize it. To do that, we use the gradient descent algorithm. In this article we are not going to discuss gradient descent in detail, but basically, it finds the minimum of a differentiable function through an iterative optimization approach. We begin with initial guesses for w and b, and then reduce the error with every iteration until we finally arrive at the optimum w and b values. The gradient descent update rule is shown below.

θ_j := θ_j − α · ∂J(θ_0, θ_1)/∂θ_j, updating θ_0 and θ_1 simultaneously on each iteration. (Andrew Ng’s Machine Learning course)

In the update rule above, we can take θ_0 as b and θ_1 as w in our scenario. α is called the learning rate, a hyperparameter that determines the step size. In every iteration, a portion of the derivative of the cost function is subtracted from the current parameter value. Since the negative gradient always points in the steepest descent direction, every iteration moves the cost toward its global minimum. Finally, at the end of the iterations, we get the tuned parameters. That is a very basic introduction to gradient descent.
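Putting the pieces together, the whole training loop can be sketched as below. This is a minimal one-feature illustration under my own toy data; a conveniently simple fact (which you can verify by differentiating the cross-entropy cost) is that the gradient takes the same form as in linear regression, just with sigmoid(wx + b) as the prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression trained with gradient descent on the
# cross-entropy cost. Gradients: dJ/dw = (1/m) * sum((y_hat - y) * x),
# dJ/db = (1/m) * sum(y_hat - y).
def fit_logistic(x, y, lr=0.1, epochs=5000):
    w, b = 0.0, 0.0
    m = len(x)
    for _ in range(epochs):
        y_hat = sigmoid(w * x + b)
        dw = (1.0 / m) * np.sum((y_hat - y) * x)
        db = (1.0 / m) * np.sum(y_hat - y)
        w -= lr * dw
        b -= lr * db
    return w, b

# Toy 1-D data: class 0 below x = 3, class 1 above it.
x = np.array([1.0, 2.0, 2.5, 3.5, 4.0, 5.0])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_logistic(x, y)

# Apply the 0.5 threshold to the sigmoid outputs to classify.
preds = (sigmoid(w * x + b) >= 0.5).astype(int)
```

On this separable toy data, the learned decision boundary settles near x = 3 and the model classifies every training point correctly.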

After tuning the parameters, we have our model, and we can use it to predict new values as needed. In a practical scenario, there are many more things to consider beyond these basic core concepts.

This article gave you a basic idea of the theory behind logistic regression for machine learning; it does not cover the practical aspects of the subject. We have now reached the end of the article. Your comments and feedback are warmly welcome, and I encourage you to point out any mistakes in the content. Thank you all, and have a nice day.



I’m a third-year undergraduate student in the Faculty of Engineering, University of Peradeniya. My specialization is Computer Engineering.