Logistic Regression

Ahmed Imam
Apr 18 · 6 min read

First

First, a quick reminder: what is probability?

  • Example: flipping a fair coin once. The probability of getting a head is 𝑝(ℎ𝑒𝑎𝑑) = 1/2.

Introduction

Logistic regression is a supervised learning technique, which is basically a probabilistic classification model. It is mainly used to predict a binary outcome, such as checking whether a credit card transaction is fraudulent or not. Logistic regression uses the logistic function: a very useful function that takes any value from negative infinity to positive infinity and outputs a value between 0 and 1. Hence, its output is interpretable as a probability. If the estimated probability is equal to or greater than 50%, the model predicts that the instance belongs to the first class (called the positive class, labeled "1"); otherwise it predicts that it does not (i.e., it belongs to the negative class, labeled "0"). This makes it a binary classifier.

How does it work?

If we have a linear system:

f(x) = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ = β^T·X

Here, x is the vector of independent variables and f(x), i.e. β^T·X, is the dependent variable.

Logistic regression will not return this result directly but will return the logistic (probability) of that result:

p̂ = σ(β^T·X)

The logistic function σ(·), also called the sigmoid function, is defined as:

σ(t) = 1 / (1 + e^(−t))

The output (ŷ) can be expressed as:

ŷ = 1 if p̂ ≥ 0.5, otherwise ŷ = 0

  • 0.5 is called the ‘Decision Boundary’ of our model.

Plotting the sigmoid function with Python gives the following S-shaped graph:
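The plot itself was an image in the original post; here is a minimal sketch of the function in Python (NumPy assumed) whose output you could feed into any plotting library:

```python
import numpy as np

def sigmoid(t):
    """Logistic (sigmoid) function: maps any real t into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# Sample the function over a symmetric range; plotting sigmoid(t) against t
# (e.g. with matplotlib) produces the S-shaped curve described above.
t = np.linspace(-10, 10, 9)
print(np.round(sigmoid(t), 4))
print(sigmoid(0))  # 0.5 -- exactly the decision boundary
```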

Training

Now we know how a Logistic Regression model estimates probabilities and makes predictions. But how is it trained? The objective of training is to set the coefficient vector β so that the model estimates high probabilities for positive instances (y = 1) and low probabilities for negative instances (y = 0). This idea is captured by the cost function J shown below:

Cost Function:

cost(p̂, y) = −log(p̂) if y = 1
cost(p̂, y) = −log(1 − p̂) if y = 0

Or (for simplicity, in a single line):

cost(p̂, y) = −[ y·log(p̂) + (1 − y)·log(1 − p̂) ]

First, let’s draw the curves that represent each part of the above cost equation over the whole range of possible probability values:

From the above drawing we find that:

  • The cost grows very large for the positive class as the estimated probability moves toward zero, while the cost for the negative class decreases to zero.
  • In contrast, the cost grows very large for the negative class as the probability moves toward one, while the cost for the positive class decreases to zero.
  • The point where both curves give the same cost is at probability = 0.5, which represents our Decision Boundary, as illustrated in the figure below:

𝛽-vector values:

  • Now we try to find the values of the β-vector that achieve the minimum cost.
  • To do that, we need the cost function over our whole training dataset.

Log Loss:

The cost function over the whole training dataset is called the log-loss. It is simply the average cost over all m instances and can be expressed as:

J(β) = −(1/m) · Σᵢ₌₁..ₘ [ y⁽ⁱ⁾·log(p̂⁽ⁱ⁾) + (1 − y⁽ⁱ⁾)·log(1 − p̂⁽ⁱ⁾) ]
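As a quick numeric check, the log-loss can be computed directly with NumPy (the toy labels and probabilities below are invented for illustration):

```python
import numpy as np

def log_loss(y, p):
    """Average cost over m instances: -(1/m) * sum(y*log(p) + (1-y)*log(1-p))."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = np.array([1, 0, 1, 0])

# Confident and correct probabilities give a small loss...
good_p = np.array([0.9, 0.1, 0.8, 0.2])
print(log_loss(y_true, good_p))

# ...while confident but wrong probabilities are penalized heavily.
bad_p = np.array([0.1, 0.9, 0.2, 0.8])
print(log_loss(y_true, bad_p))
```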

Gradient Descent

  • With gradient descent we can find the overall minimum cost and the corresponding β-vector. This is done by taking the partial derivative of the log-loss function with respect to each parameter β_j in our parameter vector β:

∂J/∂β_j = (1/m) · Σᵢ₌₁..ₘ ( σ(β^T·x⁽ⁱ⁾) − y⁽ⁱ⁾ ) · x_j⁽ⁱ⁾

  • The result of this equation represents the slope at any point on the total-cost curve.
  • The best value of β, the one that achieves minimum cost, lies where the slope is lowest (near or at the curve’s bottom), and it is reached in a sequence of steps.
  • To achieve that with gradient descent, we need to choose a learning rate α. The step size used to move from point 1 to point 2 is based on the learning rate, which helps us avoid overshooting the curve’s bottom:
  • step = learning_rate × slope (evaluated at point 1)
  • Update the β value: β_j := β_j − step
  • Get the new slope value at point 2 (at the new value of β).
  • The model then repeats, computing a new step and a new β value, until the absolute value of the step drops below 0.001.
  • It then has the β-vector with the least cost.
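The steps above can be sketched from scratch with NumPy. Everything here (the toy dataset, the 10,000-iteration cap, the learning rate of 0.1) is invented for illustration; only the 0.001 stopping tolerance comes from the text:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(X, y, learning_rate=0.1, tol=1e-3, max_iters=10_000):
    """Batch gradient descent on the log-loss.

    X is (m, n) with a leading column of ones for the intercept,
    y is (m,) of 0/1 labels; returns the fitted beta vector.
    """
    m, n = X.shape
    beta = np.zeros(n)
    for _ in range(max_iters):
        p = sigmoid(X @ beta)              # current probability estimates
        gradient = (X.T @ (p - y)) / m     # slope of log-loss w.r.t. each beta_j
        step = learning_rate * gradient
        beta -= step                       # move downhill
        if np.max(np.abs(step)) < tol:     # stop once steps become tiny
            break
    return beta

# Tiny toy set: label is 1 when the feature exceeds 2 (illustrative only).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0, 0, 1, 1])
beta = fit_logistic(X, y)
print(sigmoid(X @ beta))  # low probabilities for the first two rows, high for the last two
```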

Finally

Our model is now trained and ready for any new dataset.

Practical Example:

Let’s use the iris dataset to illustrate Logistic Regression. This is a famous dataset that contains the sepal and petal length and width of 150 iris flowers of three different species: Iris-Setosa, Iris-Versicolor, and Iris-Virginica.

Let’s try to build a classifier to classify between the Iris-Versicolor type and Iris-Setosa type based only on the petal width feature.

Loading the data:
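The original snippet was an image; here is a minimal sketch, assuming scikit-learn's bundled copy of the iris dataset and keeping only the petal-width column:

```python
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()
X = iris["data"][:, 3:]   # petal width (cm) only, shape (150, 1)
y = iris["target"]        # 0 = Setosa, 1 = Versicolor, 2 = Virginica

# Keep only Iris-Setosa and Iris-Versicolor to get a binary problem
mask = y < 2
X, y = X[mask], y[mask]
print(X.shape, y.shape)
```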

Train the model:
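Again as a sketch (the original code was an image), fitting scikit-learn's LogisticRegression with its default settings on the two-class subset:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris["data"][:, 3:]   # petal width only
y = iris["target"]
mask = y < 2              # Setosa (0) vs Versicolor (1)
X, y = X[mask], y[mask]

log_reg = LogisticRegression()
log_reg.fit(X, y)
print(log_reg.intercept_, log_reg.coef_)  # the fitted beta values
```

The fitted intercept and coefficient play the role of β₀ and β₁ from the training section.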

Let’s look at the model’s estimated probabilities for flowers with petal widths varying from 0 to 3 cm.

Tips

  • There are two types of prediction methods.
  • The first one, predict_proba, returns the probability of each class.
  • The second one is a class predictor that returns the class to which the instance belongs.
  • First, predict_proba:
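A sketch of predict_proba over the 0–3 cm range (the grid of 1,000 points is an assumption; plotting the two columns of y_proba against X_new reproduces the graph discussed below):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris["data"][:, 3:]
y = iris["target"]
mask = y < 2
X, y = X[mask], y[mask]
log_reg = LogisticRegression().fit(X, y)

# Estimated probabilities for petal widths from 0 to 3 cm
X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = log_reg.predict_proba(X_new)   # column 0: Setosa, column 1: Versicolor
print(y_proba[0])    # near width 0 cm: Setosa is far more likely
print(y_proba[-1])   # near width 3 cm: Versicolor is far more likely
```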

Looking at the above probability-prediction graph, we can notice that:

  • There is a decision boundary at about 0.75 cm where both probabilities are equal to 50%: if the petal width is higher than 0.75 cm, the classifier will predict that the flower is an Iris-Versicolor, or else it will predict that it is Iris-Setosa (even if it is not very confident).

The second method is the class predictor, which returns the class to which the instance belongs, as in the code below:
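And a sketch of the class predictor; the two sample petal widths (1.7 cm and 0.5 cm) are chosen to land on either side of the decision boundary:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris["data"][:, 3:]
y = iris["target"]
mask = y < 2
X, y = X[mask], y[mask]
log_reg = LogisticRegression().fit(X, y)

# predict() returns the class label directly (0 = Setosa, 1 = Versicolor)
print(log_reg.predict([[1.7], [0.5]]))  # Versicolor for 1.7 cm, Setosa for 0.5 cm
```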

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Written by Ahmed Imam
Machine Learning Engineer & Python/Machine Learning Senior Instructor
