Essentials: Logistic Regression (Pt 1: Understanding)

Jason Leong
Nov 8 · 6 min read

Welcome to my first episode on Logistic Regression. Thanks for keeping me company on this journey.

Methodology behind Logistic Regression

y = c + m₀x₀ + m₁x₁ + m₂x₂ + ...

The above equation should be like bread and butter for an aspiring data scientist like yourself. You are right: it is a linear regression model. But what has this got to do with the current post — which is on Logistic Regression?

Logistic Regression is considered a Generalized Linear Model (GLM). GLMs are a broad class of models that includes linear regression, logistic regression, log-linear regression, Poisson regression, etc. BUT, logistic regression should never be considered a linear regression model, because a model is only a linear model if the mean of the response is a linear function of the parameters, and this is clearly violated for logistic regression.

Back to the question: the relation here is that Logistic Regression also formulates an equation quite similar to the one above.

logit(p) = c + m₀x₀ + m₁x₁ + m₂x₂ + ...
*where logit(p) = log(p/(1-p))
= log(Odds)

So, you may ask, why is log(odds) used?

- Probability ranges from 0 to 1

- Odds ranges from 0 to ∞

- Log Odds range from -∞ to ∞

That is why log odds are used: they avoid modeling a variable with a restricted range, such as a probability.
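The logit transform and its inverse can be sketched in a few lines (an illustrative sketch, not code from the original post):

```python
import math

def logit(p):
    """Map a probability in (0, 1) to log odds in (-inf, inf)."""
    return math.log(p / (1 - p))

def inv_logit(log_odds):
    """Map log odds back to a probability; this is the sigmoid function."""
    return 1 / (1 + math.exp(-log_odds))

# Even odds (p = 0.5) correspond to log odds of exactly 0,
# and the two functions undo each other.
print(logit(0.5))                       # 0.0
print(round(inv_logit(logit(0.3)), 6))  # 0.3
```

Note how probabilities near 0 map to large negative log odds and probabilities near 1 to large positive log odds, so the unbounded right-hand side of the regression equation is no longer a problem.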

What we are ultimately identifying in the model are these few key parameters:

  1. The coefficients (c, m₀, m₁, m₂, …) — these values are initialized randomly or to fixed values, and are constantly updated to obtain an optimal set of values (how? why? this will be covered later, stay with me)

  • These coefficients can be either positive or negative

  2. The variables (x₀, x₁, x₂, …) — which are determined by YOU (what features to keep from the dataset, NOT simply dumping them all in!)

  • These variables can be either discrete or continuous

  3. y — known as the dependent variable. This should preferably be binary (i.e. 0 or 1), because this allows a statistical measure (the odds ratio between TWO variables) to be calculated.

The Odds Ratio (OR) between two variables is a measure of the odds of A in the presence of B, against the odds of A in the absence of B.

Step 1: Calculate the odds that a member of the population has property “A”, assuming the person already has “B”.
Step 2: Calculate the odds that a member of the population has property “A”, assuming the person does not have “B”.
Step 3: Divide Step 1 by Step 2 to get the odds ratio (OR).
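The three steps can be sketched with a hypothetical 2×2 table of counts (all numbers below are made up purely for illustration):

```python
# Hypothetical counts for properties A and B
has_b_has_a, has_b_no_a = 36, 14   # people with B: 36 have A, 14 do not
no_b_has_a, no_b_no_a = 30, 70     # people without B: 30 have A, 70 do not

# Step 1: odds of A among those who have B
odds_given_b = has_b_has_a / has_b_no_a        # 36/14 ≈ 2.571
# Step 2: odds of A among those who do not have B
odds_given_not_b = no_b_has_a / no_b_no_a      # 30/70 ≈ 0.429
# Step 3: divide to get the odds ratio
odds_ratio = odds_given_b / odds_given_not_b   # 6.0

print(round(odds_ratio, 1))  # 6.0
```

An odds ratio of 6 would mean the odds of having A are six times higher in the presence of B than in its absence.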

  • The value of y is calculated as the sum of the products of the optimal coefficients and the variables. This value is a continuous variable that can be either positive or negative.
  • However, do recall that the value of y (i.e. our dependent variable) should be a binary value (i.e. 0 or 1), yet as mentioned previously the calculated value is continuous. Therefore, we have to map the value of y to something interpretable. So we employ the use of the Sigmoid function.

Sigmoid Function

A simple-looking function, but it does the trick: σ(y) = 1/(1 + e⁻ʸ). Note that the curve cuts the y-axis at 0.5. To interpret this, here are some pointers:

  1. We will now view the previously calculated y value from the perspective of the x-axis. The y value can be any value from -∞ to ∞, matching the x-axis, which also spans from -∞ to ∞.
  2. A negative x-axis value (i.e. a negative y value) gives a Sigmoid value below 0.5, whereas a positive x-axis value (i.e. a positive y value) gives a Sigmoid value above 0.5. A value of exactly 0.5 occurs when the y value is 0.
  3. The exact Sigmoid value can be calculated from the equation above.

After you obtain your mapped Sigmoid value from your initial y value, consider an appropriate threshold. A common threshold is 0.5: if the Sigmoid value is greater than or equal to 0.5, the instance is considered to be of Class 1 (this is similar to rounding up). Vice versa, if it falls below 0.5, it is considered to be of Class 0.
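Putting the mapping and the threshold together, a minimal sketch (the y values fed in are assumed for illustration):

```python
import math

def sigmoid(y):
    """Squash a real-valued y into the (0, 1) range."""
    return 1 / (1 + math.exp(-y))

def classify(y, threshold=0.5):
    """Class 1 if the sigmoid value meets the threshold, else Class 0."""
    return 1 if sigmoid(y) >= threshold else 0

print(classify(-2.0))  # sigmoid(-2.0) ≈ 0.119 -> Class 0
print(classify(1.3))   # sigmoid(1.3) ≈ 0.786 -> Class 1
print(classify(0.0))   # sigmoid(0.0) = 0.5 -> Class 1 (rounding up)
```

Exposing `threshold` as a parameter is deliberate: as discussed next, the cut-off need not be fixed at 0.5.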

Setting your own threshold

However, it may not always be appropriate for one’s use case to set the threshold at 0.5. For example, if Class 1 is labelled as having contracted ‘Disease A’ and Class 0 otherwise, you might want to be on the safe side and consider instances to be Class 1 even if the probability is below 0.5. This minimizes the count of False Negatives (useful when the cost of a False Negative is high, which is true in this context).

False Negatives: predicting a person DID NOT contract a disease, when in fact the person has.


Cost Function

In all Machine Learning models/contexts, the general concept is fairly similar: we are always trying to optimize the parameters/coefficients such that the cost function is smallest. The cost function can be understood as the cost of misclassification (in a classification problem), or the cost of fit (in regression, more commonly known as the sum of squared errors).

For binary classification problems, our objective is to minimize the cost function to obtain an optimal set of parameters for our decision boundary.

I am not going to dive into the topic of cost functions here; that material is widely available elsewhere. But I do wish to dig a little deeper into the concept of the Odds Ratio.


Odds Ratio (OR)

Now, say you have optimized a set of parameters for the decision boundary; consider these three scenarios. Our problem statement is predicting whether a person has contracted ‘Disease A’.

1. Logistic Regression with No Predictor Variables

logit(p) = c

Assuming the optimized parameter is c = -1.12546.

  • The intercept = -1.12546, which corresponds to the log odds of the probability of contracting the disease.
  • We can go from log odds to odds by exponentiating the intercept, which gives us Odds = exp(-1.12546) = 0.3245.
  • We can go back to the probability by calculating
p = Odds/(1 + Odds) = 0.245

That is, there is a probability of 24.5% of contracting ‘Disease A’.
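The arithmetic in Scenario 1 can be checked in a couple of lines (using the intercept quoted above):

```python
import math

c = -1.12546                 # optimized intercept, no predictors
odds = math.exp(c)           # exp(-1.12546) ≈ 0.3245
p = odds / (1 + odds)        # ≈ 0.245, i.e. a 24.5% probability

print(round(odds, 4))  # 0.3245
print(round(p, 3))     # 0.245
```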

2. Logistic Regression with a Single Dichotomous Predictor Variable

logit(p) = c + m₀*gender
*where gender = 1 if female, 0 otherwise

Assuming the optimized parameters are c = -1.4708517 and m₀ = 0.5927822.

  • The intercept = -1.47085, which corresponds to the log odds of males contracting the disease (since males are the reference group, as gender = 0).
  • The coefficient for female = 0.59278, which corresponds to the log of the odds ratio between the female and male groups. The odds ratio equals exp(0.59278) = 1.81, which means the odds for females are about 81% higher than the odds for males.

3. Logistic Regression with a Single Continuous Predictor Variable

logit(p) = c + m₀*(no. hours worked)
*where hours is a continuous variable

Assuming the optimized parameters are c = -9.7939421 and m₀ = 0.1563404.

  • The intercept = -9.79394, which is interpreted as the log odds of a person who worked zero hours contracting the disease.
  • The coefficient for hours worked = 0.15634, which is interpreted as the expected change in log odds for a one-unit increase in hours worked. The odds ratio can be calculated by exponentiating this value to get 1.16922, which means we expect about a 17% increase in the odds of contracting the disease for each one-unit increase in hours worked.
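Scenarios 2 and 3 both hinge on exponentiating a coefficient; the numbers quoted above can be reproduced like so:

```python
import math

# Scenario 2: dichotomous predictor (gender)
or_female = math.exp(0.5927822)    # ≈ 1.81 -> odds ~81% higher for females
# Scenario 3: continuous predictor (hours worked)
or_per_hour = math.exp(0.1563404)  # ≈ 1.169 -> ~17% higher odds per extra hour

print(round(or_female, 2))    # 1.81
print(round(or_per_hour, 5))  # 1.16922
```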

Useful huh? That’s it for now! Stay tuned to Part 2, where I will dive into using Logistic Regression on actual use cases.

Thank you for your time, and if you have any comments, do post them, I will read every single one of them. Have a great day ahead!

Jason


Essentials: Data Science & Analytics

I’m an undergraduate taking up Data Science & Analytics in NUS. I post my genuine thoughts and progressions in this journey of Data Evolution. It cannot get more authentic than this.
