Understanding Logistic Regression using Log Odds

Shuangyuan (Sharon) Wei
Published in Geek Culture · 3 min read · May 2, 2021



I had to adjust my thinking when it came to logistic regression, because it models a probability rather than a mean, and it involves a non-linear transformation. In this article, I will explain the log-odds interpretation of logistic regression mathematically, and then run a simple logistic regression model on real data.

The odds of an event are the probability that the event happens divided by the probability that it doesn’t. For example, if P(success) = 0.8 and P(failure) = 0.2, the odds of success are 0.8/0.2 = 4.
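As a quick sketch of this definition (the function name here is my own, not from the article), the odds can be computed directly from the probability of success:

```python
def odds(p_success: float) -> float:
    """Odds of an event: P(success) / P(failure), where P(failure) = 1 - P(success)."""
    return p_success / (1.0 - p_success)

# Odds of success when P(success) = 0.8: 0.8 / 0.2 = 4
print(odds(0.8))
```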

We use logistic regression to model a binary outcome variable (y is either 0 or 1). Since y behaves like a Bernoulli random variable, we want to consider 𝑃(𝑦=1|𝑥) and 𝑃(𝑦=0|𝑥). From the general linear model specification 𝐸(𝑦|𝑥)=𝑓(𝑥′𝛽), we can derive the conditional mean in this binary case to be:

𝐸(𝑦|𝑥)=𝑃 (𝑦=1|𝑥)×1+𝑃 (𝑦=0|𝑥)×0=𝑃 (𝑦=1|𝑥)

Therefore, the expectation we are modeling is a probability: 𝑃(𝑦=1|𝑥). However, modeling it directly as a linear combination of the independent variables and parameters, 𝑃(𝑦=1|𝑥)=𝑥′𝛽, does not work: the right-hand side is unbounded, while a probability must lie between 0 and 1.

Amazon data scientist, PhD candidate in Quant Marketing.