Understanding Logistic Regression using Log Odds

Shuangyuan (Sharon) Wei
May 2 · 3 min read
Photo by Shuangyuan Wei on Unsplash

I had to adjust my thinking when it comes to logistic regression because it models a probability rather than a mean, and it involves a non-linear transformation. In this article, I will explain the log odds interpretation of logistic regression mathematically, and also fit a simple logistic regression model on real data.

The odds of an event are the probability that it happens divided by the probability that it doesn’t. For example, if P(success) = 0.8 and P(failure) = 0.2, the odds of success are 0.8/0.2 = 4.
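The conversion between probability and odds is simple enough to check in a few lines. This is a minimal sketch; the helper names `odds` and `prob_from_odds` are hypothetical, chosen only for illustration. It reproduces the 0.8/0.2 example and also takes the log, since the log odds are the quantity logistic regression models.

```python
import math


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1 - p)


def prob_from_odds(o):
    """Invert the odds back to a probability: p = o / (1 + o)."""
    return o / (1 + o)


p_success = 0.8
print(odds(p_success))            # 0.8 / 0.2, i.e. 4 up to floating-point rounding
print(math.log(odds(p_success)))  # log odds of success
print(prob_from_odds(4.0))        # back to the probability 0.8
```

Odds range over (0, ∞) and log odds over the whole real line, which is what makes the log odds a natural target for a linear model.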

We use logistic regression to model a binary outcome variable ($y$ is either 0 or 1). As with a Bernoulli random variable, we consider $P(y=1 \mid x)$ and $P(y=0 \mid x)$. Recall the generalized linear model specification $E(y \mid x) = f(x'\beta)$. For a binary $y$, the conditional mean is:

๐ธ(๐‘ฆ|๐‘ฅ)=๐‘ƒ (๐‘ฆ=1|๐‘ฅ)ร—1+๐‘ƒ (๐‘ฆ=0|๐‘ฅ)ร—0=๐‘ƒ (๐‘ฆ=1|๐‘ฅ)

Therefore, the expectation we are modeling is a probability, $P(y=1 \mid x)$. However, modeling it directly as a linear combination of the independent variables and parameters, $P(y=1 \mid x) = x'\beta$, does not work: $x'\beta$ can take any real value, while a probability must lie between 0 and 1.
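To see the problem concretely, here is a small simulation, assuming synthetic data whose true probabilities follow a logistic curve (the coefficient 2.0 and the spread of `x` are made up). Fitting ordinary least squares directly to the 0/1 outcome, often called the linear probability model, yields fitted "probabilities" below 0 and above 1.

```python
import numpy as np

# Synthetic data (illustration only): true P(y = 1 | x) is logistic in x.
rng = np.random.default_rng(2)
n = 1000
x = rng.normal(scale=3.0, size=n)
p_true = 1.0 / (1.0 + np.exp(-2.0 * x))
y = rng.binomial(1, p_true).astype(float)

# Linear probability model: regress y on [1, x] by least squares.
X = np.column_stack([np.ones(n), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_ols

print("min fitted:", fitted.min())   # falls below 0
print("max fitted:", fitted.max())   # rises above 1
print("share outside [0, 1]:", np.mean((fitted < 0) | (fitted > 1)))
```

The straight line keeps rising (or falling) without bound, which is exactly why a bounded function of $x'\beta$ is needed.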

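Therefore, we choose a function $f(x'\beta)$ that maps any real number to a value between zero and one. For logistic regression, $f$ is the logistic function, whose inverse is the logit link function, so the log odds are linear in $x$:

$$\log \frac{P(y=1 \mid x)}{1 - P(y=1 \mid x)} = x'\beta \quad\Longleftrightarrow\quad P(y=1 \mid x) = \frac{e^{x'\beta}}{1 + e^{x'\beta}}$$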
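A quick numerical check of the logit/logistic pair and of the key consequence for interpretation: with one predictor, a one-unit increase in $x$ multiplies the odds by $e^{\beta_1}$, no matter where you start. This is a sketch with hypothetical coefficients `b0` and `b1`; the helper names are also illustrative.

```python
import math


def logistic(t):
    """Inverse of the logit link: maps any real t = x'beta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Logit link: the log odds log(p / (1 - p)) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def odds_at(x, b0, b1):
    """Odds of y = 1 at predictor value x under the model logit(p) = b0 + b1 * x."""
    p = logistic(b0 + b1 * x)
    return p / (1.0 - p)


# Hypothetical coefficients for one predictor x (illustration only).
b0, b1 = -1.0, 0.7

for x in [0.0, 1.0, 2.0]:
    p = logistic(b0 + b1 * x)
    # logit(p) recovers the linear predictor b0 + b1 * x
    print(x, round(p, 4), round(logit(p), 4))

# Raising x by one unit multiplies the odds by exp(b1), whatever the starting x.
ratio_low = odds_at(1.0, b0, b1) / odds_at(0.0, b0, b1)
ratio_high = odds_at(5.0, b0, b1) / odds_at(4.0, b0, b1)
print(ratio_low, ratio_high, math.exp(b1))
```

The probability itself does not change by a constant factor (the logistic curve flattens near 0 and 1), which is why coefficients are read on the odds scale.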

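Because each outcome is a Bernoulli draw with success probability $p_i = P(y_i=1 \mid x_i)$, we estimate $\beta$ by maximum likelihood. The likelihood has the same form as that of independent Bernoulli random variables:

$$L(\beta) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}, \qquad p_i = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$$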
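In general this likelihood has no closed-form maximizer, so software maximizes it numerically. Below is a minimal NumPy sketch, assuming synthetic data with made-up true coefficients: Newton-Raphson on the Bernoulli log-likelihood, followed by the textbook Wald standard errors from the inverse Fisher information and two-sided normal p-values. It is the standard approach, not necessarily a line-for-line match of any particular library's internals; the function names are hypothetical.

```python
import math

import numpy as np


def sigmoid(t):
    """Logistic function: maps x'beta to P(y = 1 | x)."""
    return 1.0 / (1.0 + np.exp(-t))


def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood: sum of y*log(p) + (1 - y)*log(1 - p)."""
    p = sigmoid(X @ beta)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


def fit_logistic_newton(X, y, max_iter=50, tol=1e-10):
    """Maximize the log-likelihood with Newton-Raphson.

    X: (n, k) design matrix, including a column of ones for the intercept.
    y: (n,) array of 0/1 outcomes.
    Returns the MLE of beta and Wald standard errors from the inverse
    Fisher information (X' W X)^{-1}, where W = diag(p * (1 - p)).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ beta)
        score = X.T @ (y - p)                      # gradient of the log-likelihood
        info = X.T @ (X * (p * (1 - p))[:, None])  # negative Hessian, X' W X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = sigmoid(X @ beta)
    info = X.T @ (X * (p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


# Synthetic data with assumed true coefficients (illustration only).
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.2])
y = rng.binomial(1, sigmoid(X @ true_beta))

beta_hat, se = fit_logistic_newton(X, y)
z = beta_hat / se
# Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_values = [math.erfc(abs(zi) / math.sqrt(2)) for zi in z]
print(beta_hat, se, p_values)
```

Newton-Raphson is used here because the log-likelihood is concave and the method usually converges in a handful of iterations; plain gradient ascent would also work, just more slowly.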

Even though logistic regression is mainly used for classification and prediction in machine learning, to round out this article on interpreting logistic regression through log odds, I ran a simple logistic regression in Python to get a sense of what the results look like. I used a dataset of 4,000+ emails. It includes 57 feature variables that may indicate whether an email is spam; for example, the word_free variable indicates whether the email contains the keyword “free”. The spam variable is binary and shows whether each email has been tagged as spam. I fit the model with a short script and printed out the summary table.
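The original script is not reproduced in this version of the article. As a hedged sketch only, the snippet below fits the same kind of model with statsmodels' formula API on a synthetic DataFrame whose `word_free` and `spam` columns mimic the variables described above. The data, the 30% rate of "free", and the true coefficients are all made up, and it assumes numpy, pandas, and statsmodels are installed.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the email data (hypothetical numbers, NOT the real dataset).
rng = np.random.default_rng(0)
n = 4000
word_free = rng.binomial(1, 0.3, size=n)         # 1 if the email contains "free"
true_log_odds = -1.0 + 1.5 * word_free            # assumed true coefficients
p_spam = 1.0 / (1.0 + np.exp(-true_log_odds))
spam = rng.binomial(1, p_spam)                    # 1 if tagged as spam
df = pd.DataFrame({"word_free": word_free, "spam": spam})

# Fit P(spam = 1 | word_free) with a logit link by maximum likelihood.
result = smf.logit("spam ~ word_free", data=df).fit(disp=0)
print(result.summary())         # coefficients, standard errors, z, p-values, 95% CIs

# Exponentiated coefficients are odds ratios.
odds_ratios = np.exp(result.params)
print(odds_ratios)
```

With the real data, `df` would instead be loaded from the email dataset, and additional feature columns could be added to the formula.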

The coefficient of word_free is 1.55. Thus, all else being equal, the odds that an email is spam are multiplied by exp(1.55) ≈ 4.7, almost five times, if the email contains the word “free”. It’s worth noting that the statsmodels summary table also nicely provides p-values and 95% confidence intervals. Out of curiosity, I would like to dig deeper into how Python’s statsmodels library computes the standard errors and p-values. Perhaps I will write another article about it later :).
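The arithmetic behind that interpretation, as a small sketch. The coefficient 1.55 comes from the text; the baseline spam probability of 0.2 for emails without "free" is a hypothetical value chosen only to show how an odds ratio translates back into a probability.

```python
import math

coef_word_free = 1.55                  # coefficient reported in the text
odds_ratio = math.exp(coef_word_free)
print(round(odds_ratio, 2))            # about 4.71

# Hypothetical baseline: suppose an email without "free" has P(spam) = 0.2.
p_base = 0.2
odds_base = p_base / (1 - p_base)      # baseline odds of spam
odds_free = odds_base * odds_ratio     # odds after adding the word "free"
p_free = odds_free / (1 + odds_free)   # convert odds back to a probability
print(round(p_free, 3))                # about 0.54
```

Under this assumed baseline, the probability rises from 0.2 to roughly 0.54, not to 4.7 × 0.2: the multiplicative effect applies to the odds, not to the probability.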

Lastly, I will quickly go over the key assumptions of logistic regression, as I did for OLS linear regression in an earlier article. It goes without saying that knowing the key assumptions underlying a method is important. Also, “assumptions of logistic and linear regression” has been one of the top questions in data scientist interviews.

  • The outcome is a binary or dichotomous variable

Reference: Taddy, Matt. Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions, August 21, 2019.

Geek Culture
