How to Interpret the Logistic Regression model — with Python

Vahid Naghshin
Published in Analytics Vidhya · Jun 9, 2021 · 7 min read

The logistic regression model is one of the most efficient and pervasive classification methods in data science.

Many business problems require automating decisions. For example, what is the churn likelihood for a given customer? What is the likelihood that a given customer clicks on an ad? And so on. These are categorised as classification problems, which are themselves part of a larger topic called supervised learning. Most classification problems have an outcome that takes only two different values; these are known as binary classification problems. Some examples of binary outcomes are phishing/not-phishing, click/don't click, and churn/don't churn. Even in the case of more than two outcomes, the problem can often be recast into a series of binary problems using conditional probabilities.

Logistic Regression

Logistic regression is a bit of a misnomer: although its name includes the word regression, it does not actually deal with a regression problem. It is one of the most efficient classification methods. Due to its high similarity with linear regression, it is easy to interpret and hence one of the best candidates for data exploration (profiling) as well as prediction. Although there is a handful of similarities between linear regression and logistic regression, there are also some important differences.

Similarity between Linear Regression and Logistic Regression

The strongest similarity between logistic and linear regression is that both fit a linear function of the predictors. In linear regression, we estimate the value of the response/target outcome directly, while in logistic regression we approximate the log of the odds via a linear function of the predictors. The odds are the ratio of the probability of success to the probability of failure. In binary classification problems, the success (the case of interest) is usually labelled "1" and the failure "0".
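For example, if the probability of success is p = 0.8, the odds are 0.8 / (1 - 0.8) = 4, often read as "4 to 1", and the corresponding log odds are log(4) ≈ 1.39.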

The most distinctive difference between logistic and linear regression lies in the objective function and the assumption underlying the data. In linear regression, the objective function from which the weights are derived is the sum of squared errors, and the conditional distribution of the target response is assumed to be normal. In contrast, logistic regression uses the deviance as its objective function and assumes the underlying conditional distribution is binomial (Bernoulli for a single 0–1 response).
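Concretely, for fitted probabilities p̂_i and 0–1 responses y_i, the deviance is D = -2 Σ_i [ y_i log(p̂_i) + (1 - y_i) log(1 - p̂_i) ], which is minus twice the Bernoulli log-likelihood, so minimising the deviance is the same as maximising the likelihood.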

Linear Regression vs. Logistic Regression

Furthermore, the nature and analysis of the residuals from the two models are different. Partial residuals in logistic regression, while less valuable than in linear regression, are still useful to confirm nonlinear behaviour and to identify highly influential records.

Mathematics behind Logistic Regression

For a 0–1 response, we need to model the conditional probability of the outcome given the predictors, P(Y = 1 | X_1, …, X_p).

In other words, the response variable is modelled as a Bernoulli variable whose parameter depends on the covariates X_i. One naive thought is to approximate this probability directly with a linear function. There is a caveat, however: such a model does not guarantee that the probability lies between 0 and 1. To tackle this, we first transform the probability and then approximate the result via a linear function. The transform adopted to this end is the logit transform.

Logit function: logit(p) = log( p / (1 - p) )

The rationale behind adopting the logit transform is that it maps probabilities, which are bounded between 0 and 1, onto the whole range of real values, so the result can be approximated by a linear function; equivalently, its inverse maps any real value back into the interval (0, 1).

The logit is interpreted as the "log odds" that the response Y = 1. The logit function is shown in the figure below. For probabilities in the range of roughly 0.2 to 0.8, the fitted values are close to those from linear regression.

Logit transform and linear regression

The black dots in the figure above are the true response values, mapped to 1 and 0.
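As a quick numerical illustration (not from the original article), SciPy provides the logit transform and its inverse, expit; a minimal sketch:

import numpy as np
from scipy.special import logit, expit  # expit is the inverse of logit

p = np.array([0.1, 0.2, 0.5, 0.8, 0.9])  # probabilities
log_odds = logit(p)                      # log(p / (1 - p)), unbounded real values
p_back = expit(log_odds)                 # maps any real value back into (0, 1)

print(log_odds)  # approximately [-2.197, -1.386, 0.0, 1.386, 2.197]
print(p_back)    # recovers the original probabilities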

The logistic regression model is then

logit(p) = log( p / (1 - p) ) = β_0 + β_1 X_1 + … + β_p X_p,

or, equivalently, solving for the probability,

p = 1 / (1 + exp(-(β_0 + β_1 X_1 + … + β_p X_p))).

We can use the machinery of the generalised linear model (GLM) to fit this linear function of the log odds. Logistic regression is a special instance of a GLM, a family of models developed to extend linear regression to other settings.
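As an aside, the same model can be fitted through the generic GLM machinery; a minimal sketch using statsmodels with a Binomial family, assuming the X and y objects built in the implementation section below:

import statsmodels.api as sm

X_const = sm.add_constant(X)  # statsmodels needs an explicit intercept column
glm_result = sm.GLM(y, X_const, family=sm.families.Binomial()).fit()  # fitted by IRLS
print(glm_result.summary())   # coefficients are on the log-odds scale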

As mentioned before, the model is fitted by minimising the deviance and, in contrast to its linear counterpart, there is no closed-form solution. Several iterative algorithms can be used to derive the maximum likelihood estimates of the logistic regression parameters; delving into these algorithms is out of this article's scope.
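To give a flavour of the iterative idea (this is only an illustration, not the solver scikit-learn actually uses), a plain gradient-descent sketch on the mean negative log-likelihood could look like the following; it assumes numeric predictors (ideally scaled) and a 0–1 response:

import numpy as np

def fit_logistic_gd(X, y, lr=0.1, n_iter=5000):
    # plain gradient descent on the mean negative log-likelihood (illustrative only)
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])  # prepend intercept column
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))  # current fitted probabilities
        grad = X.T @ (p - y) / len(y)        # gradient of the objective
        beta -= lr * grad                    # descent step
    return beta                              # [intercept, coefficients ...]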

Python Implementation

In order to demonstrate the practicality of logistic regression, we implement it with scikit-learn on the Titanic dataset. Four extra columns, Title_1 to Title_4, have been engineered from the Name column; they encode the passenger's title (Mr, Mrs, Master, Miss), which signals sex and, roughly, marital status. This allows an additional analysis of whether married people, in other words people with social responsibilities, had better survival chances, and whether the trend is similar for both sexes. The full dataset can be downloaded from here. The dataset consists of 15 predictors such as sex, fare, p_class, family_size, and so on; the target response is Survived. Note that the factor variables, which take a limited set of levels, have already been converted via one-hot encoding; to avoid the multicollinearity induced by one-hot encoding, one level of each factor variable is omitted (a sketch of this encoding step is shown below).
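For reference, such an encoding could be produced with pandas; the snippet below is only a sketch, and raw_df as well as the column names are illustrative rather than the exact ones used to build the dataset:

import pandas as pd

# drop_first=True drops one level per factor, avoiding the multicollinearity
# ("dummy variable trap") that full one-hot encoding would introduce
encoded = pd.get_dummies(raw_df, columns=['Pclass', 'Title', 'Embarked'], drop_first=True)

With the data prepared this way, the LogisticRegression class from sklearn.linear_model provides the core implementation. The code for fitting the model (full code) is as follows: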

import pandas as pd
from sklearn.linear_model import LogisticRegression

# train_df is assumed to be the preprocessed Titanic training set described above
predictors = ['Sex', 'Age', 'Fare', 'Pclass_1', 'Pclass_2', 'Family_size',
              'Title_1', 'Title_2', 'Title_3', 'Emb_1', 'Emb_2']
outcome = 'Survived'

X = train_df[predictors]
y = train_df[outcome]

# a very large C makes the L2 penalty negligible, i.e. an (almost) unregularised fit
logit_reg = LogisticRegression(penalty='l2', C=1e42, solver='liblinear')
logit_reg.fit(X, y)

print('intercept ', logit_reg.intercept_[0])
print('classes', logit_reg.classes_)
pd.DataFrame({'coeff': logit_reg.coef_[0]}, index=X.columns)

The output will be:

Interpreting the Model

The intercept and coefficients of the predictors are given in the table above. Note that, when interpreting a coefficient, the reference level of the corresponding factor must be taken into account. Also remember that the linear equation approximates the logit, i.e. the log odds, not the probability; the appropriate conversion is needed whenever a probability-based interpretation is required.
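For instance, exponentiating a coefficient gives the multiplicative change in the odds of Survived = 1 per unit increase of that predictor, and predict_proba returns the fitted probabilities directly; a small sketch reusing the fitted model from above:

import numpy as np

# multiplicative change in the odds per unit increase of each predictor
odds_ratios = pd.DataFrame({'odds_ratio': np.exp(logit_reg.coef_[0])}, index=X.columns)
print(odds_ratios)

# fitted probabilities of Survived = 1 (second column, since classes_ is [0, 1])
probs = logit_reg.predict_proba(X)[:, 1]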

As an example, the coefficient of Sex is -3.55 (Sex is coded 1 for male and 0 for female). Being male therefore multiplies the odds of surviving by about 0.03 (exp(-3.55) ≈ 0.029), i.e. reduces them to roughly 3% of the odds for a female passenger. Note that this is a change in the odds, not in the probability itself; the survival probability of a given passenger still depends on all the other predictors.

For the other factor variables, the reference level should be considered. For example, since Title_4 is omitted from the predictors, the Title_1 coefficient must be interpreted relative to it. In our case this value is 0.48, which means that having Title_1 = Mr (please refer to the Kaggle page for an explanation of the data) multiplies the odds of surviving by about 1.6 (exp(0.48) ≈ 1.6), i.e. increases them by around 60% relative to the case Title_4 = Miss. Perhaps married males were given a higher priority for saving ;)

Using confusion_matrix and precision_recall_fscore_support from sklearn.metrics, we can obtain the confusion matrix and the related metrics.

from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

pred = logit_reg.predict(X)
conf_mat = confusion_matrix(y, pred)  # rows: true class (0, 1), columns: predicted class

# treating Survived = 1 as the positive class
print('Precision', conf_mat[1, 1] / sum(conf_mat[:, 1]))
print('Recall', conf_mat[1, 1] / sum(conf_mat[1, :]))
print('Specificity', conf_mat[0, 0] / sum(conf_mat[0, :]))

precision_recall_fscore_support(y, pred)
Confusion matrix

You can see that there is a trade-off between recall and specificity: capturing more 1s generally means misclassifying more 0s as 1s. The ideal classifier would do an excellent job of classifying the 1s without misclassifying more 0s as 1s. The tool that captures this trade-off is the "Receiver Operating Characteristic" curve, usually referred to as the ROC curve. The ROC curve plots recall (sensitivity) on the y-axis against specificity on the x-axis, and shows the trade-off between recall and specificity as you change the cutoff used to classify a record [1].

ROC
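A curve like the one above could be produced with scikit-learn and matplotlib; the sketch below is not code from the original article and simply reuses the fitted model:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

scores = logit_reg.predict_proba(X)[:, 1]  # scores for the positive class (Survived = 1)
fpr, tpr, thresholds = roc_curve(y, scores)

plt.plot(1 - fpr, tpr)                     # specificity = 1 - false positive rate
plt.xlabel('specificity')
plt.ylabel('recall (sensitivity)')
plt.gca().invert_xaxis()                   # specificity runs from 1 down to 0, as in [1]
plt.show()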

Wrap-up

In this article, we briefly introduced the logistic regression classifier and discussed the similarities and differences between logistic and linear regression. We also demonstrated the classifier in Python, interpreted the model based on its coefficients, and derived some assessment metrics.

Reference

[1] Bruce, Peter, Andrew Bruce, and Peter Gedeck. Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python. O’Reilly Media, 2020.
