Logistic Regression with Practical Implementation

Amir Ali
The Art of Data Science
8 min read · Jul 27, 2018

In this chapter, we will discuss the Logistic Regression algorithm, a supervised machine learning algorithm used for classification problems.

This chapter spans 3 parts:

  1. What is Logistic Regression?
  2. How does the Logistic Regression Algorithm work?
  3. Practical Implementation of Logistic Regression in Scikit Learn.

1. What is Logistic Regression?

Logistic Regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. The outcome is measured with a dichotomous variable (one with only two possible outcomes).

Logistic Regression is a regression model where the dependent variable is categorical.

Categorical: a variable that can take only a fixed set of values, such as A, B, or C, or Yes or No.

Dependent: y = f(X), i.e. y depends on X. When X changes, y changes in response.

Therefore, whenever the dependent variable (y) is categorical, like 0 or 1 or Yes or No, use logistic regression.

Example:

Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary: 0 or 1, win or lose. The sigmoid function is used to model the probability of that outcome.

The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively, and whether or not the candidate is an incumbent.

2. How does the Logistic Regression Algorithm work?

Logistic Regression works with independent and dependent variables. Our dependent variable is categorical, a discrete outcome (for example, whether or not we win the match today), as shown in the diagram.

But the question is how our model is trained.

So let’s train our model.

2.1 Logistic Function

The logistic function is also called the sigmoid function.

It’s an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1, but never exactly at those limits.

1 / (1 + e^(-value))

Where e is the base of the natural logarithm and value is the actual numerical value that you want to transform. Below is a plot of the numbers between -5 and 5 transformed into the range 0 to 1 using the logistic function.
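As a minimal sketch, the logistic function can be written in a few lines of Python. The function name `logistic` is our own choice for illustration; the loop evaluates it for the whole numbers from -5 to 5, the same range as the plot.

```python
import math


def logistic(value: float) -> float:
    """Map any real number into the open interval (0, 1)."""
    # Fine for moderate inputs such as -5..5; very large negative
    # inputs would overflow math.exp and need a more careful version.
    return 1.0 / (1.0 + math.exp(-value))


# Evaluate the function on the same range as the plot.
for value in range(-5, 6):
    print(f"{value:>3} -> {logistic(value):.4f}")
```

The output rises smoothly from just above 0 to just below 1 and passes through exactly 0.5 at value = 0.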

Dataset:

This dataset has only two attributes: Credit Score and Approved. The credit score is the independent attribute and Approved is the dependent attribute. The credit score takes a range of numeric values, while Approved is binary: 0 for not approved and 1 for approved.

n = 1000

Credit Score is the applicant’s credit score

Approved is coded ‘1’ for approved and ‘0’ for not approved; it is a binary, mutually exclusive variable.

Note: only 15 of the 1000 observations are shown.

Solution:

Estimated regression equation

ln(p / (1 - p)) = b0 + b1 * x1

Where p is the estimated probability of being approved, x1 is the credit score, and b0 and b1 are the estimated intercept and slope.

Remember that your score was 720, so now we can calculate the estimated probability and odds that you will be approved for a mortgage based on this data and model.

FICO of 720

So

FICO of 721

So

FICO of 655

All else being equal, if you apply for a mortgage with a 655 credit score, then the estimated probability of being approved is about 0.5595, or 55.95%, according to the model.

FICO of 745

All else being equal, if you apply for a mortgage with a 745 credit score, then the estimated probability of being approved is about 0.8258, or 82.58%, according to the model.
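The two probabilities quoted above are enough to back out the intercept and slope they imply, because the log-odds ln(p / (1 - p)) is a straight line in the credit score. The sketch below does that. The names `b0`, `b1`, `log_odds`, and `approval_probability` are our own labels, and the recovered values are only what the rounded figures in the text imply, not necessarily the exact coefficients fitted on the 1000 observations.

```python
import math


def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Probabilities stated in the text for two FICO scores.
p_655, p_745 = 0.5595, 0.8258

# Two points on a straight line determine its slope and intercept.
b1 = (log_odds(p_745) - log_odds(p_655)) / (745 - 655)
b0 = log_odds(p_655) - b1 * 655


def approval_probability(fico: float) -> float:
    """Estimated probability of approval under the implied model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * fico)))


print(f"implied intercept b0 = {b0:.4f}, implied slope b1 = {b1:.5f}")
print(f"P(approved | 655) = {approval_probability(655):.4f}")
print(f"P(approved | 745) = {approval_probability(745):.4f}")
```

By construction, the function reproduces 0.5595 and 0.8258 at the two scores it was built from; probabilities for other scores are only as good as that two-point approximation.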

Odds Ratio for Any 90-Point FICO Increase

The odds ratio for 655 to 745 FICO score (+90)

The odds ratio for 600 to 690 FICO score (+90)

Similarly, all odds ratios and estimated probabilities are shown in the table below.

Effect of improving FICO Score

Similarly, up to one hundred.

Important point: This is the percentage increase in the odds, NOT the percentage increase in the probability of being approved.
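To make the difference between odds and probability concrete, here is a small sketch that uses only the two probabilities stated earlier (0.5595 at a score of 655 and 0.8258 at 745). The helper name `odds` and the variable names are our own.

```python
def odds(p: float) -> float:
    """Convert a probability into odds in favour of the event."""
    return p / (1.0 - p)


# Probabilities stated in the text for FICO 655 and FICO 745.
p_655, p_745 = 0.5595, 0.8258

# Odds ratio for the 90-point increase from 655 to 745.
odds_ratio = odds(p_745) / odds(p_655)

# Percentage change in the odds versus percentage change in the probability.
pct_increase_odds = (odds_ratio - 1.0) * 100.0
pct_increase_prob = (p_745 / p_655 - 1.0) * 100.0

print(f"odds ratio (+90 points): {odds_ratio:.2f}")
print(f"increase in odds:        {pct_increase_odds:.0f}%")
print(f"increase in probability: {pct_increase_prob:.0f}%")
```

The odds grow far more in percentage terms than the probability does, which is why the two should not be read interchangeably.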

Let’s Graph It!

Find ODDS Change for any interval

FICO for Even ODDS (50/50)

To have a 50/50 chance, or even odds of being approved versus not approved, you will need a FICO score of approximately 639 according to this model.

FICO for 3:1 ODDS (.75/.25)

To have a 75% chance, or 3:1 odds of being approved versus not approved, you will need a FICO score of approximately 714 according to this model.
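Finding the score needed for a target probability means solving the log-odds equation for the score. The sketch below assumes a straight-line log-odds model whose intercept `b0` and slope `b1` are backed out from the two probabilities stated earlier (0.5595 at 655 and 0.8258 at 745); those names and `fico_for_probability` are our own, and the coefficients are implied by rounded figures rather than taken from the original fit.

```python
import math


def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Slope and intercept implied by the two probabilities stated in the text.
b1 = (log_odds(0.8258) - log_odds(0.5595)) / (745 - 655)
b0 = log_odds(0.5595) - b1 * 655


def fico_for_probability(p: float) -> float:
    """Solve ln(p / (1 - p)) = b0 + b1 * fico for the FICO score."""
    return (log_odds(p) - b0) / b1


fico_even = fico_for_probability(0.50)  # even odds, 1:1
fico_75 = fico_for_probability(0.75)    # 3:1 odds

print(f"FICO for 50% chance: {fico_even:.0f}")
print(f"FICO for 75% chance: {fico_75:.0f}")
```

Both results land within a point of the 639 and 714 quoted above, which suggests the stated figures are consistent with one another.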

Note: If you want this article, check out my academia.edu profile.

3. Implementation of Logistic Regression in Scikit Learn.

Dataset Description:

In this part, we predict the risk factor of cities. The dataset has three attributes: cities, distance, and risk. The risk factor is the target attribute. We use Logistic Regression, a supervised machine learning algorithm, which gave 91% correct predictions of the risk factor of cities.

Part 1: Data Preprocessing:

1.1 Import the Libraries

In this step, we import three libraries for the data preprocessing part. A library is a tool that you can use to do a specific job. First, we import the numpy library, used for multidimensional arrays; then the pandas library, used to import the dataset; and finally the matplotlib library, used for plotting graphs.
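The three imports might look like this, using the conventional short aliases (the aliases are a common convention, not something the article specifies):

```python
import numpy as np               # multidimensional arrays
import pandas as pd              # loading and handling the dataset
import matplotlib.pyplot as plt  # plotting graphs
```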

1.2 Import the dataset

In this step, we import the dataset using the pandas library. After importing the dataset, we define our predictor and target attributes. We call the predictor ‘X’ and the target attribute ‘y’.
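A minimal sketch of this step follows. The original file is not shown here, so the example reads a tiny inline CSV instead; the column names `city`, `distance`, and `risk` and all of the values are hypothetical stand-ins for the real dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file (made-up rows).
csv_text = """city,distance,risk
A,12.5,0
B,48.0,1
C,7.3,0
D,63.1,1
"""

# With a real file, pass its path to pd.read_csv instead of a StringIO.
dataset = pd.read_csv(io.StringIO(csv_text))

# Predictor matrix X (2-D) and target vector y (1-D).
X = dataset[["distance"]].values
y = dataset["risk"].values

print(X.shape, y.shape)
```

Keeping X two-dimensional, even with a single predictor column, matches what scikit-learn estimators expect.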

1.3 Split the dataset for test and train

In this step, we split our dataset into a training set and a test set: 75% of the data is used for training and the remaining 25% for testing.
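The 75/25 split can be sketched with scikit-learn's `train_test_split`. The data below is synthetic, and the `random_state` value is an arbitrary choice that only makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows with one feature and a binary target.
X = np.arange(100).reshape(-1, 1)
y = (X[:, 0] >= 50).astype(int)

# test_size=0.25 keeps 75% of the rows for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(len(X_train), len(X_test))
```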

1.4 Feature Scaling

Feature Scaling is an important part of data preprocessing. If we look at a dataset, some attributes contain numeric values that are very high and some that are very low, for example age and estimated salary. This can cause issues in our machine learning model. To solve that problem, we put all values on the same scale. There are two common methods: the first is normalization and the second is standardization (Standard Scaler).

Here we use the StandardScaler imported from the sklearn library.
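As a sketch of how standardization is typically applied, the scaler is fitted on the training set only and the same transformation is then reused on the test set. The age and salary values below are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows: [age, estimated salary] on very different scales.
X_train = np.array(
    [[25, 20000], [35, 60000], [45, 100000], [55, 140000]], dtype=float
)
X_test = np.array([[30, 40000], [50, 120000]], dtype=float)

sc = StandardScaler()
# Learn each column's mean and standard deviation from the training set...
X_train_scaled = sc.fit_transform(X_train)
# ...and reuse those same statistics on the test set.
X_test_scaled = sc.transform(X_test)

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

After scaling, each training column has mean 0 and standard deviation 1, so neither feature dominates simply because of its units.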

Part 2: Building the Logistic Regression model:

In this part, we build our model using the Scikit Learn library.

2.1 Import the Libraries

In this step, we start building our model. To do this, we first import the model class from the Scikit Learn library.

2.2 Initialize our Logistic Regression model

In this step, we initialize our Logistic Regression model.

2.3 Fitting the Model

In this step, we fit our model to the training data; X_train and y_train are our training data.
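Steps 2.1 to 2.3 can be sketched together as follows. The training data here is synthetic, generated with `make_classification`, and the variable name `classifier` and the `random_state` values are our own choices rather than settings taken from the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression  # 2.1 import the model

# Synthetic two-feature training data standing in for the real dataset.
X_train, y_train = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# 2.2 initialize the model with default settings
classifier = LogisticRegression(random_state=0)

# 2.3 fit the model to the training data
classifier.fit(X_train, y_train)

print(classifier.coef_, classifier.intercept_)
```

The fitted `coef_` and `intercept_` play the role of the slope and intercept in the log-odds equation from section 2.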

Part 3: Making the Prediction and Visualizing the result:

In this part, we make predictions on our test set and visualize the result using the matplotlib library.

3.1 Predict the test set Result

In this step, we predict our test set result.
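A minimal sketch of the prediction step, again on synthetic data rather than the article's dataset. `predict` returns class labels; `predict_proba` is shown as well because it exposes the estimated probabilities behind them.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real dataset.
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

classifier = LogisticRegression(random_state=0).fit(X_train, y_train)

# Predicted class labels (0 or 1) for every row of the test set.
y_pred = classifier.predict(X_test)

# Estimated probability of class 1 for the same rows.
y_prob = classifier.predict_proba(X_test)[:, 1]

print(y_pred[:10])
```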

3.2 Confusion Matrix

In this step, we build a confusion matrix for our test set result. To do that, we import confusion_matrix from sklearn.metrics and pass it two parameters: the first is y_test, the actual test set result, and the second is y_pred, the predicted result.
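Here is a small sketch with hand-written labels, so the counts are easy to verify by eye. The eight values are made up and are not the article's results.

```python
from sklearn.metrics import confusion_matrix

# Made-up actual and predicted labels for eight test rows.
y_test = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
# [[3 1]
#  [1 3]]
```

The diagonal holds the correct predictions (3 true negatives and 3 true positives); the off-diagonal cells hold 1 false positive and 1 false negative.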

3.3 Accuracy Score

In this step, we calculate the accuracy score based on the actual test set results and the predicted test set results.
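The accuracy score is the fraction of test rows predicted correctly. A sketch with made-up labels (not the article's data):

```python
from sklearn.metrics import accuracy_score

# Made-up actual and predicted labels for eight test rows.
y_test = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# 6 of the 8 predictions match the actual labels.
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)  # 0.75
```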

3.4 Visualize our Test Set Result

In this step, we visualize our test set result using the matplotlib library. In the graph, we can see only 2 points which are the correct map according to the model's test set result.

If you want the dataset and code, you can also check my Github profile.
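One possible way to draw such a plot is sketched below. It is not the article's original figure: the data is synthetic with a single feature, and the colours, labels, and the choice to highlight misclassified points are our own assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic one-feature data with a noisy binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
classifier = LogisticRegression().fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Separate the test points the model got right from those it got wrong.
correct = y_pred == y_test

fig, ax = plt.subplots()
ax.scatter(X_test[correct, 0], y_test[correct], c="green", label="correct")
ax.scatter(X_test[~correct, 0], y_test[~correct], c="red", marker="x",
           label="misclassified")

# Overlay the fitted S-shaped probability curve.
grid = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
ax.plot(grid[:, 0], classifier.predict_proba(grid)[:, 1], label="P(y = 1)")

ax.set_xlabel("feature value")
ax.set_ylabel("class / estimated probability")
ax.set_title("Logistic Regression (test set)")
ax.legend()
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure.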

End Notes:

If you liked this article, be sure to click ❤ below to recommend it and if you have any questions, leave a comment and I will do my best to answer.

To stay up to date with the world of machine learning, follow me. It’s the best way to find out when I write more articles like this.

You can also follow me on Github for the code and dataset, follow me on Academia.edu for this article, follow me on Twitter, email me directly, or find me on LinkedIn. I’d love to hear from you.

That’s all folks, Have a nice day :)
