Understanding Logistic Regression from Scratch!!

Niraj
5 min readAug 5, 2020

Logistic Regression is a technique that can be used for traditional statistics as well as machine learning. It predicts whether something is true of false instead of predicting something continuous.

Instead of fitting a line to the data, Logistic Regression fits an “S” shaped logistic function which is also known as Sigmoid function. The curve goes from 0 to 1 that means the curve tells us the probability of an occurrence.

Here the s-shaped curved is a logistic function or sigmoid function. The sigmoid function converts any value from negative infinity to positive infinity to discrete values. we can see the value lies in between 0 and 1 ie. binary values. Let’s understand with a simple example. If our value is y=0.8 as given in the figure, how can we decide our value lies in whether 0 or 1?. So, to do this we draw a threshold line which divides our line into equal half that is 0.5. Threshold value indicates the probability of something will happen or not. Now our value lies above threshold line so we can say that the result is 1. We can say that if our value lies above the threshold line the value is 1 on the other hand if our value lies below the threshold line the value is 0.

The above is the formula to calculate the Logistic Regression. Basically the values of logistic regression are categorical so it is used solve the classification problems.

Use cases of Logistic Regression

Some use cases of logistic regressions are:

  • Weather Prediction : Predicting a weather is an important task these days. whether it rain or not, what will be the temperature of tomorrow, cloudy or sunny and many more. Both the linear and logistic regression is used to forecast weather. So logistic regression is used to predict just yes/no values like will it rain or not, cloudy or sunny etc. But to predict the whether of tomorrow or next week we use linear regression.
  • Determine Illness: When any patient go to the hospital the doctor do various checks in the patient and looking on those data doctor will determine whether the patient is ill or not.
  • Classifying breeds: Using multi class classification logistic regression can classify which breed of animal is it.

Practical implementation of Logistic Regression

These are the five steps to implement any kinds of algorithm. The first stage of any problem begins with collecting the data. Data is the most important part of any data driven company. There are the various source of collecting. Then the next step is to analyzed the collected that whether it is related to our problem or not. Then there comes the data wrangling part which means cleaning our data i.e. we need to remove the unnecessary data or null values from our data. Now in the forth step we need to split our data using training and testing set. And the final step is to check our accuracy, how accurate our model is?.

To implement logistic regression practically i am using telco customer churn data set from kaggle. The data set consists of 7043 rows and 21 columns.

1.Collecting Data

Loading data from the kaggle using pandas read_csv() method.

Now lets see what are the columns and what data values do our dataset have.

These are some of our columns that we have in telco customer churn dataset. Now lets see the total names of our columns and structure of our data.

2.Analyzing Data

Using the plotly or matploltlib we can visualize our data into various plots.

Here I have plotted the rate of customer whether they have churn or not.

In this bar plot i have plotted Customer churn rate by the payment method.

3.Data Wrangling

Here the churn column is in object form so to convert it in the numerical form we can replace the values for yes and no.

We have replaced yes with 1 and no with 0 using pandas replace function.

In the above mentioned columns we have a value called “no internet service” that can be called as no. so we can replace that particular statement with no and replace with 1 and 0 as above.

Total charges column has come missing values and extra spaces. So the above lines of code removes the missing values and extra spaces.

4.Train/Test

train_test_split() method split the data into four parts X_train, X_test, y_train, y_test. Logistic Regression is our algorithm to model our dataset.

Assigning our features to X and labels to y.

Using the train_test_split() function to split our data into training and test sets. I have use 70% of our data for training our model and 30% for testing our model.

5.Accuracy Check

Using logistic regression we successfully build the model with 80% accuracy.

This is the in-depth implementation of Logistic Regression.

--

--

Niraj

Sometimes you win sometimes you lose. So don't let the fear take over.