# Regression Talks-III

*Logistic Regression* For Binary Classification

The last article was about implementing Linear Regression from scratch using gradient descent (i.e. without using the *sklearn* library). In this article, we are going to look at another such algorithm. Are you excited? Because I definitely am!

So a few weeks back, I was checking my mail on the weekend, relaxing and scrolling through each section: Inbox, Sent, and Drafts. I decided to get rid of all the unnecessary emails and, out of curiosity, also checked the spam folder.

I was shocked when I saw that one important email had landed in the spam folder, and the deadline was long gone. Well, why am I telling you this? Because I like having a conversation through my writing, and because we are going to look at an algorithm that helps with exactly this kind of classification.

# What is classification in ML?

Classification is the process of separating data into different classes. In the email example, each email is either *spam* or *not spam*. Does it always have to be only two options? Not at all; what I am describing is the most basic type, called binary classification. Multi-class classification assigns each data point to one of three or more classes, and multi-label classification lets a single data point belong to several classes at once.

# Introducing Logistic Regression…

But in this article, we'll be looking at *Logistic Regression*, which is one of the go-to algorithms when it comes to binary classification.

According to Wikipedia, Logistic Regression is defined as, “the model that is used to model the probability of certain class or event existing such as pass/fail, win/lose, alive/dead or healthy/sick.” In practice, it predicts the probability that something is True (say, that an email is spam), and we turn that probability into a True/False answer, instead of predicting something continuous like size.

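Remember that for Linear Regression, we used the straight-line equation to predict the output. In this case, we'll use a special function called the sigmoid (or logistic) function, which squashes any real number into the range (0, 1). Applied to a linear combination of the inputs, it looks like this:

$$h_\theta(x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$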

Note: Don’t get confused if other articles denote the above function by $h_\theta(x)$ or call it the hypothesis. It’s just a formal statistics term where they say, “Hey, this is my hypothesis (predicted output) using this function.” If the hypothesis matches the actual output most of the time (the fraction of correct predictions is called the accuracy of the model), then yay! We have our model ready.
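To get a feel for the shape of the sigmoid, here is a small standalone sketch in plain NumPy (independent of the dataset used later) that evaluates it at a few points. Large negative inputs land close to 0, an input of 0 gives exactly 0.5, and large positive inputs land close to 1, which is why the output can be read as a probability.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
for zi, p in zip(z, sigmoid(z)):
    print(f"sigmoid({zi:g}) = {p:.4f}")
```

The function is also strictly increasing, so a bigger value of $\theta^T x$ always means a higher predicted probability.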

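So you’ve probably guessed that the cost function is different too? Yes, it looks like this, where $m$ is the number of training examples and $y^{(i)} \in \{0, 1\}$ is the actual label of example $i$:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$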

But Bilwa, how did this cost function pop up out of nowhere? It comes from a method called maximum likelihood estimation: minimizing this cost is the same as choosing the parameters that make the observed labels most probable. You might also ask, why not use MSE like before? Well, last time our hypothesis was *mx+c*, and this time it’s *the sigmoid* function. If we plug the sigmoid into MSE, the resulting cost function is non-convex, so it can have many local minima and gradient descent is not guaranteed to reach the global minimum. The log-based cost above is convex, so any minimum gradient descent finds is the global one.
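The convexity claim can be checked numerically on a toy case. This sketch assumes a single training example with $x = 1$, $y = 1$ and a single weight $w$ (no intercept), then looks at second differences of each cost over a grid of $w$ values: a convex function never has a negative second difference. The grid and the tolerance are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy setup (assumption): one example with x = 1, y = 1, one weight w, no bias
w = np.linspace(-10, 10, 2001)
h = sigmoid(w * 1.0)

mse = (1 - h) ** 2       # squared error applied to the sigmoid output
log_loss = -np.log(h)    # log-loss cost when the true label is y = 1

def second_diff(f):
    # Discrete second derivative (up to a positive step-size factor)
    return f[2:] - 2 * f[1:-1] + f[:-2]

print("MSE convex everywhere?     ", bool(np.all(second_diff(mse) >= -1e-12)))
print("Log loss convex everywhere?", bool(np.all(second_diff(log_loss) >= -1e-12)))
```

The MSE curve bends the wrong way for negative $w$ (where the sigmoid output is small), while the log loss curves upward everywhere.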

Now, as usual, our aim is to minimize the cost function: we compute its partial derivatives with respect to each parameter and keep updating the parameters until the cost stops decreasing. I guess this is enough for us to go ahead with programming.
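The “keep updating the parameters” idea can be sketched as plain batch gradient descent, like the one used for Linear Regression in the previous article. This is a minimal sketch on a tiny synthetic dataset; the learning rate `alpha`, the number of iterations and the data are arbitrary assumptions, not values used later in this article.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    # Start from all-zero parameters and repeatedly step against the gradient
    theta = np.zeros(X.shape[1])
    m = X.shape[0]
    for _ in range(iterations):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / m    # partial derivatives of the cost
        theta -= alpha * grad       # update every theta_j at the same time
    return theta

# Toy, overlapping data (assumption): one feature plus a column of ones for the intercept
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0, -0.2, 0.3])
y = np.array([0, 0, 0, 1, 1, 1, 1, 0], dtype=float)
X = np.c_[np.ones(len(x)), x]

theta = gradient_descent(X, y)
print("theta:", theta)
print("cost at zeros:", cost(np.zeros(2), X, y), "cost after training:", cost(theta, X, y))
```

Because the cost is convex, a small enough `alpha` makes the cost go down steadily; the article’s own code hands this loop to a ready-made optimizer instead.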

NOTE: I haven’t focused on the calculus part, where we find the partial derivatives of the function with respect to each variable to update the values. Instead, we’ll use a built-in optimization function, `fmin_tnc`, from the *scipy* library.
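For readers curious about the skipped calculus, here is a compressed version of the standard derivation, written in this article’s notation with $h_\theta(x) = \sigma(\theta^T x)$ and the log-loss cost $J(\theta)$. The key fact is that the sigmoid’s derivative can be written in terms of the sigmoid itself:

$$\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr) \quad\Longrightarrow\quad \frac{\partial h_\theta(x)}{\partial \theta_j} = h_\theta(x)\,\bigl(1 - h_\theta(x)\bigr)\,x_j$$

Applying the chain rule to both log terms of $J(\theta)$, the factors $y(1 - h_\theta) - (1 - y)h_\theta$ simplify to $y - h_\theta$, which leaves:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}, \qquad \nabla_\theta J(\theta) = \frac{1}{m}X^T\left(h - y\right)$$

The vectorized form on the right is what the `gradient` function in the code computes.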

# Let’s code…

A little background on the dataset we are using: it contains students’ marks in two exams and whether or not each student was admitted. Our aim is to use logistic regression to build a classification model where I can just input my exam marks and predict whether I’ll get admission or not.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import minimize, fmin_tnc
```

Importing all the relevant libraries in this cell.

```python
data = pd.read_csv('logreg1.txt', header=None, names=['% in Exam 1', '% in Exam 2', 'Admitted'])
data.head()
```

```python
admitted = data[data['Admitted'].isin([1])]
not_admitted = data[data['Admitted'].isin([0])]
```

In this cell, we divide the dataset into two parts: an output of 1 means the student was admitted, and 0 means they were not. Now let us see how the data is spread across the plot.

```python
# plotting the scatter plot
plt.scatter(admitted.iloc[:, 0], admitted.iloc[:, 1], s=10, label='Admitted')
plt.scatter(not_admitted.iloc[:, 0], not_admitted.iloc[:, 1], s=10, label='Not Admitted')
plt.legend()
plt.show()
```

```python
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
# Add a column of ones so that theta[0] acts as the intercept
X = np.c_[np.ones((X.shape[0], 1)), X]
X[:10]
```

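```python
# Convert the pandas Series into a NumPy column vector of shape (m, 1);
# multi-dimensional indexing directly on a Series is not supported in recent pandas
y = y.to_numpy()[:, np.newaxis]
y[:10]
```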

```python
def sigmoid(x, theta):
    z = np.dot(x, theta)
    return 1 / (1 + np.exp(-z))
```

The above code cell contains the sigmoid function. z is the dot product of the input matrix x and the parameter vector theta (which we initialize to zeros before fitting).

```python
def hypothesis(theta, x):
    return sigmoid(x, theta)
```

This function will help us get our predictions!

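```python
def cost_function(theta, x, y):
    # Use the argument x, not the global X, so the function works for any input
    m = x.shape[0]
    h = hypothesis(theta, x)
    return -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```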

This implements the cost function we saw earlier.
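A handy sanity check before optimizing: with all parameters set to zero, every prediction is exactly 0.5, so the cost equals $\ln 2 \approx 0.693$ no matter what the labels are. This standalone sketch re-declares the helpers so it runs on its own and uses a tiny hypothetical input matrix (intercept column plus two made-up features), not the admissions data.

```python
import numpy as np

def sigmoid(x, theta):
    return 1 / (1 + np.exp(-np.dot(x, theta)))

def cost_function(theta, x, y):
    m = x.shape[0]
    h = sigmoid(x, theta)
    return -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical toy inputs: intercept column + two made-up features
x = np.array([[1.0, 2.0, 3.0],
              [1.0, -1.0, 0.5],
              [1.0, 0.0, -2.0],
              [1.0, 4.0, 1.0]])
y = np.array([1.0, 0.0, 0.0, 1.0])

theta = np.zeros(3)
print(cost_function(theta, x, y), np.log(2))
```

If the real cost at the zero starting point is far from 0.693, something is off in the data shapes or the cost code.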

```python
def gradient(theta, x, y):
    # Vectorized partial derivatives of the cost with respect to each theta_j
    m = x.shape[0]
    h = hypothesis(theta, x)
    return (1 / m) * np.dot(x.T, (h - y))
```

This is the gradient we get after doing the calculus: the vector of partial derivatives of the cost with respect to each parameter, written in vectorized form.
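A common way to make sure a hand-written gradient is right is a finite-difference check: nudge each parameter slightly up and down, see how much the cost changes, and compare with the analytic gradient. Here is a minimal sketch on random synthetic data (the seed, sizes and step `eps` are arbitrary assumptions); the functions mirror the article’s cost and gradient.

```python
import numpy as np

def sigmoid(x, theta):
    return 1 / (1 + np.exp(-np.dot(x, theta)))

def cost_function(theta, x, y):
    m = x.shape[0]
    h = sigmoid(x, theta)
    return -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, x, y):
    m = x.shape[0]
    h = sigmoid(x, theta)
    return (1 / m) * np.dot(x.T, (h - y))

def numerical_gradient(theta, x, y, eps=1e-6):
    # Central differences: change one parameter at a time by +/- eps
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost_function(theta + step, x, y) - cost_function(theta - step, x, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = np.c_[np.ones(20), rng.normal(size=(20, 2))]   # synthetic data, intercept column first
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

analytic = gradient(theta, x, y)
numeric = numerical_gradient(theta, x, y)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

A tiny difference means the formula and the code agree; a large one usually points to a sign or shape mistake.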

```python
theta = np.zeros((X.shape[1], 1))

def fit(x, y, theta):
    # fmin_tnc returns (optimal parameters, number of function evaluations, return code)
    opt_weights = fmin_tnc(func=cost_function, x0=theta.flatten(), fprime=gradient, args=(x, y.flatten()))
    return opt_weights[0]

parameters = fit(X, y, theta)
print(parameters)
```
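The import cell also brings in `minimize`, which the code above never uses. SciPy lists `fmin_tnc` among its legacy functions and points to `scipy.optimize.minimize` with `method='TNC'` as the newer interface to the same truncated Newton algorithm. Here is a minimal sketch of that route on synthetic, overlapping data (the data, seed and coefficients are assumptions for illustration, not the admissions dataset):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(x, theta):
    return 1 / (1 + np.exp(-np.dot(x, theta)))

def cost_function(theta, x, y):
    m = x.shape[0]
    h = sigmoid(x, theta)
    return -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, x, y):
    m = x.shape[0]
    h = sigmoid(x, theta)
    return (1 / m) * np.dot(x.T, (h - y))

rng = np.random.default_rng(42)
features = rng.normal(size=(100, 2))
# Noisy labels so the two classes overlap, which keeps the optimal parameters finite
logits = 1.5 * features[:, 0] - 1.0 * features[:, 1] + rng.normal(scale=1.0, size=100)
y = (logits > 0).astype(float)
x = np.c_[np.ones(100), features]

result = minimize(cost_function, x0=np.zeros(3), args=(x, y), method='TNC', jac=gradient)
print(result.x, result.fun, result.success)
```

`result.x` plays the role of `parameters`, and `result.fun` is the final cost.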

```python
h = hypothesis(parameters, X)

def predict(h):
    h1 = []
    for i in h:
        if i >= 0.5:
            h1.append(1)
        else:
            h1.append(0)
    return h1

y_pred = predict(h)
```

Since this is binary classification, we use 0.5 as the threshold: if the predicted probability is at least 0.5, we classify the student as admitted, otherwise as not admitted. The set of points where the prediction is exactly 0.5 is formally called the **decision boundary**. Now let us check the accuracy of the model we have built!

```python
accuracy = 0
for i in range(0, len(y_pred)):
    if y_pred[i] == y[i]:
        accuracy += 1
accuracy / len(y)
```
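The thresholding loop and the accuracy loop can both be written as single vectorized NumPy expressions. This sketch uses hypothetical probabilities and labels just to show the idea; with the article’s column-vector `y`, you would compare against `y.flatten()`.

```python
import numpy as np

# Hypothetical predicted probabilities and true labels, for illustration only
h = np.array([0.92, 0.35, 0.50, 0.10, 0.77, 0.49])
y = np.array([1, 0, 1, 0, 0, 1])

y_pred = (h >= 0.5).astype(int)    # same 0.5 threshold as predict()
accuracy = np.mean(y_pred == y)    # fraction of predictions that match the labels
print(y_pred, accuracy)
```

Here four of the six predictions match, so the accuracy is 4/6.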

```python
# x-axis: marks in exam 1 (column 1 of X), padded by 5 on each side
x_values = np.array([np.min(X[:, 1]) - 5, np.max(X[:, 1]) + 5])
# Decision boundary theta0 + theta1*x1 + theta2*x2 = 0, solved for x2
y_values = -(parameters[0] + parameters[1] * x_values) / parameters[2]

plt.scatter(admitted.iloc[:, 0], admitted.iloc[:, 1], s=10, label='Admitted')
plt.scatter(not_admitted.iloc[:, 0], not_admitted.iloc[:, 1], s=10, label='Not Admitted')
plt.plot(x_values, y_values, label='Decision Boundary')
plt.xlabel('Marks in 1st Exam')
plt.ylabel('Marks in 2nd Exam')
plt.legend()
plt.show()
```
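The straight line in this plot follows directly from the 0.5 threshold. Since $\sigma(z) \ge 0.5$ exactly when $z \ge 0$, the boundary is the set of points where $\theta^T x = 0$. With an intercept $\theta_0$ and the two exam marks $x_1$ and $x_2$, solving for $x_2$ gives the expression computed as `y_values`:

$$\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0 \quad\Longrightarrow\quad x_2 = -\frac{\theta_0 + \theta_1 x_1}{\theta_2}$$

Here `parameters[0]`, `parameters[1]` and `parameters[2]` play the roles of $\theta_0$, $\theta_1$ and $\theta_2$.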

The above code was the implementation of Logistic Regression from scratch. If we want, we can also use the ready-made model from *sklearn.linear_model*. Let’s check the code for that too! (Don’t worry, it’s very short.)

# Code using sklearn library…

Note that we are using the variables X and y from the previous section.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

model = LogisticRegression()
# sklearn expects a 1-D array of labels, so flatten the (m, 1) column vector
model.fit(X, y.flatten())
predicted_classes = model.predict(X)
accuracy = accuracy_score(y.flatten(), predicted_classes)
parameters = model.coef_
print(parameters)
print(accuracy)
```

Bilwa, why was the accuracy 91% with the built-in model but 89% with ours? One likely reason is that sklearn’s `LogisticRegression` applies regularization by default (an L2 penalty whose strength is controlled by the parameter `C`), a technique that helps prevent the model from overfitting the data; it also uses a different solver. I won’t go deep into regularization, as it is a vast topic in itself; maybe in some other article :)

That’s all for this article, I guess. To sum up, we learned what logistic regression is, what its cost function looks like and why we need such a cost function, how to code it from scratch, and, finally, how to implement the same thing directly using the *sklearn* Python library.

For any doubts, suggestions, or feedback, feel free to connect with me on LinkedIn and Twitter. Also, if you haven’t read Part 1 and Part 2 already, go check them out!

Don’t forget to follow The Lean Programmer Publication for more such articles, and subscribe to our newsletter tinyletter.com/TheLeanProgrammer