Multiclass Logistic Regression With Python
Expanding our knowledge from binomial logistic regression to multinomial logistic regression
Multiclass Logistic Regression
Multiclass logistic regression is also called multinomial logistic regression. In contrast to binomial logistic regression, which separates only two classes, multiclass logistic regression classifies the output labels into more than two classes.
In the case of multiclass logistic regression, we replace the sigmoid function with the softmax function:

$$\phi_{\text{softmax}}(z^{(i)})_j = P(y^{(i)} = j \mid x^{(i)}; W) = \frac{e^{z_j^{(i)}}}{\sum_{l=1}^{k} e^{z_l^{(i)}}}$$

where we define the net input z for class j as:

$$z_j^{(i)} = w_j^\top x^{(i)}$$
Now, this softmax function computes the probability that the feature vector x^{(i)} belongs to class j, given the weight vector w_j and the net input z^{(i)}. So, we compute the probability \phi for each class label j = 1, \dots, k. Note the normalization term in the denominator, which causes these class probabilities to sum to one.
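As a quick sanity check, here is a minimal sketch (the net input values below are made up for illustration) showing that softmax turns arbitrary scores into probabilities that sum to one:

import numpy as np
from scipy.special import softmax

# hypothetical net inputs z for one sample over k = 3 classes
z = np.array([2.0, 1.0, 0.1])

phi = softmax(z)    # class probabilities
print(phi)          # approximately [0.659 0.242 0.099]
print(phi.sum())    # 1.0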
Gradient Descent
Now, in order to train our logistic model via gradient descent, we need to define a cost function J that we want to minimize:

$$J(W) = \frac{1}{n} \sum_{i=1}^{n} H\left(y^{(i)}, \phi^{(i)}\right)$$

where H is the cross-entropy function, defined as:

$$H\left(y^{(i)}, \phi^{(i)}\right) = -\sum_{j=1}^{k} y_j^{(i)} \log \phi_j^{(i)}$$
Here y stands for the one-hot encoded known labels and \phi stands for the probabilities computed via softmax, not the predicted class labels.
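For example, here is a minimal sketch of the cross-entropy between a one-hot label and a softmax output (the values are made up for illustration):

import numpy as np

y = np.array([0.0, 1.0, 0.0])     # one-hot encoded label: the true class is j = 2
phi = np.array([0.2, 0.7, 0.1])   # hypothetical softmax probabilities

H = -np.sum(y * np.log(phi))      # the one-hot label picks out -log(0.7)
print(H)                          # approximately 0.357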
In order to find the optimum weights, we need the gradient of the cost function:

$$\nabla_{w_j} J = -\frac{1}{n} \sum_{i=1}^{n} x^{(i)} \left( y_j^{(i)} - \phi_j^{(i)} \right)$$

or in matrix form:

$$\nabla_W J = -\frac{1}{n} X^\top \left( Y - \Phi \right)$$
where
\nabla_W J = gradient of the cost function
X = matrix of features
Y = matrix of one-hot encoded known labels
\Phi = matrix of class probabilities computed via softmax
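Assuming X, Y, and \Phi are stored as NumPy arrays with one row per sample, the matrix form of the gradient is a one-liner. Here is a sketch with hypothetical shapes:

import numpy as np

n, d, k = 4, 3, 2                            # made-up sample, feature, and class counts
X = np.random.rand(n, d)                     # feature matrix
Y = np.eye(k)[np.random.randint(k, size=n)]  # one-hot encoded known labels
Phi = np.full((n, k), 1.0 / k)               # softmax probabilities (uniform here)

grad = -(X.T @ (Y - Phi)) / n                # gradient of the cost, shape (d, k)
print(grad.shape)                            # (3, 2)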
We can add an L2 regularization term to the cost function:

$$J(W) = \frac{1}{n} \sum_{i=1}^{n} H\left(y^{(i)}, \phi^{(i)}\right) + \mu \lVert W \rVert^2$$

where:
\mu = regularization factor
W = weight matrix
Then, the update step of the weight matrix is written as:

$$W := W - \alpha \left( -\frac{1}{n} X^\top \left( Y - \Phi \right) + 2 \mu W \right)$$

where \alpha is the learning rate. Note that column j of W is the weight vector w_j for the class y = j.
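Extending the previous sketch with the L2 term, a single gradient-descent step then reads (alpha and mu are made-up values):

import numpy as np

n, d, k = 4, 3, 2
alpha, mu = 0.1, 0.01                         # learning rate and regularization factor
X = np.random.rand(n, d)
Y = np.eye(k)[np.random.randint(k, size=n)]
Phi = np.full((n, k), 1.0 / k)
W = np.zeros((d, k))

dW = -(X.T @ (Y - Phi)) / n + 2 * mu * W      # gradient of the regularized cost
W -= alpha * dW                               # one update step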
Implementation
Now we will build the multiclass logistic regression model in Python. First, import the necessary modules:
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
from scipy.special import softmax
from sklearn.model_selection import train_test_split
Then define the class as follows:
class MultipleLogRegression:
    def __init__(self, learning_rate=0.1, n_iters=1000):
        self.lr = learning_rate
        self.iters = n_iters
        self.W = None

    def fit(self, X, y, mu):
        # prepend a column of ones so the first row of W acts as the bias term
        ones = np.ones(X.shape[0])
        features = np.c_[ones, X]
        # one-hot encode the labels (use sparse=False on scikit-learn < 1.2)
        onehot_encoder = OneHotEncoder(sparse_output=False)
        y_encode = onehot_encoder.fit_transform(y.reshape(-1, 1))
        self.W = np.zeros((features.shape[1], y_encode.shape[1]))
        samples = X.shape[0]
        for _ in range(self.iters):
            Z = features @ self.W        # net input, one row per sample
            prob_y = softmax(Z, axis=1)  # class probabilities
            error = y_encode - prob_y
            # gradient of the regularized cost function
            dW = -(features.T @ error) / samples + 2 * mu * self.W
            self.W -= self.lr * dW

    def predict(self, X):
        ones = np.ones(X.shape[0])
        features = np.c_[ones, X]
        Z = features @ self.W
        prob_y = softmax(Z, axis=1)
        # pick the class with the highest probability
        return np.argmax(prob_y, axis=1)
where:
- __init__ is the standard constructor method, with learning_rate and n_iters as the learning rate and the number of iterations for gradient descent.
- fit is the fit method. This is where the gradient descent runs. First we append a column of ones to the features matrix X so that the first row of W acts as the bias term. Then we encode the known labels y into one-hot vectors using fit_transform of the OneHotEncoder from sklearn. In each iteration we compute the class probabilities using the softmax function from the scipy module, then compute the gradient of the cost function dW to be used in the update step of the weight matrix W.
- predict is the prediction method. It simply computes the probability vector of the labels from the features matrix X using the softmax function, then determines the class with the maximum probability using np.argmax.
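Finally, here is a short usage sketch on the iris dataset, using the modules imported earlier (the hyperparameters are made up, and the exact accuracy will vary with the split):

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = MultipleLogRegression(learning_rate=0.1, n_iters=1000)
model.fit(X_train, y_train, mu=0.01)

y_pred = model.predict(X_test)
print("accuracy:", np.mean(y_pred == y_test))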
Conclusion
In this article, we have learned:
- Multiclass logistic regression as a machine learning algorithm for classifying more than two class labels.
- Determining the probability of the output labels using the softmax function.
- Implementing gradient descent for multiclass logistic regression.