Multiclass Logistic Regression With Python

Gilang Satria Prayoga
4 min read · Mar 4, 2022


Expanding our knowledge from binomial logistic regression to multinomial logistic regression


Multiclass Logistic Regression

Multiclass logistic regression is also called multinomial logistic regression. In contrast to binomial logistic regression, multiclass logistic regression is used to classify output labels into more than two classes.

In the case of multiclass logistic regression, we replace the sigmoid function with the softmax function:

\phi_j = \frac{e^{y_j}}{\sum_{c=1}^{k} e^{y_c}}

Equation 1: the softmax function.

where we define y as

y_j = w_j^\top x

Equation 2: the softmax input y (the net input for class j).

Now, this softmax function computes the probability that the feature vector x^(i) belongs to class j, given the weights and the net input y^(i). So we compute the probability φ_j for each class label j = 1, …, k. Note the normalization term in the denominator, which causes these class probabilities to sum to one.
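
As a quick sanity check, here is a minimal sketch of the softmax computation in plain numpy; the net input values are made up for illustration:

import numpy as np

# Hypothetical net inputs y_j for three classes
y = np.array([2.0, 1.0, 0.1])

# Softmax: exponentiate, then normalize by the sum (Equation 1)
phi = np.exp(y) / np.sum(np.exp(y))

print(phi)        # [0.659 0.242 0.099] (approximately)
print(phi.sum())  # 1.0 -- the class probabilities sum to one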

Gradient Descent

Now, in order to train our logistic model via gradient descent, we need to define a cost function J that we want to minimize:

J(W) = \frac{1}{n} \sum_{i=1}^{n} H\left(y^{(i)}, \phi^{(i)}\right)

Equation 3: the cost function.

where H is the cross-entropy function, defined as:

H(y, \phi) = -\sum_{j=1}^{k} y_j \log(\phi_j)

Equation 4: the cross-entropy function.

Here y stands for the known (one-hot encoded) labels and φ for the probabilities computed via softmax, not the predicted class labels.
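
To make this concrete, here is a minimal sketch with made-up numbers, showing that the cross-entropy only penalizes the probability assigned to the true class:

import numpy as np

# One-hot encoded target: the true class is the second of three
y_true = np.array([0.0, 1.0, 0.0])

# Hypothetical probabilities computed via softmax
phi = np.array([0.2, 0.7, 0.1])

# Cross-entropy H(y, phi) = -sum_j y_j * log(phi_j) (Equation 4)
H = -np.sum(y_true * np.log(phi))
print(H)  # 0.357 (approximately) -- just -log(0.7)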

To find the optimal weights, we need the gradient of the cost function:

\nabla_{w_j} J = \frac{1}{n} \sum_{i=1}^{n} x^{(i)} \left(\phi_j^{(i)} - y_j^{(i)}\right)

Equation 5: gradient of the cost function.

or in matrix form:

\nabla_W J = \frac{1}{n} X^\top (\Phi - Y)

Equation 6: gradient of the cost function in matrix form.

where:

∇_W J = the gradient of the cost function with respect to the weight matrix W

X = matrix of features

Y = matrix of one-hot encoded known labels

Φ = matrix of predicted class probabilities

We can add an L2 regularization term to the cost function:

J_{reg}(W) = J(W) + \mu \lVert W \rVert_2^2

Equation 7: the cost function with L2 regularization. Its gradient then gains an extra 2\mu W term.

where:

μ = regularization factor

W = weight matrix

Then the update step for the weight matrix is written as:

W := W - \alpha \nabla_W J_{reg}(W)

Equation 8: the weight update step.

where α is the learning rate. Note that column j of W is the weight vector for the class y = j.
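
Putting Equations 6 to 8 together, a single gradient descent step can be sketched in a few lines of numpy; the sizes, learning rate, and regularization factor below are all hypothetical:

import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(0)
n, d, k = 6, 3, 4                          # samples, features, classes (hypothetical)
X = rng.normal(size=(n, d))                # feature matrix
Y = np.eye(k)[rng.integers(0, k, n)]       # one-hot matrix of known labels
W = np.zeros((d, k))                       # weight matrix
alpha, mu = 0.1, 0.01                      # learning rate and regularization factor

Phi = softmax(X @ W, axis=1)               # class probabilities (Equation 1)
grad = X.T @ (Phi - Y) / n + 2 * mu * W    # Equation 6 plus the L2 term
W = W - alpha * grad                       # one update step (Equation 8)
print(grad.shape)                          # (3, 4) -- same shape as W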

Implementation

Now we will build the multiclass logistic regression model in Python. Import the necessary modules:

from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from scipy.special import softmax
import numpy as np

Then define the class as:

class MultipleLogRegression:

    def __init__(self, learning_rate=0.1, n_iters=1000):
        self.lr = learning_rate
        self.iters = n_iters
        self.W = None

    def fit(self, X, y, mu):
        # Prepend a column of ones so the bias is absorbed into the weights
        ones = np.ones(X.shape[0])
        features = np.c_[ones, X]

        # One-hot encode the labels (on scikit-learn < 1.2, use sparse=False)
        onehot_encoder = OneHotEncoder(sparse_output=False)
        y_encode = onehot_encoder.fit_transform(y.reshape(-1, 1))

        self.W = np.zeros((features.shape[1], y_encode.shape[1]))
        samples = X.shape[0]

        for i in range(self.iters):
            Z = features @ self.W            # net inputs (Equation 2)
            prob_y = softmax(Z, axis=1)      # class probabilities (Equation 1)
            error = prob_y - y_encode        # phi - y
            # Gradient in matrix form plus the L2 term (Equations 6 and 7)
            dW = 1 / samples * (features.T @ error) + 2 * mu * self.W
            self.W -= self.lr * dW           # update step (Equation 8)

    def predict(self, X):
        ones = np.ones(X.shape[0])
        features = np.c_[ones, X]
        Z = features @ self.W
        prob_y = softmax(Z, axis=1)
        return np.argmax(prob_y, axis=1)     # most probable class per sample

where:

  1. __init__ is the standard constructor, with learning_rate and n_iters as the learning rate and the number of gradient descent iterations.
  2. fit is the training method, where gradient descent runs. First we append a column of ones to the feature matrix X so that the bias is absorbed into the weight matrix. Then we encode the known labels y into one-hot vectors using fit_transform of sklearn's OneHotEncoder. In each iteration we compute the label probabilities with the softmax function from scipy, compute the gradient of the cost function dW, and use it in the update step of the weight matrix W.
  3. predict is the prediction method: it computes the probability matrix of the labels from the feature matrix X using the softmax function, then picks the most probable class with np.argmax. A usage sketch follows below.
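
Since the imports above already pull in load_iris and train_test_split, a minimal usage sketch could look like the following; the split ratio, learning rate, and regularization factor are arbitrary choices:

# Load the iris dataset and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train the model and evaluate on the held-out data
model = MultipleLogRegression(learning_rate=0.1, n_iters=1000)
model.fit(X_train, y_train, mu=0.01)
y_pred = model.predict(X_test)
print("test accuracy:", np.mean(y_pred == y_test))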

Conclusion

In this article, we have learned:

  1. Multiclass logistic regression as a machine learning algorithm for classifying more than two class labels.
  2. Determining the probability of the output labels using the softmax function.
  3. Implementing gradient descent for multiclass logistic regression.
