VotingClassifier

Ranjan Kumar
3 min read · Aug 12, 2023

A Voting Classifier is an ensemble machine learning technique that combines the predictions of multiple individual classifiers (also known as base classifiers or estimators) to make a final prediction. It is a model-averaging approach: each base classifier contributes its prediction, and the final prediction is determined by a majority vote over predicted labels or by an average of predicted class probabilities. (The analogous technique for regression averages the numeric predictions instead.) A Voting Classifier can be used for both binary and multiclass classification tasks.

Types of Voting Classifiers:

There are two main types of Voting Classifiers:

  1. Hard Voting: In hard voting, each base classifier’s predicted class label counts as one vote, and the final prediction is the class that receives the majority of the votes.
  2. Soft Voting: In soft voting, each base classifier’s predicted probabilities for each class are averaged, and the class with the highest average probability is chosen as the final prediction. Soft voting often produces better results than hard voting because it takes the classifiers’ confidence levels into account, but it requires every base classifier to provide class probabilities (in scikit-learn, a predict_proba method). Both modes are contrasted in the sketch below.
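
To make the distinction concrete, here is a minimal sketch that fits the same ensemble twice, once with voting="hard" and once with voting="soft". The dataset and base classifiers are illustrative assumptions, not part of the original example:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# toy dataset, chosen only for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("gnb", GaussianNB()),
]

# hard voting: majority vote over the predicted class labels
hard = VotingClassifier(estimators=estimators, voting="hard").fit(X, y)

# soft voting: argmax of the averaged class probabilities
soft = VotingClassifier(estimators=estimators, voting="soft").fit(X, y)

print(hard.predict(X[:5]))        # labels chosen by majority vote
print(soft.predict(X[:5]))        # labels chosen from averaged probabilities
print(soft.predict_proba(X[:5]))  # only available with voting="soft"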

Advantages of Voting Classifier:

  • Improved Generalization: A Voting Classifier can potentially improve the overall generalization and robustness of the model by combining the strengths of multiple individual classifiers (see the cross-validation sketch after this list).
  • Reduced Overfitting: By aggregating predictions from diverse models, the Voting Classifier may help reduce overfitting, as it’s less likely that all base classifiers will overfit in the same way.
  • Handles Different Model Biases: Voting can help mitigate the biases and weaknesses of individual models by considering their collective decisions.
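
One rough way to check the generalization claim is to compare the cross-validated accuracy of each base classifier against the ensemble. The dataset and models below are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf1 = LogisticRegression(max_iter=1000)
clf2 = DecisionTreeClassifier(random_state=0)
clf3 = GaussianNB()
eclf = VotingClassifier(
    estimators=[("lr", clf1), ("dt", clf2), ("gnb", clf3)], voting="soft"
)

# 5-fold cross-validated accuracy for each base model and the ensemble
for clf, label in zip(
    [clf1, clf2, clf3, eclf],
    ["LogisticRegression", "DecisionTree", "GaussianNB", "VotingClassifier"],
):
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{label}: {scores.mean():.3f} (+/- {scores.std():.3f})")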

Example Usage:

Suppose you have three base classifiers: a Decision Tree, a Support Vector Machine, and a Logistic Regression model. You can create a Voting Classifier to combine their predictions and make a final prediction.
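
A minimal sketch of that setup follows; the dataset, train/test split, and hyperparameters are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

voting_clf = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        # probability=True lets the SVM produce the class probabilities
        # that soft voting needs
        ("svm", SVC(probability=True, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))  # accuracy of the combined model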

Steps to Implement a Voting Classifier:

  1. Import the necessary libraries (e.g., scikit-learn in Python).
  2. Initialize and configure the base classifiers you want to use.
  3. Create a Voting Classifier instance, specifying whether you want to use hard or soft voting.
  4. Fit the Voting Classifier on your training data.
  5. Use the trained Voting Classifier to make predictions on new data.

Code Example (Python — scikit-learn):

import matplotlib.pyplot as plt
import numpy as np

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

clf1 = LogisticRegression(max_iter=1000, random_state=123)
clf2 = RandomForestClassifier(n_estimators=100, random_state=123)
clf3 = GaussianNB()
X = np.array([[-1.0, -1.0], [-1.2, -1.4], [-3.4, -2.2], [1.1, 1.2]])
y = np.array([1, 1, 2, 2])

eclf = VotingClassifier(
    estimators=[("lr", clf1), ("rf", clf2), ("gnb", clf3)],
    voting="soft",
    weights=[1, 1, 5],
)

# predict class probabilities for all classifiers
probas = [c.fit(X, y).predict_proba(X) for c in (clf1, clf2, clf3, eclf)]

# get class probabilities for the first sample in the dataset
class1_1 = [pr[0, 0] for pr in probas]
class2_1 = [pr[0, 1] for pr in probas]


# plotting

N = 4  # number of groups
ind = np.arange(N)  # group positions
width = 0.35  # bar width

fig, ax = plt.subplots()

# bars for classifiers 1-3
p1 = ax.bar(ind, np.hstack([class1_1[:-1], [0]]), width, color="green", edgecolor="k")
p2 = ax.bar(
    ind + width,
    np.hstack([class2_1[:-1], [0]]),
    width,
    color="lightgreen",
    edgecolor="k",
)

# bars for VotingClassifier
p3 = ax.bar(ind, [0, 0, 0, class1_1[-1]], width, color="blue", edgecolor="k")
p4 = ax.bar(
    ind + width, [0, 0, 0, class2_1[-1]], width, color="steelblue", edgecolor="k"
)

# plot annotations
plt.axvline(2.8, color="k", linestyle="dashed")
ax.set_xticks(ind + width)
ax.set_xticklabels(
    [
        "LogisticRegression\nweight 1",
        "RandomForestClassifier\nweight 1",
        "GaussianNB\nweight 5",
        "VotingClassifier\n(average probabilities)",
    ],
    rotation=40,
    ha="right",
)
plt.ylim([0, 1])
plt.title("Class probabilities for sample 1 by different classifiers")
plt.legend([p1[0], p2[0]], ["class 1", "class 2"], loc="upper left")
plt.tight_layout()
plt.show()
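
Because the ensemble above uses voting="soft" with weights=[1, 1, 5], its predicted probabilities are simply the weighted average of the base classifiers’ probabilities. As a quick sanity check, the following lines, appended to the script above (they reuse its probas list), recompute that average by hand:

# weighted average of the three base classifiers' probabilities
weights = np.array([1, 1, 5])
manual = sum(w * p for w, p in zip(weights, probas[:3])) / weights.sum()

# should match the VotingClassifier's own predict_proba output
print(np.allclose(manual, probas[-1]))  # expected: True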

Voting Classifiers are useful when you have multiple models with varying strengths and weaknesses. By combining their predictions, you can often achieve better overall performance compared to using any single model alone.

link: scikit-learn library
GitHub link: https://github.com/Ranjan4Kumar/Plot-class-probabilities-calculated-by-the-VotingClassifier
