Ensemble Model: Data Visualization

Shivani Parekh
Published in Analytics Vidhya
4 min read · Oct 9, 2020
Photo by William Iven on Unsplash

This is part 2 of my previous article (Ensemble Modelling- How to perform in python). Check out the previous article for a better understanding of this one. Thank you 😊.

Let's start this tutorial on visualizing different models and comparing their accuracies. Here we use KNN, Decision Tree and SVM models.

Recall that in the previous article we used a list named "accuracys" to store the accuracies of the above-mentioned models, in that order.

Let us see what it contains.

NOTE: We are not using fixed, separate train and test sets here. We use train_test_split(), which splits the data randomly every time it is called, so the accuracy will keep changing from run to run depending on which rows end up in the train and test sets.
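To make the note concrete, here is a minimal sketch on made-up data (not the dataset from the previous article). It assumes you want repeatable results and uses the random_state parameter of train_test_split() to pin the split. The variable names are only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, purely illustrative (not the article's dataset).
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Without random_state, the rows are shuffled differently on each call,
# so an accuracy measured on the test part can change between runs.
X_tr_a, X_te_a, y_tr_a, y_te_a = train_test_split(X_demo, y_demo, test_size=3)

# With a fixed random_state, the same rows land in the test set every time.
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X_demo, y_demo, test_size=3, random_state=42)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_demo, y_demo, test_size=3, random_state=42)

print(np.array_equal(X_te1, X_te2))  # True
```

Fixing the seed makes runs comparable, but the accuracy you see is still the accuracy for that one particular split.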

Now model_names is another empty list that will hold the names of the models. This list will help us label the plots.

model_names = []  # empty list
for name, model in estimators:
    model_names.append(name)

Plotting Bar Plot

import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1])
# model names on the x axis, the "accuracys" list as bar heights
ax.bar(model_names, accuracys)
plt.yticks(np.arange(0, 1, .10))
plt.show()

The ax.bar() function creates the bar plot. Here we pass model_names as the x positions and the accuracy list as the bar heights. Other parameters such as width, bottom and align can also be given.
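As a rough sketch of those extra parameters, the snippet below draws a bar plot from hypothetical names and scores (demo_names and demo_scores are invented for illustration, not the article's results) and sets width, bottom and align explicitly. The Agg backend line is only there so the sketch runs without a display. Drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical names and accuracies, for illustration only.
demo_names = ["KNN", "Decision Tree", "SVM"]
demo_scores = [0.80, 0.75, 0.70]

fig, ax = plt.subplots()
bars = ax.bar(
    demo_names,
    demo_scores,
    width=0.5,       # bar thickness (the default is 0.8)
    bottom=0.0,      # y value at which each bar starts
    align="center",  # centre bars on their x positions ("edge" aligns the left edge)
)
ax.set_ylim(0, 1)
ax.set_ylabel("Accuracy")

# Write each accuracy just above its bar.
for bar, score in zip(bars, demo_scores):
    ax.text(bar.get_x() + bar.get_width() / 2, score + 0.01,
            f"{score:.2f}", ha="center")
```

In an interactive session, calling plt.show() afterwards displays the figure.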

We can also compare the accuracy of the ensemble model by adding its name and accuracy to the model_names and accuracys lists using the code below, and then running the plotting code above again.

# adding the ensemble's accuracy for comparison
if "Ensemble" not in model_names:
    model_names.append("Ensemble")
if ensem_acc not in accuracys:
    accuracys.append(ensem_acc)

As we can easily see, the ensemble has the highest accuracy of all, and if we look more closely we can see that the SVM model gave the lowest accuracy.

Let's now see how we can plot a box plot.

Here we use k-fold cross-validation to split the data and test model accuracy, so we obtain multiple accuracy values for each model.

We split the data into 15 folds, so each model is trained and tested 15 times, each time holding out a different fold for testing, and 15 different accuracies are obtained. Finally, we take the mean of these accuracies as the model's average accuracy.

from sklearn import model_selection

acc = []     # one array of fold accuracies per model
names1 = []  # model names, in the same order as acc
scoring = 'accuracy'
# here creating a list "acc" for storing multiple accuracies of each model.
for name, model in estimators:
    kfold = model_selection.KFold(n_splits=15)
    res = model_selection.cross_val_score(model, X, target, cv=kfold, scoring=scoring)
    acc.append(res)
    names1.append(name)
    model_accuracy = "%s: %f" % (name, res.mean())
    print(model_accuracy)

To make this clearer, let's see what the "acc" list contains.
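If you do not have the data from the previous article at hand, the following self-contained sketch shows the shape of what cross_val_score() returns. It assumes the iris dataset bundled with scikit-learn and a single decision tree, so the numbers are not the ones from this article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy dataset shipped with scikit-learn (not the article's data).
X_demo, y_demo = load_iris(return_X_y=True)

# shuffle=True because the iris rows are ordered by class.
kfold_demo = KFold(n_splits=15, shuffle=True, random_state=0)
scores = cross_val_score(
    DecisionTreeClassifier(random_state=0),
    X_demo, y_demo, cv=kfold_demo, scoring="accuracy",
)

print(scores.shape)   # (15,): one accuracy per fold
print(scores.mean())  # average accuracy over the 15 folds
```

Each entry appended to "acc" is an array like scores, which is why the list can be passed straight to a box plot.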

Plotting Box Plot

blue_outlier = dict(markerfacecolor='b', marker='D')
fig = plt.figure()
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
ax.boxplot(acc, flierprops=blue_outlier)
ax.set_xticklabels(names1)
plt.show()

The blue dots are outliers. The lines extending from each box are the whiskers, and the horizontal orange lines are the medians.
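To see where these elements come from, here is a small sketch that computes them by hand with NumPy. It assumes Matplotlib's default whisker rule (1.5 times the interquartile range) and uses invented fold accuracies, with one value placed far off on purpose.

```python
import numpy as np

# Hypothetical fold accuracies for one model; 0.40 is deliberately far off.
fold_acc = np.array([0.70, 0.72, 0.74, 0.75, 0.76, 0.78, 0.40])

# The box spans the first to the third quartile; the line inside is the median.
q1, median, q3 = np.percentile(fold_acc, [25, 50, 75])
iqr = q3 - q1

# Default rule (whis=1.5): points further than 1.5 * IQR from the box are outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
inside = fold_acc[(fold_acc >= lower_limit) & (fold_acc <= upper_limit)]
outliers = fold_acc[(fold_acc < lower_limit) | (fold_acc > upper_limit)]

print("box:", q1, "to", q3, "median:", median)
# Whiskers reach the most extreme points that are still inside the limits.
print("whiskers:", inside.min(), "to", inside.max())
print("outliers:", outliers)
```

Here only the 0.40 fold falls outside the limits, so it is the one point that would be drawn as a separate marker.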

from numpy import allclose

# random_state only takes effect together with shuffle=True
k_folds = model_selection.KFold(n_splits=15, shuffle=True, random_state=12)
ensemb_acc = model_selection.cross_val_score(ensemble, X_train, target_train, cv=k_folds)
print(ensemb_acc.mean())
if "Ensemble" not in names1:
    names1.append("Ensemble")

def arr(myarr, list_arrays):
    # True if an array equal to myarr (within tolerance) is already in list_arrays
    return next((True for item in list_arrays if item.size == myarr.size and allclose(item, myarr)), False)

print(arr(ensemb_acc, acc))
if not arr(ensemb_acc, acc):
    acc.append(ensemb_acc)
acc
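The helper is needed because a plain membership test does not work on a list of NumPy arrays. The sketch below is one possible way to write the same check with any(). The name contains_array and the tol parameter are made up for this example.

```python
import numpy as np

def contains_array(candidate, arrays, tol=1e-8):
    """Return True if any array in `arrays` matches `candidate` element-wise."""
    return any(
        item.shape == candidate.shape and np.allclose(item, candidate, atol=tol)
        for item in arrays
    )

stored = [np.array([0.80, 0.90]), np.array([0.70, 0.60])]
new_scores = np.array([0.80, 0.90])

# `new_scores in stored` would compare element-wise and raise a ValueError
# ("truth value of an array ... is ambiguous"), hence the helper.
print(contains_array(new_scores, stored))            # True
print(contains_array(np.array([0.1, 0.2]), stored))  # False
```

With this check, running the cell twice does not append the ensemble scores to the list a second time.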

Now, running the box plot code above again, we get:

You can even customise your boxplot using different parameters: patch_artist=True displays the boxes filled with color, notch=True draws notched boxes, and vert=0 displays a horizontal boxplot.
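A quick sketch of these options together, on synthetic accuracies (the numbers and labels are invented, not the article's results). One assumption to be aware of: newer Matplotlib releases also offer orientation="horizontal" and may warn about vert, so adjust it to the version you have installed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic accuracies for three hypothetical models (15 folds each).
rng = np.random.default_rng(0)
demo_acc = [rng.normal(loc, 0.03, size=15).clip(0, 1) for loc in (0.80, 0.75, 0.70)]
demo_labels = ["KNN", "Decision Tree", "SVM"]

fig, ax = plt.subplots()
parts = ax.boxplot(
    demo_acc,
    patch_artist=True,  # draw boxes as filled patches so they can be coloured
    notch=True,         # notch marks a confidence interval around the median
    vert=False,         # horizontal boxes (same effect as vert=0)
)

# patch_artist=True is what makes set_facecolor available on the boxes.
for box, colour in zip(parts["boxes"], ["lightblue", "lightgreen", "salmon"]):
    box.set_facecolor(colour)

ax.set_yticklabels(demo_labels)  # horizontal plot, so labels go on the y axis
ax.set_xlabel("Accuracy")
```

In a notebook, remove the Agg line and call plt.show() to see the coloured, notched, horizontal boxes.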

Here is the entire code:

Link for the code from previous article:

https://medium.com/analytics-vidhya/ensemble-modelling-in-a-simple-way-386b6cbaf913

I hope you liked my article 😃. If you found it helpful, it would be really nice to see you appreciate my hard work by clapping for me 👏👏. Thank you.

