How to Generalize a Multi-Class Confusion Matrix

A Python approach to generalizing any multi-class confusion matrix

Mehmet Emin Yıldırım
Analytics Vidhya
3 min read · Feb 3, 2020


1. Introduction

Machine learning projects often involve multiple classes (labels), and we need to plot a confusion matrix that covers all of them.

Unfortunately, as the number of classes grows, the confusion matrix becomes harder to read, as in the example below:

[Figure: Original Confusion Matrix]

2. Idea

If we generalize the confusion matrix by grouping the classes, we get a smaller confusion matrix that is easier to read, as in the figure below:

[Figure: Generalized Confusion Matrix]

3. Important Facts

The generalizing algorithm should sum only the true cells (the diagonal values) of the original confusion matrix within each group, and write the result to the corresponding true (diagonal) cell of the generalized confusion matrix.

The algorithm also needs to distribute the false values to the neighboring row(s) in the generalized confusion matrix to preserve the actual accuracy. Otherwise, a misclassification between two classes of the same group would land on the diagonal of the generalized matrix and be counted as correct.

Note: The generalized confusion matrix inevitably contains some error, because the redistributed false values no longer sit where they originally occurred, but this operation preserves the actual accuracy.
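To make the accuracy argument concrete, here is a small check using the 6x6 matrix and its 3-group result from section 4. The helper name `accuracy` and the "naive" block-sum comparison are illustrative additions, not part of the original code; the sketch only assumes that accuracy is the diagonal sum divided by the total count.

```python
import numpy as np

def accuracy(cm):
    """Accuracy of a confusion matrix: diagonal sum divided by the total count."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

# The 6x6 example from section 4.
original = np.array([
    [5, 1, 0, 0, 0, 0],
    [0, 5, 0, 0, 0, 0],
    [0, 1, 5, 1, 0, 0],
    [0, 0, 0, 5, 0, 0],
    [0, 0, 0, 0, 5, 0],
    [0, 0, 1, 0, 0, 5],
])

# The generalized 3x3 result from section 4.
generalized = np.array([
    [10, 1, 0],
    [1, 10, 1],
    [0, 1, 10],
])

# Naive alternative (for comparison only): sum every 2x2 block as-is.
naive = original.reshape(3, 2, 3, 2).sum(axis=(1, 3))

print(accuracy(original))     # 30 / 34
print(accuracy(generalized))  # 30 / 34, same as the original
print(accuracy(naive))        # 32 / 34, inflated by within-group errors
```

The naive block sum moves two within-group misclassifications onto the diagonal, which is why the false values have to be redistributed instead.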

Note: We can use the prime factorization of the number of classes to find out how many groups the classes can be split into. Any divisor of the class count, other than 1 and the count itself, gives a valid number of equally sized groups.

  • For example: a 51x51 confusion matrix (51 = 3 × 17) can be split into 3 or 17 groups, so the generalized confusion matrix will be 3x3 (group size 17) or 17x17 (group size 3)
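The valid group counts can also be listed directly by testing divisibility. This is a minimal sketch; the function name `valid_group_counts` is a hypothetical helper, not something from the original code.

```python
def valid_group_counts(n_classes):
    """Return every group count between 2 and n_classes - 1 that divides n_classes evenly."""
    return [g for g in range(2, n_classes) if n_classes % g == 0]

print(valid_group_counts(51))  # [3, 17]
print(valid_group_counts(6))   # [2, 3]
print(valid_group_counts(7))   # [] (a prime class count cannot be grouped evenly)
```

For a given group count `g`, the group size is `n_classes // g`.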

4. The Code

# original confusion matrix
[[5 1 0 0 0 0]
 [0 5 0 0 0 0]
 [0 1 5 1 0 0]
 [0 0 0 5 0 0]
 [0 0 0 0 5 0]
 [0 0 1 0 0 5]]
# generalized confusion matrix with 3 groups
[[10.  1.  0.]
 [ 1. 10.  1.]
 [ 0.  1. 10.]]
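The output above can be reproduced with the following sketch. It is one possible interpretation of the rules in section 3, not necessarily the original implementation: the function name `generalize_confusion_matrix` is hypothetical, and the redistribution rule is an assumption. Here, false values that fall inside a group's own diagonal block are moved to the next group's cell in the same row (or to the previous one for the last group), while false values between different groups stay in their block.

```python
import numpy as np

def generalize_confusion_matrix(cm, n_groups):
    """Group consecutive classes of a square confusion matrix into n_groups groups."""
    cm = np.asarray(cm)
    if cm.ndim != 2 or cm.shape[0] != cm.shape[1]:
        raise ValueError("confusion matrix must be square")
    n_classes = cm.shape[0]
    if n_groups < 2 or n_classes % n_groups != 0:
        raise ValueError("n_groups must be >= 2 and divide the number of classes")
    size = n_classes // n_groups

    # Sum every size x size block of the original matrix.
    blocks = cm.reshape(n_groups, size, n_groups, size).sum(axis=(1, 3))
    # Sum only the true (diagonal) cells that belong to each group.
    true_per_group = np.diag(cm).reshape(n_groups, size).sum(axis=1)

    generalized = blocks.astype(float)
    for g in range(n_groups):
        # Errors between classes of the same group must not count as correct.
        within_group_false = blocks[g, g] - true_per_group[g]
        generalized[g, g] = true_per_group[g]
        # Assumed rule: move them to the neighboring cell in the same row.
        neighbor = g + 1 if g + 1 < n_groups else g - 1
        generalized[g, neighbor] += within_group_false
    return generalized

original = np.array([
    [5, 1, 0, 0, 0, 0],
    [0, 5, 0, 0, 0, 0],
    [0, 1, 5, 1, 0, 0],
    [0, 0, 0, 5, 0, 0],
    [0, 0, 0, 0, 5, 0],
    [0, 0, 1, 0, 0, 5],
])

result = generalize_confusion_matrix(original, n_groups=3)
print(result)
```

Under this assumed rule, the diagonal sum and the total count are both unchanged (30 and 34 in this example), so the accuracy of the generalized matrix equals that of the original.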
