MIoU Calculation

Computation of MIoU for Multiple-Class based Semantic Image Segmentation

CYBORG NITR
5 min read · May 9, 2020

There are many neural network models running on different platforms, each taking its own approach to object detection and semantic image segmentation, so we need a criterion for choosing among them in order to get the best results in our field. The best criterion is the degree of similarity between the output a method produces and the ground truth, and that can be computed mathematically as the IoU (Intersection over Union) between the two. This metric takes the region common to both the ground truth and the predicted output and measures what percentage of similarity it has with the actual one.

It’s quite simple in the case of “Single-class based Semantic Image Segmentation”, but not in the case of “Multiple-class based Semantic Image Segmentation”, as in the Pascal VOC challenge (with 21 classes), where objects belonging to different classes can appear in the same image. In such cases each class has to be given a different label and treated separately during IoU computation. In this article we propose a method that handles such cases and finds an overall IoU for the multiple classes present in an image. For that we take the mean of the IoU values of the different classes, which matches the actual degree of similarity. This mean value is known as the Mean IoU (MIoU).

Our approach to finding the MIoU:

For the calculation of MIoU we need the labelled matrices of both the predicted result and the expected one (ground truth). Going through a series of steps then leads us to the MIoU value.

Let us understand the process by considering a simple example.

Here are two matrices, one representing the actual segmented output and the other the output predicted by a neural network or model.

The elements of these matrices are the labels of the classes to which the pixels at the corresponding locations in the image belong.

Here, there are altogether 6 classes with labels ‘0’, ‘1’, ‘2’, ‘3’, ‘4’ and ‘5’, and the matrices are 2D numpy arrays of size (4 x 4) each.
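The two matrices appear as figures in the original article; since those images are not reproduced here, the snippet below builds a stand-in pair with the same shape (4 x 4) and the same six labels, which can be used to try out the steps that follow. The particular values are illustrative, not the article's own.

```python
import numpy as np

# Illustrative stand-in for the article's figures: each element is the
# class label (0-5) of the pixel at that position in the image.
actual = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [4, 4, 5, 5]])

pred = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1],
                 [2, 2, 3, 3],
                 [4, 5, 5, 5]])
```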

Now by going through the following steps we can calculate MIoU:

Step 1: Finding out the frequency count of each class for both matrices.

This can be done using the “bincount” function available in the numpy package.
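A minimal sketch of this step, using the same illustrative (4 x 4) label pair rather than the article's original figures:

```python
import numpy as np

actual = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [4, 4, 5, 5]])
pred = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1],
                 [2, 2, 3, 3],
                 [4, 5, 5, 5]])

# np.bincount needs a 1-D array of non-negative ints; minlength guarantees
# one slot per class even if a class is absent from the image.
actual_count = np.bincount(actual.reshape(-1), minlength=6)
pred_count = np.bincount(pred.reshape(-1), minlength=6)
print(actual_count)  # [4 4 2 2 2 2]
print(pred_count)    # [3 5 2 2 1 3]
```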

Step 2: Converting the matrix to 1D format.

This step is done for easy computation, which can be done by reshaping the numpy array.
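This flattening is a one-liner with `reshape`; for example, with a (4 x 4) label matrix:

```python
import numpy as np

actual = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [4, 4, 5, 5]])

# reshape(-1) flattens row by row: a (4, 4) matrix becomes a length-16 vector
actual_1d = actual.reshape(-1)
print(actual_1d.shape)  # (16,)
```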

Step 3: Finding out the category matrix.

Since there are 6 classes here, there can be 6 x 6 = 36 possibilities.

For example, the 6th pixel actually belongs to class ‘1’ but is predicted to be in class ‘0’, and thus belongs to category ‘1–0’. Each such possibility corresponds to a category. The possible categories are ‘0–0’, ‘0–1’, ‘0–2’, ….., ‘4–5’, ‘5–5’. They are numbered as per their index: category ‘0–0’ gets number 0, category ‘0–1’ gets number 1, and so on.

The category matrix is one that will have the elements as the category numbers to which the pixels at that particular location belong.

Category = (number of classes x actual_1D) + pred_1D
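The formula above maps each actual–predicted pair to a unique category number; a sketch using an illustrative pair of flattened label vectors (not the article's original figures):

```python
import numpy as np

num_classes = 6
actual_1d = np.array([0, 0, 1, 1, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
pred_1d = np.array([0, 0, 1, 1, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5])

# Pair (a, p) maps to category a * 6 + p, so each of the 36
# actual-predicted combinations gets a unique index in [0, 35].
category = num_classes * actual_1d + pred_1d
print(category)  # [ 0  0  7  7  0  1  7  7 14 14 21 21 28 29 35 35]
```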

Step 4: Constructing the confusion matrix.

A confusion matrix is a matrix of size (no. of classes x no. of classes) that stores the number of pixels belonging to each category.

The frequency count of the ‘category’ array gives a linear array which on reshaping to (6x6) gives us the confusion matrix.

The confusion matrix also encodes some useful information that helps in the calculation of IoU.

  • The diagonal of the confusion matrix represents the common region. So, these elements are the intersection values of the predicted output and ground truth.
  • The upper triangular part of the confusion matrix represents pixels where the actual matrix is true but the predicted one is false, and the lower triangular part represents the opposite.
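Putting the bincount-and-reshape idea together, a sketch with an illustrative category array (not taken from the article's figures):

```python
import numpy as np

num_classes = 6
category = np.array([0, 0, 7, 7, 0, 1, 7, 7, 14, 14, 21, 21, 28, 29, 35, 35])

# Frequency count of the category numbers, reshaped to (6, 6):
# row = actual class, column = predicted class.
cm = np.bincount(category, minlength=num_classes ** 2).reshape(num_classes,
                                                               num_classes)
print(cm)
```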

Step 5: Calculating IoU for individual classes.

I = diagonal elements of confusion matrix (CM_2D)

U = actual_count + pred_count - I
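Since the row sums of the confusion matrix are the actual counts and the column sums are the predicted counts, both I and U can be read straight off it. A sketch using an illustrative confusion matrix:

```python
import numpy as np

cm = np.array([[3, 1, 0, 0, 0, 0],
               [0, 4, 0, 0, 0, 0],
               [0, 0, 2, 0, 0, 0],
               [0, 0, 0, 2, 0, 0],
               [0, 0, 0, 0, 1, 1],
               [0, 0, 0, 0, 0, 2]])

intersection = np.diag(cm)                 # correctly classified pixels per class
actual_count = cm.sum(axis=1)              # row sums
pred_count = cm.sum(axis=0)                # column sums
union = actual_count + pred_count - intersection
iou = intersection / union
print(iou)  # [0.75 0.8  1.   1.   0.5  0.66666667]
```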

Step 6: Calculating MIoU for the actual-predicted pair.

It is found using the ‘nanmean’ function available in the numpy package. ‘Nanmean’ is preferred over the ordinary mean so as to ignore cases where an individual IoU value turns out to be ‘nan’ because a particular class is absent from the image.
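A sketch of this last step, using the per-class IoU values from the illustrative matrices above; note that the resulting value differs from the 0.8 obtained from the article's own figure matrices:

```python
import numpy as np

iou = np.array([0.75, 0.8, 1.0, 1.0, 0.5, 2 / 3])

# nanmean skips NaN entries, which arise when a class is absent from both
# the ground truth and the prediction (union = 0 gives 0/0 = NaN).
miou = np.nanmean(iou)
print(round(miou, 4))  # 0.7861
```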

MIoU = 0.8

Conclusion:

In the case of multiple classes, the MIoU has to be calculated, rather than just calculating IoU by treating all the different classes as a single one. In this case, that invalid approach would give approximately 0.9524, which is erroneous and may create a false impression about the accuracy of a model. So, the MIoU, which considers all the classes, has to be calculated for validation.

There are several other methods for finding the similarity between an actual image and a predicted result, the most popular among them being the bounding box method; but in instances involving fine edge detection along with segmentation, where higher accuracy is required, this method proves to be reliable.

This method of calculating MIoU was used in Disparting to prove its accuracy over other Semantic Image Segmentation algorithms based on Neural Networks.

Find the source code of Disparting and MIoU here.


CYBORG NITR

Cyborg, the robotics and automation club of National Institute of Technology, Rourkela, where we design intelligence and redefine technology.