mAP (mean Average Precision) for Object Detection

mAP (mean Average Precision) is a popular metric for measuring the accuracy of object detectors such as Faster R-CNN and SSD. It averages the maximum precisions at different recall values. That sounds complicated, but it is actually fairly simple, as we will illustrate with an example. First, though, a quick recap of precision, recall and IoU.

Precision & recall

Precision measures how accurate your predictions are, i.e. the percentage of your positive predictions that are correct.

Recall measures how well you find all the positives. For example, recall is 80% if our top K predictions find 80% of all the positive cases.

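Here are their mathematical definitions, where TP, FP and FN are the numbers of true positives, false positives and false negatives:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN}$$

For example, in testing for cancer, precision is the fraction of positive test results that come from patients who really have cancer, and recall is the fraction of patients with cancer whom the test correctly flags as positive.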
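A minimal Python sketch of these two definitions, working from raw counts. The function name and the cancer-screening counts are made up for illustration, and returning 0.0 when a denominator is zero is an assumed convention (libraries handle that edge case differently).

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive and false-negative counts."""
    predicted_positive = tp + fp  # everything the test flagged as positive
    actual_positive = tp + fn     # everything that really is positive
    # Assumed convention: report 0.0 instead of dividing by zero
    precision = tp / predicted_positive if predicted_positive else 0.0
    recall = tp / actual_positive if actual_positive else 0.0
    return precision, recall


# Hypothetical screening results: 8 patients flagged who have cancer,
# 2 flagged who do not, and 12 patients with cancer that the test missed.
p, r = precision_recall(tp=8, fp=2, fn=12)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.40
```

The two metrics pull in different directions: flagging more patients tends to raise recall while lowering precision, which is why detection metrics look at the whole precision-recall trade-off rather than a single point.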

IoU (Intersection over union)

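IoU measures how much two regions overlap. In an object detector, it measures how well a predicted bounding box matches the ground truth (the real object boundary).

IoU is the area of overlap between the two regions divided by the area of their union:

$$\text{IoU} = \frac{\text{area of overlap}}{\text{area of union}}$$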
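For rectangular boxes, IoU takes only a few lines to compute. This sketch assumes axis-aligned boxes given as (x1, y1, x2, y2) in continuous coordinates with x1 < x2 and y1 < y2; pixel-indexed conventions that add 1 to widths and heights are not handled.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle; width or height is clamped to 0 if the boxes do not overlap
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection  # overlap would otherwise be counted twice
    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)  # shifted right by half its width
print(iou(ground_truth, prediction))  # overlap 50 / union 150 = 0.333...
```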


Let’s create a simple example to demonstrate the calculation of average precision (AP). Suppose our dataset contains 5 apples in total. We collect all the predictions the model made for apples and rank them by predicted confidence level, from highest to lowest. The second column indicates whether each prediction is correct: a prediction is correct if it matches a ground-truth apple with IoU ≥ 0.5.

Let’s compute the precision and recall values for the row with rank #3.

Precision is the proportion of the predictions so far (the top 3) that are TP: 2/3 = 0.67.

Recall is the proportion of all possible positives (the 5 apples) that have been found so far: 2/5 = 0.4.
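The running precision and recall down the ranked list can be sketched in Python. The ranked table is not reproduced in this text, so the correctness flags below are an assumption: True, True, False, False, False, True, True, False, False, True is one ranking of 10 predictions that agrees with the rank #3 values above and with the AP formula worked out later in this section.

```python
def precision_recall_curve(is_correct, num_positives):
    """Precision and recall after each prediction, in rank order.

    is_correct: bools for predictions sorted by descending confidence.
    num_positives: total number of ground-truth objects (here, 5 apples).
    """
    precisions, recalls = [], []
    tp = 0
    for rank, correct in enumerate(is_correct, start=1):
        if correct:
            tp += 1
        precisions.append(tp / rank)          # share of the top-`rank` predictions that are TP
        recalls.append(tp / num_positives)    # share of all apples found so far
    return precisions, recalls


# Assumed correctness of the 10 ranked apple predictions (hypothetical table)
is_correct = [True, True, False, False, False, True, True, False, False, True]
precisions, recalls = precision_recall_curve(is_correct, num_positives=5)
for rank, (p, r) in enumerate(zip(precisions, recalls), start=1):
    print(f"rank {rank:2d}: precision={p:.2f} recall={r:.1f}")
```

Row 3 prints precision=0.67 recall=0.4, matching the hand calculation.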

Recall never decreases as we include more predictions, but precision goes up and down. Let’s plot precision against recall:

Conceptually, AP is the area under the precision-recall curve (the orange plot). In practice, we first smooth out the zigzag pattern and then approximate that area.

We evaluate the curve at the recall levels r = 0, 0.1, 0.2, …, 0.9 and 1.0, and at each level we replace the precision with the maximum precision at any recall r̃ ≥ r.

This is much easier to see in the plot: for each recall level (0, 0.1, 0.2, …, 0.9 and 1.0), the green curve takes the highest precision found at or to the right of that recall value.
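The smoothing step can be sketched as: for each recall level r, take the maximum precision over all points whose recall is at least r. The precision/recall lists below are hypothetical, derived from an assumed ranking of 10 apple predictions that is consistent with the numbers in this section; levels that are never reached get 0.0, which is an assumed convention.

```python
def interpolated_precision(precisions, recalls, r):
    """Maximum precision over all points with recall >= r (0.0 if recall r is never reached)."""
    candidates = [p for p, rec in zip(precisions, recalls) if rec >= r]
    return max(candidates, default=0.0)


# Hypothetical precision/recall after each of 10 ranked predictions (5 apples in total)
precisions = [1/1, 2/2, 2/3, 2/4, 2/5, 3/6, 4/7, 4/8, 4/9, 5/10]
recalls = [1/5, 2/5, 2/5, 2/5, 2/5, 3/5, 4/5, 4/5, 4/5, 5/5]

# i / 10 (rather than 0.1 * i) gives levels that compare exactly with values like 2/5
recall_levels = [i / 10 for i in range(11)]
smoothed = [interpolated_precision(precisions, recalls, r) for r in recall_levels]
print([round(p, 2) for p in smoothed])
# [1.0, 1.0, 1.0, 1.0, 1.0, 0.57, 0.57, 0.57, 0.57, 0.5, 0.5]
```

Because each value is a maximum over a shrinking set of points, the smoothed curve can never increase as recall grows, which is what removes the zigzag.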

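AP (average precision) is the average of these maximum precisions at the 11 recall levels:

$$\text{AP} = \frac{1}{11} \sum_{r \in \{0, 0.1, \ldots, 1.0\}} p_{\text{interp}}(r)$$

This approximates the area under the green curve. More precisely, the interpolated precision at recall level r is the maximum precision at any recall r̃ ≥ r:

$$p_{\text{interp}}(r) = \max_{\tilde{r} \ge r} p(\tilde{r})$$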

For example, p_interp(0.7) is the maximum precision within the yellow box below (all points with recall ≥ 0.7), which is 0.57 in our example:

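In our example, the five recall levels 0 to 0.4 have an interpolated precision of 1.0, the four levels 0.5 to 0.8 have 0.57, and the two levels 0.9 and 1.0 have 0.5, so

AP = (5 × 1.0 + 4 × 0.57 + 2 × 0.5) / 11 ≈ 0.75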
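Putting the steps together, here is a sketch of 11-point interpolated AP for a single class in Python. The correctness flags are an assumed ranking (the original table is not reproduced here) chosen to be consistent with the formula above; the function name is illustrative, not a library API.

```python
def eleven_point_ap(is_correct, num_positives):
    """11-point interpolated AP for one class.

    is_correct: bools for predictions sorted by descending confidence.
    num_positives: number of ground-truth objects of this class.
    """
    # (recall, precision) after each prediction
    points = []
    tp = 0
    for rank, correct in enumerate(is_correct, start=1):
        if correct:
            tp += 1
        points.append((tp / num_positives, tp / rank))

    total = 0.0
    for i in range(11):
        r = i / 10  # recall levels 0.0, 0.1, ..., 1.0
        # p_interp(r): best precision at any recall >= r; 0.0 if r is never reached
        total += max((p for rec, p in points if rec >= r), default=0.0)
    return total / 11


# Assumed correctness of 10 ranked apple predictions (5 apples in total)
is_correct = [True, True, False, False, False, True, True, False, False, True]
ap = eleven_point_ap(is_correct, num_positives=5)
print(f"AP = {ap:.4f}")  # (5*1.0 + 4*(4/7) + 2*0.5) / 11 = 0.7532
```

The code uses the exact 4/7 rather than the rounded 0.57, so it prints 0.7532; plugging in 0.57 gives about 0.7527. Appending more false positives after rank 10 would not change the result, since they add no recall.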

mAP is simply the AP averaged over all classes. In many datasets, such as COCO, no distinction is made and it is just called AP.

AP (Average Precision) in PASCAL VOC challenge

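PASCAL VOC is a popular dataset for object detection. In the PASCAL VOC challenge, a prediction is positive if its IoU with a ground-truth object is > 0.5. However, if the same object is detected multiple times, only the first detection (the one with the highest confidence) counts as a positive, and the rest count as negatives (false positives). The mAP in PASCAL VOC is the AP we discussed, averaged over all classes. The 11-point interpolation above is the original VOC protocol; from VOC2010 onward, the challenge computes AP using the precision at every observed recall value instead.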
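The duplicate rule can be sketched as a greedy matching in Python: process detections from highest to lowest confidence, pair each with the ground truth it overlaps most, and let only the first detection claim each ground truth. This is an illustrative sketch of one class in one image, not the official VOC devkit code; it follows the IoU > 0.5 rule as stated above, ignores VOC's "difficult" objects, and the function names are hypothetical. In a full evaluation, the labelled detections from all images are pooled and re-sorted by confidence before computing AP.

```python
def iou(a, b):
    """IoU of axis-aligned boxes (x1, y1, x2, y2) in continuous coordinates."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_detections(detections, ground_truths, iou_threshold=0.5):
    """Mark each detection as TP (True) or FP (False) for one class in one image.

    detections: list of (confidence, box); ground_truths: list of boxes.
    Returns (confidence, is_tp) pairs sorted by descending confidence.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    claimed = [False] * len(ground_truths)  # ground truths already matched
    results = []
    for confidence, box in ranked:
        # Find the ground truth this detection overlaps the most
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truths):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        # TP only if the overlap is high enough and no higher-confidence detection took this object
        is_tp = best_iou > iou_threshold and not claimed[best_j]
        if is_tp:
            claimed[best_j] = True  # later detections of the same object become FPs
        results.append((confidence, is_tp))
    return results


ground_truths = [(0, 0, 10, 10)]
detections = [
    (0.8, (1, 0, 11, 10)),    # duplicate of the same apple, IoU about 0.82
    (0.9, (0, 0, 10, 10)),    # exact match with the highest confidence
    (0.3, (50, 50, 60, 60)),  # no overlap at all
]
print(label_detections(detections, ground_truths))
# [(0.9, True), (0.8, False), (0.3, False)]
```

The duplicate has a high IoU on its own, but it loses the apple to the more confident detection and is counted as a false positive, which lowers precision.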


Recent research papers tend to report results on the COCO dataset only. For COCO, AP is averaged over multiple IoU thresholds (the minimum IoU for a detection to count as a positive match). AP@[.50:.05:.95] denotes the average AP over the 10 IoU thresholds from 0.5 to 0.95 in steps of 0.05. For the COCO competition, this AP is also averaged over all 80 categories.
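The averaging over IoU thresholds can be sketched in Python with a deliberately tiny toy case: one ground-truth object and one detection whose IoU with it is assumed to be 0.72. At each threshold that single detection is either a TP (AP = 1.0) or an FP (AP = 0.0). The helper names are hypothetical, and this only illustrates the threshold averaging; the official COCO evaluation code uses its own interpolation (over 101 recall points) rather than the 11-point scheme described earlier.

```python
def coco_iou_thresholds():
    """The 10 COCO IoU thresholds 0.50, 0.55, ..., 0.95."""
    # Built from integers to avoid floating-point drift from repeatedly adding 0.05
    return [(50 + 5 * i) / 100 for i in range(10)]


def average_over_iou_thresholds(ap_at_threshold):
    """AP@[.50:.05:.95]: mean of a per-threshold AP function over the 10 thresholds."""
    thresholds = coco_iou_thresholds()
    return sum(ap_at_threshold(t) for t in thresholds) / len(thresholds)


# Toy case (assumed): one ground truth, one detection with IoU 0.72
DETECTION_IOU = 0.72


def toy_ap(threshold):
    """AP for the toy case: 1.0 if the single detection is a TP at this threshold, else 0.0."""
    return 1.0 if DETECTION_IOU >= threshold else 0.0


print(coco_iou_thresholds())
print(average_over_iou_thresholds(toy_ap))  # TP at 0.50 to 0.70 only: 5/10 = 0.5
print(toy_ap(0.5), toy_ap(0.75))            # AP@.50 = 1.0, AP@.75 = 0.0
```

The same detection earns full credit at the PASCAL-style 0.5 threshold but only half credit under the COCO average, which is why COCO AP rewards tighter localization.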

Here is the AP result for the YOLOv3 detector.


mAP@.75 means the mAP computed with an IoU threshold of 0.75.

Further reading

Currently, there are 2 major types of deep-network object detectors: region-based and single-shot. Here are 2 articles that elaborate on these classes of detectors:
