I'm not sure if this is the right place to report issues with https://github.com/ppwwyyxx/cocoapi -- that repo doesn't have its own Issues tab, so I'm opening an issue here instead.
I'm confused by how pycocotools calculates the average precision and recall metrics reported in the summary. I'm not sure whether this is actually a bug or whether I'm fundamentally misunderstanding how these calculations work under the hood, so I wrote a minimal test case: two bboxes with perfect overlap between ground truth and predictions, passed into COCOeval:
```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Boxes are given as [x1, y1, x2, y2]; ground truth and predictions are identical.
actual_boxes = [[50, 50, 150, 150], [200, 200, 300, 300]]
predicted_boxes = [[50, 50, 150, 150], [200, 200, 300, 300]]
scores = [1.0, 1.0]

coco_actual = COCO()
coco_predicted = COCO()

actual_annotations_list = []
predicted_annotations_list = []

for id, box in enumerate(actual_boxes):
    actual_annotations_list.append({
        "id": id,
        "image_id": 1,
        "category_id": 1,
        # Convert [x1, y1, x2, y2] to COCO's [x, y, width, height].
        "bbox": [box[0], box[1], box[2] - box[0], box[3] - box[1]],
        "area": (box[2] - box[0]) * (box[3] - box[1]),
        "iscrowd": 0,
    })

for id, box in enumerate(predicted_boxes):
    predicted_annotations_list.append({
        "id": id,
        "image_id": 1,
        "category_id": 1,
        "bbox": [box[0], box[1], box[2] - box[0], box[3] - box[1]],
        "area": (box[2] - box[0]) * (box[3] - box[1]),
        "iscrowd": 0,
        "score": scores[id],
    })

coco_actual.dataset = {
    "images": [{"id": 1}],
    "annotations": actual_annotations_list,
    "categories": [{"id": 1, "name": "object"}],
}
coco_actual.createIndex()

coco_predicted.dataset = {
    "images": [{"id": 1}],
    "annotations": predicted_annotations_list,
    "categories": [{"id": 1, "name": "object"}],
}
coco_predicted.createIndex()

coco_eval = COCOeval(coco_actual, coco_predicted, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```

Here is the output:
```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.252
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.252
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.500
```
I believe both boxes count as "large" (each has area 100 × 100 = 10000), which is consistent with the -1.000 entries for small and medium, and the summary reports AP = 0.252 and AR = 0.500 for them. These numbers don't make sense to me: the predictions are identical to the ground truth, so I'd expect both average precision and average recall to be 1.0. Am I misunderstanding something, or is there a bug in how these metrics are calculated?
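For reference, here is a quick sanity check that the overlap really is perfect. It uses a hand-rolled `iou` helper (my own function, not part of pycocotools) on the same `[x1, y1, x2, y2]` lists from the repro above:

```python
# Sanity check: hand-rolled IoU for the [x1, y1, x2, y2] boxes above,
# confirming that every predicted box overlaps its ground-truth box perfectly.
def iou(box_a, box_b):
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

for a, p in zip(actual_boxes, predicted_boxes):
    print(iou(a, p))  # prints 1.0 for both pairs
```

So every prediction matches its ground-truth box at IoU = 1.0, which is why I'd expect perfect scores at every IoU threshold from 0.50 to 0.95.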