numpy: Average precision implementation for object detection - low-confidence detections do not affect the score

Posted by jxct1oxe on 2023-10-19 in Other


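I have the following code to compute a precision-recall curve for an object detection task. Detections are first matched to ground truth by creating 1-to-1 pairs, starting with the detection that has the highest confidence score and matching it to the ground-truth object it overlaps with most. The results are stored in the detection_matches vector, where a value is True if the detection was matched to some ground-truth object and False otherwise. This PR curve is then used to compute the Average Precision score.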

import numpy as np

def precision_recall_curve(
    detection_matches: np.ndarray, detection_scores: np.ndarray, total_ground_truths: int
):

    sorted_detection_indices = np.argsort(detection_scores, kind="stable")[::-1]
    detection_scores = detection_scores[sorted_detection_indices]
    detection_matches = detection_matches[sorted_detection_indices]

    threshold_indices = np.r_[np.where(np.diff(detection_scores))[0], detection_matches.size - 1]
    confidence_thresholds = detection_scores[threshold_indices]

    true_positives = np.cumsum(detection_matches)[threshold_indices]
    false_positives = np.cumsum(~detection_matches)[threshold_indices]

    precision = true_positives / (true_positives + false_positives)
    precision[np.isnan(precision)] = 0
    recall = true_positives / total_ground_truths

    full_recall_idx = true_positives.searchsorted(true_positives[-1])
    reversed_slice = slice(full_recall_idx, None, -1)

    return np.r_[precision[reversed_slice], 1], np.r_[recall[reversed_slice], 0]

def ap_score(precision, recall):
    return -np.sum(np.diff(recall) * np.array(precision)[:-1])
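For context, the detection_matches vector fed into precision_recall_curve could come from a greedy matcher roughly like the one below. This is only a sketch of the matching described in the question, not the asker's actual code: the helper names iou_matrix and greedy_match, the [x1, y1, x2, y2] box layout, and the 0.5 IoU threshold are all assumptions made for illustration, and boxes are assumed to have positive area.

```python
import numpy as np

def iou_matrix(det_boxes: np.ndarray, gt_boxes: np.ndarray) -> np.ndarray:
    """Pairwise IoU for boxes given as [x1, y1, x2, y2]; shape (n_det, n_gt)."""
    x1 = np.maximum(det_boxes[:, None, 0], gt_boxes[None, :, 0])
    y1 = np.maximum(det_boxes[:, None, 1], gt_boxes[None, :, 1])
    x2 = np.minimum(det_boxes[:, None, 2], gt_boxes[None, :, 2])
    y2 = np.minimum(det_boxes[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    det_area = (det_boxes[:, 2] - det_boxes[:, 0]) * (det_boxes[:, 3] - det_boxes[:, 1])
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    union = det_area[:, None] + gt_area[None, :] - inter
    return inter / union

def greedy_match(det_boxes, det_scores, gt_boxes, iou_threshold=0.5):
    """Return a boolean vector: True where a detection claimed a ground truth."""
    matches = np.zeros(len(det_boxes), dtype=bool)
    if len(det_boxes) == 0 or len(gt_boxes) == 0:
        return matches
    ious = iou_matrix(det_boxes, gt_boxes)
    gt_taken = np.zeros(len(gt_boxes), dtype=bool)
    # Visit detections from highest to lowest confidence.
    for det_idx in np.argsort(det_scores, kind="stable")[::-1]:
        # Hide ground truths that an earlier detection already claimed.
        candidate_ious = np.where(gt_taken, -1.0, ious[det_idx])
        best_gt = int(np.argmax(candidate_ious))
        if candidate_ious[best_gt] >= iou_threshold:
            matches[det_idx] = True
            gt_taken[best_gt] = True
    return matches

# Toy data (hypothetical): two ground truths, three detections.
gt_boxes = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
det_boxes = np.array([[1, 1, 10, 10], [0, 0, 10, 10], [50, 50, 60, 60]], dtype=float)
det_scores = np.array([0.6, 0.9, 0.8])
detection_matches = greedy_match(det_boxes, det_scores, gt_boxes)
print(detection_matches)
```

The lower-scored duplicate of the first ground truth ends up as a false positive because each ground truth can be claimed only once; other matching conventions exist and would give slightly different vectors.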

This can be used to compute the AP score for example vectors:

detection_matches = np.array([True, True, True, True, True, True, False, True])
detection_scores = np.array([0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55])
total_ground_truths = 10

precision, recall = precision_recall_curve(detection_matches, detection_scores, total_ground_truths)
# (array([0.875     , 0.85714286, 1.        , 1.        , 1.        ,
#         1.        , 1.        , 1.        , 1.        ]),
#  array([0.7, 0.6, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0. ]))

ap_score(precision, recall)
# 0.6875

However, adding more detections, even ones with very low confidence, increases the AP score, which does not seem right.

detection_matches = np.array([True, True, True, True, True, True, False, True, True, False, False, False, False, False, False])
detection_scores = np.array([0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.04, 0.03, 0.02, 0.015, 0.012, 0.011, 0.01])
total_ground_truths = 10

precision, recall = precision_recall_curve(detection_matches, detection_scores, total_ground_truths)
# (array([0.88888889, 0.875     , 0.85714286, 1.        , 1.        ,
#         1.        , 1.        , 1.        , 1.        , 1.        ]),
#  array([0.8, 0.7, 0.6, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0. ]))

ap_score(precision, recall)
# 0.7763888888888889

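I can see that this happens because the low precision values in the precision vector (array([1., 1., 1., 1., 1., 1., 0.85714286, 0.875, 0.88888889, 0.8, 0.72727273, 0.66666667, 0.61538462, 0.57142857, 0.53333333])) are effectively ignored, since both precision and recall are trimmed at the index where recall reaches its full value. However, even without trimming, recall is constant from that point on, so the recall differences are 0 and the low precision values would not be taken into account anyway.
Is there a bug in this implementation? If so, what should be adjusted so that low precision scores have a (negative) impact on the AP score? Or is this a case where the AP score simply behaves unintuitively?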
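The observation about the untrimmed curve can be checked directly. The sketch below rebuilds the cumulative precision and recall for the second example without any trimming (the variable names recall_steps and tail are introduced here for illustration) and shows that the six trailing false positives all sit on zero-width recall steps.

```python
import numpy as np

detection_matches = np.array([True, True, True, True, True, True, False, True,
                              True, False, False, False, False, False, False])
total_ground_truths = 10

# The example detections are already sorted by descending score.
true_positives = np.cumsum(detection_matches)
false_positives = np.cumsum(~detection_matches)
precision = true_positives / (true_positives + false_positives)
recall = true_positives / total_ground_truths

# Width of each rectangle when walking the curve in order of decreasing score.
recall_steps = np.diff(np.r_[0.0, recall])

tail = slice(9, None)  # the six trailing false positives
print(precision[tail].round(3))  # precision keeps dropping here...
print(recall_steps[tail])        # ...but every step has zero width

ap = np.sum(recall_steps * precision)
print(ap)  # same value as the trimmed computation, about 0.7764
```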

numpy

Source: https://stackoverflow.com/questions/77162144/average-precision-implementation-for-object-detection-low-confidence-detection

2 Answers


wpcxdonn1#

1. Is there a bug in the implementation?

No, though perhaps some oversimplification. *

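I think the first thing to understand is that the Average Precision (AP) score is influenced by both precision and recall. The name is a bit misleading.
But what you actually compute here says it all: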

def ap_score(precision, recall):
    return -np.sum(np.diff(recall) * np.array(precision)[:-1])

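You are basically computing the area under the precision-recall curve: the difference between consecutive recall values approximates the width (0.1) of the "rectangle" you get when you multiply a precision value by that width.
Where your intuition gets tripped up, and where the magic happens, is the number of entries.
Because you average by simply summing up "rectangles" (relying on the recall values being 0.1 apart), every rectangle counts. The second detector configuration gets one extra rectangle because of your confidence cutoff, and that pays off big time.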
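To make the "extra rectangle" concrete, here is a small check using the two curves printed in the question. The names precision_1, recall_1, precision_2, recall_2, extra_rectangle and gain are introduced only for this illustration.

```python
import numpy as np

def ap_score(precision, recall):
    # Same rectangle sum as in the question.
    return -np.sum(np.diff(recall) * np.array(precision)[:-1])

# Curves copied from the question's two outputs.
precision_1 = np.r_[7 / 8, 6 / 7, np.ones(7)]
recall_1 = np.array([0.7, 0.6, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0])
precision_2 = np.r_[8 / 9, precision_1]
recall_2 = np.r_[0.8, recall_1]

# The second curve has one more point: recall 0.8 at precision 8/9.
extra_rectangle = (recall_2[0] - recall_2[1]) * precision_2[0]  # width * height
gain = ap_score(precision_2, recall_2) - ap_score(precision_1, recall_1)
print(extra_rectangle, gain)  # both about 0.0889
```

The whole difference between 0.6875 and 0.7764 is that one rectangle; the six low-confidence false positives contribute nothing because they add no recall.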
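2. What should be adjusted so that low precision scores have a (negative) impact on the AP score?

The bump you observed might also vanish with more detections. *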

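You are sanity-checking this code with very little data. Usually such a curve relies on at least a few hundred detections to give a proper picture of how an object detector performs. With that much data, the issue of having several entries or no entry for a particular recall value may also resolve itself.
If you want a metric that penalizes precision more, just write one for that purpose. For example, you could start by simply looking at the mean precision: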

avg_precision = np.mean(precision)
# First example: 0.97
# Second example: 0.95

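That looks more like what you expected to see, I guess. When analyzing and evaluating an object detector (or any system), no single metric will tell you every characteristic of the system.
Hope I could help. Cheers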

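Note: * Be aware that your code does not handle duplicate recall values (e.g. 0.6); each one is simply counted as a separate "rectangle". This is often done in this kind of approximation, and how you want to handle it is your own decision. One simple solution is to average the precision values for that "rectangle".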
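One possible reading of the averaging suggestion in the note is sketched below. The helper average_duplicate_recalls is hypothetical (it is not part of the question's code), and it relies on duplicate recall values being exactly equal as floats, which holds for the example data.

```python
import numpy as np

def average_duplicate_recalls(precision, recall):
    """Collapse points that share a recall value into one point with mean precision."""
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    unique_recall, inverse = np.unique(recall, return_inverse=True)
    inverse = inverse.ravel()
    mean_precision = np.bincount(inverse, weights=precision) / np.bincount(inverse)
    # np.unique sorts ascending; flip to keep the question's descending-recall layout.
    return mean_precision[::-1], unique_recall[::-1]

def ap_score(precision, recall):
    # Same rectangle sum as in the question.
    return -np.sum(np.diff(recall) * np.array(precision)[:-1])

# First example curve from the question (recall 0.6 appears twice).
precision = np.r_[7 / 8, 6 / 7, np.ones(7)]
recall = np.array([0.7, 0.6, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0])

merged_precision, merged_recall = average_duplicate_recalls(precision, recall)
print(ap_score(precision, recall))                # 0.6875 as before
print(ap_score(merged_precision, merged_recall))  # slightly lower
```

With the original sum, the 6/7 precision at the first 0.6 recall point sits on a zero-width rectangle and is dropped; after averaging, it pulls the height of the 0.6 to 0.5 rectangle down a little.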

2023-10-19

csbfibhn2#

I think something odd is happening in the function you wrote for the precision-recall curve. I compared your curve with sklearn.metrics.precision_recall_curve; here are the results:

detection_matches = np.array([True, True, True, True, True, True, False, True])
detection_scores = np.array([0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55])
total_ground_truths = 10

detection_matches = np.array([True, True, True, True, True, True, False, True, True, False, False, False, False, False, False])
detection_scores = np.array([0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.04, 0.03, 0.02, 0.015, 0.012, 0.011, 0.01])
total_ground_truths = 10

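Is this the desired behavior? Did you expect this curve to differ from the original one? Does total_ground_truths play a key role that I am not seeing?
Anyway, I agree with @mrk about the oversimplification, and I rewrote the functions to simplify them: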

def my_precision_recall_curve(
    detection_matches: np.ndarray, detection_scores: np.ndarray, total_ground_truths: int
):

    sorted_detection_indices = np.argsort(detection_scores, kind="stable")[::-1]
    detection_scores = detection_scores[sorted_detection_indices]
    detection_matches = detection_matches[sorted_detection_indices]

    positives = detection_scores >= detection_scores[:, None]
    negatives = ~positives
    
    true_positives = positives[:, detection_matches].sum(axis=1)
    false_positives = positives[:, ~detection_matches].sum(axis=1)
    false_negatives = (negatives[:, detection_matches]).sum(axis=1)
    
    precision = true_positives/(true_positives+false_positives)
    recall = true_positives/(true_positives+false_negatives)

    return np.concatenate([[1], precision]), np.concatenate([[0], recall])

def my_ap_score(precision, recall):
    return (np.diff(recall) * np.array(precision)[:-1]).sum()
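One detail of the rewritten function is worth checking when comparing it with the question's version: total_ground_truths is accepted but never used, so recall is normalized by the number of matched detections rather than by the number of ground-truth objects. The sketch below restates the function compactly (the local name order and the variables recall_10 and recall_100 are introduced here) and runs it on the first example with two different ground-truth counts.

```python
import numpy as np

def my_precision_recall_curve(detection_matches, detection_scores, total_ground_truths):
    order = np.argsort(detection_scores, kind="stable")[::-1]
    detection_scores = detection_scores[order]
    detection_matches = detection_matches[order]
    # Row i treats detection_scores[i] as the confidence threshold.
    positives = detection_scores >= detection_scores[:, None]
    negatives = ~positives
    true_positives = positives[:, detection_matches].sum(axis=1)
    false_positives = positives[:, ~detection_matches].sum(axis=1)
    false_negatives = negatives[:, detection_matches].sum(axis=1)
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return np.concatenate([[1], precision]), np.concatenate([[0], recall])

detection_matches = np.array([True, True, True, True, True, True, False, True])
detection_scores = np.array([0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55])

_, recall_10 = my_precision_recall_curve(detection_matches, detection_scores, 10)
_, recall_100 = my_precision_recall_curve(detection_matches, detection_scores, 100)
print(recall_10[-1])                          # 1.0, not 7 / 10
print(np.array_equal(recall_10, recall_100))  # True: the argument has no effect
```

The question's function divides by total_ground_truths, so its recall tops out at 0.7 for this example because three ground-truth objects are never detected. Which normalization is appropriate depends on whether undetected objects should count against the detector.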

My curves and scores overlap with sklearn's for both of your examples: