Ludwig: add a training utility for binary classification to find a better threshold

6fe3ivhb posted 5 months ago in Other

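Ludwig uses a default threshold of 0.5 to compute accuracy for binary classification problems. However, it is quite possible, especially for imbalanced datasets, that 0.5 is not the best threshold. AUC (Area Under the Curve) measures a binary classifier's performance across all possible decision thresholds, and the curve it summarizes is commonly used to pick a better threshold, one that strikes a better balance between precision and recall.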
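To make the imbalance point concrete, here is a minimal sketch that compares precision and recall at two thresholds. The data is made up for illustration, and `precision_recall` is a hypothetical helper, not part of Ludwig.

```python
import numpy as np

def precision_recall(probs, targets, threshold):
    """Precision and recall of the positive class at a given decision threshold."""
    preds = probs > threshold
    tp = np.sum(preds & (targets == 1))   # true positives
    fp = np.sum(preds & (targets == 0))   # false positives
    fn = np.sum(~preds & (targets == 1))  # false negatives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return float(precision), float(recall)

# Hypothetical toy data: 2 positives among 10 examples, with the
# positive-class probabilities skewed low, as often happens under imbalance.
targets = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
probs = np.array([0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.35, 0.40, 0.60])

at_half = precision_recall(probs, targets, 0.5)  # (1.0, 0.5)
at_03 = precision_recall(probs, targets, 0.3)    # precision 2/3, recall 1.0
print(at_half, at_03)
```

On this toy data, the default of 0.5 misses half of the positives, while 0.3 recovers both at the cost of one false positive. Which trade-off is better depends on the metric being optimized.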

Here is an outline of such an algorithm, proposed by @geoffreyangus and @w4nderlust:

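import numpy as np

def find_best_threshold(model, output_feature_name, dataset, metric, thresholds=np.arange(0, 1, 0.05)):
  # Outline only: the structure of the `model.predict` output is assumed here.
  probabilities = model.predict(dataset)[output_feature_name]['probabilities']
  scores = []
  for threshold in thresholds:
    # Predict the positive class when its probability exceeds the candidate threshold.
    preds = probabilities[:, 1] > threshold
    metric_score = metric(preds, targets)  # TODO: extract targets from `dataset`
    scores.append(metric_score)
  # Return the candidate threshold with the highest metric score.
  return thresholds[np.argmax(scores)]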

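By default, the optimal threshold should be computed at the end of the training phase.
It would also be useful to expose this as a standalone API.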
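
As a rough illustration of what a standalone helper could look like, the sketch below sweeps thresholds over plain NumPy arrays rather than a Ludwig model. The names `best_threshold` and `f1_score` and the toy data are assumptions made for this example and are not part of the Ludwig API. Getting the probabilities and targets out of a model and dataset is left to the caller.

```python
import numpy as np

def f1_score(preds, targets):
    """F1 of the positive class for boolean predictions and 0/1 targets."""
    preds = np.asarray(preds, dtype=bool)
    targets = np.asarray(targets, dtype=bool)
    tp = np.sum(preds & targets)
    fp = np.sum(preds & ~targets)
    fn = np.sum(~preds & targets)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def best_threshold(positive_probs, targets, metric=f1_score, thresholds=None):
    """Return (threshold, score) maximizing `metric` over candidate thresholds.

    Hypothetical helper: on ties, the lowest candidate threshold wins,
    because np.argmax returns the first maximum.
    """
    if thresholds is None:
        thresholds = np.arange(0.0, 1.0, 0.05)
    thresholds = np.asarray(thresholds, dtype=float)
    positive_probs = np.asarray(positive_probs, dtype=float)
    scores = [metric(positive_probs > t, targets) for t in thresholds]
    best = int(np.argmax(scores))
    return float(thresholds[best]), float(scores[best])

# Made-up imbalanced data: 2 positives among 10 examples.
targets = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
probs = np.array([0.02, 0.07, 0.12, 0.17, 0.22, 0.27, 0.33, 0.38, 0.42, 0.62])

threshold, score = best_threshold(probs, targets)
default_score = f1_score(probs > 0.5, targets)
print(threshold, score, default_score)
```

On this toy data the sweep selects a threshold of 0.4 with an F1 of 1.0, compared with roughly 0.67 at the default of 0.5. In practice the sweep should run on a validation split rather than the test set, so that the chosen threshold is not tuned to the data used for the final evaluation.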

Source: https://github.com/ludwig-ai/ludwig/issues/2181