I am trying to compute the F1 score when evaluating my model on my own test set, but I cannot work it out because I am very inexperienced. I have tried both the f1_score from scikit-learn and the one from torchmetrics, but each gives me a different error. This is my code:
# Function to test the model
from sklearn.metrics import f1_score

since = time.time()
total = 0
correct = 0
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
y_pred = []
y_true = []
# Iterate over data.
with torch.no_grad():
    for inputs, labels in dataloadersTest_dict['Test']:
        inputs = inputs.to(device)
        labels = labels.to(device)
        #outputs = model(inputs)
        predicted_outputs = model(inputs)
        _, predicted = torch.max(predicted_outputs, 1)
        total += labels.size(0)
        print(total)
        correct += (predicted == labels).sum().item()
        print(correct)
        # f1 score
        temp_true = labels.numpy()
        temp_pred = predicted.numpy()
        y_true.append(temp_true.tolist())
        y_pred.append(temp_pred.tolist())

time_elapsed = time.time() - since
test_acc = 100 * correct / total
print('Evaluation completed in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
print('Accuracy: %d %%' % (test_acc))
print('F1 Score:')
f1 = f1_score(y_true, y_pred, average='macro')
print(f1)
1 Answer
The error traceback would be needed to pinpoint the problem, but my guess is that it is caused by passing a nested list to f1_score rather than a single flat list. It can be fixed by changing the way the final lists are collected.
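A minimal sketch of that fix, using hypothetical stand-in batches in place of the test dataloader: replacing append() with extend() flattens each batch into one flat list of labels, which is the shape f1_score expects. Calling .cpu() before .numpy() is also needed when the tensors live on a CUDA device.

```python
from sklearn.metrics import f1_score
import torch

# Stand-in for (labels, predicted) pairs produced per batch by the test loop.
batches = [
    (torch.tensor([0, 1, 2]), torch.tensor([0, 2, 2])),
    (torch.tensor([1, 0, 1]), torch.tensor([1, 0, 0])),
]

y_true = []
y_pred = []
for labels, predicted in batches:
    # .cpu() is a no-op for CPU tensors but required before .numpy()
    # when the model runs on CUDA.
    # extend() flattens each batch into one flat list; append() would
    # build a list of lists, which f1_score rejects.
    y_true.extend(labels.cpu().numpy().tolist())
    y_pred.extend(predicted.cpu().numpy().tolist())

f1 = f1_score(y_true, y_pred, average='macro')
print(f1)
```

With flat lists, both the scikit-learn and the torchmetrics F1 implementations accept the inputs directly.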