tensorflow: Comparing the output of an ML model with a CSV file

2mbi3lxu  posted on 2023-06-24  in Other

I have a machine learning model running on Google Colab, with this code:

wrong_english=[
    "I has a dog",
    "They is going to the park",
    "She don't like coffee.",
    "The book belong to him.",
    "He play soccer very well.",
    "My sister and me are going to the party.",
    "I'm not sure who's car is parked outside.",
    "There is many people in the room.",
    "She sings good.",
]

tokenized=tokenizer(
  wrong_english,
  padding="longest",
  max_length=MAX_LENGTH,
  truncation=True,
  return_tensors='tf'
)
out = model.generate(**tokenized, max_length=128)
print(out)

for i in range(len(wrong_english)):
  print(wrong_english[i]+"------------>"+tokenizer.decode(out[i], skip_special_tokens=True))

The output is:

I has a dog------------>I have a dog.
They is going to the park------------>They are going to the park.
She don't like coffee.------------>She doesn't like coffee.
The book belong to him.------------>The book belongs to him.
He play soccer very well.------------>He plays soccer very well.
My sister and me are going to the party.------------>My sister and me are going to the party.
I'm not sure who's car is parked outside.------------>I'm not sure who's car is parked outside.
There is many people in the room.------------>There are many people in the room.
She sings good.------------>She sings good.

I also have a CSV file that looks like this:

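How can I compare the ML model's output with each record in column B of the CSV file, and write the word CORRECT when the values match or INCORRECT when they do not?
For example: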

I has a dog------------>I have a dog. -> CORRECT
They is going to the park------------>They are going to the park. -> CORRECT
She don't like coffee.------------>She doesn't like coffee. -> CORRECT
The book belong to him.------------>The book belongs to him. -> CORRECT
He play soccer very well.------------>He plays soccer very well. -> CORRECT
My sister and me are going to the party.------------>My sister and me are going to the party. -> CORRECT
I'm not sure who's car is parked outside.------------>I'm not sure who's car is parked outside. -> INCORRECT
There is many people in the room.------------>There are many people in the room. -> CORRECT
She sings good.------------>She sings good. -> CORRECT
mklgxw1f  #1

import pandas as pd

# put your model predictions into a list
predicted_correction = [tokenizer.decode(out[i], skip_special_tokens=True) for i in range(len(wrong_english))]

# read your csv (CSV_PATH is the path to your file)
df = pd.read_csv(CSV_PATH)
# add an empty correct/incorrect result column
df.insert(0, "Result", "")

for wr, pc in zip(wrong_english, predicted_correction):
    indexes = df[df['Incorrect - input'] == wr].index
    for i in indexes:  # the same incorrect input may occur more than once
        # write with df.loc[row, column]; chained df.iloc[i]['Result'] = ... assigns to a copy and leaves df unchanged
        if df.loc[i, 'Correct - expected output'] == pc:
            df.loc[i, 'Result'] = "Correct"
        else:
            df.loc[i, 'Result'] = "Incorrect"
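
If the goal is only to print each line in the format shown in the question, the same comparison can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the answer author's method: the CSV text, the two sample sentences, and the helper names `load_expected` and `label_predictions` are hypothetical stand-ins, and the column names are taken from the answer above. Whitespace is stripped before comparing, which is an added assumption about how strict the match should be.

```python
import csv
import io

# Hypothetical stand-in for the real CSV file; the rows are made up for illustration.
# Column names follow the answer above.
CSV_TEXT = """Incorrect - input,Correct - expected output
I has a dog,I have a dog.
She sings good.,She sings well.
"""

# Stand-ins for the question's inputs and the decoded model outputs.
wrong_english = ["I has a dog", "She sings good."]
predicted_correction = ["I have a dog.", "She sings good."]


def load_expected(csv_file):
    """Map each incorrect input sentence to its expected correction."""
    reader = csv.DictReader(csv_file)
    return {
        row["Incorrect - input"].strip(): row["Correct - expected output"].strip()
        for row in reader
    }


def label_predictions(inputs, predictions, expected):
    """Return one report line per input, in the format requested in the question."""
    lines = []
    for wrong, predicted in zip(inputs, predictions):
        target = expected.get(wrong.strip())
        if target is None:
            verdict = "NOT FOUND"  # the input sentence has no row in the CSV
        elif target == predicted.strip():
            verdict = "CORRECT"
        else:
            verdict = "INCORRECT"
        lines.append(f"{wrong}------------>{predicted} -> {verdict}")
    return lines


expected = load_expected(io.StringIO(CSV_TEXT))
report = label_predictions(wrong_english, predicted_correction, expected)
for line in report:
    print(line)
```

With a real file, `io.StringIO(CSV_TEXT)` would be replaced by `open(CSV_PATH, newline="")`. Note that the dictionary keeps only the last row when an incorrect sentence appears more than once in the CSV, unlike the loop above, which labels every matching row.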
