Suppose we have a template sentence like this:
- "The ____ house is our meeting place."
and a list of adjectives to fill in the blank, for example:
- "yellow"
- "large"
- ""
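Note that one of them is an empty string.
The goal is to compare the probabilities of the words that could describe "house" in the context of the given sentence, to find the most likely one. If it is more likely that there is nothing in that position, that should be taken into account as well.
We can predict the probability of each word filling the blank, but how do we predict the probability of the empty string filling it, i.e., the probability that no adjective describes "house"?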
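Framing the question as a comparison of whole sentences makes the empty option concrete: it does not put a special token in the blank, it produces a sentence that is one word shorter. A minimal sketch of building the candidate sentences; the helper `fill_blank` and the `____` placeholder convention are illustrative assumptions, not part of any library:

```python
import re


def fill_blank(template, word, blank="____"):
    """Substitute `word` for the blank; an empty word removes the blank entirely."""
    filled = template.replace(blank, word)
    # Collapse the double space left behind when `word` is empty.
    return re.sub(r"\s{2,}", " ", filled).strip()


template = "The ____ house is our meeting place."
for option in ["yellow", "large", ""]:
    print(repr(fill_blank(template, option)))
```

With the empty option the result is `'The house is our meeting place.'`, which is the sentence whose likelihood the question is really asking about.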
To predict the probability of a single word:
from transformers import BertTokenizer, BertForMaskedLM
import torch
from torch.nn import functional as F
# Load BERT tokenizer and pre-trained model
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = BertForMaskedLM.from_pretrained('bert-large-uncased', return_dict=True)
targets = ["yellow", "large"]
sentence = "The [MASK] house is our meeting place."
# Using BERT, compute probability over its entire vocabulary, returning logits
input = tokenizer.encode_plus(sentence, return_tensors="pt")
mask_index = torch.where(input["input_ids"][0] == tokenizer.mask_token_id)[0]
with torch.no_grad():
    output = model(**input)
# Run softmax over the logits to get the probabilities
softmax = F.softmax(output.logits[0], dim=-1)
# Find the words' probabilities in this probability distribution
target_probabilities = {t: softmax[mask_index, tokenizer.vocab[t]].numpy()[0] for t in targets}
target_probabilities
This outputs each target word with its associated probability:
{'yellow': 0.0061520976, 'large': 0.00071377633}
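These numbers are probabilities over BERT's entire vocabulary, which is why they are so small. To compare only the candidates against each other, one option is to rescale them so they sum to 1 over the candidate set. A small sketch of that arithmetic using the values above (`renormalize` is a hypothetical helper; it still says nothing about the empty option, which has no vocabulary entry):

```python
def renormalize(probs):
    """Rescale candidate probabilities so they sum to 1 over the candidate set only."""
    total = sum(probs.values())
    return {word: p / total for word, p in probs.items()}


# Values copied from the output above.
scores = {"yellow": 0.0061520976, "large": 0.00071377633}
relative = renormalize(scores)
print({w: round(p, 3) for w, p in relative.items()})
# prints {'yellow': 0.896, 'large': 0.104}
```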
If I try to add the empty string to the list, I get the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-62-6f726220a108> in <module>
18
19 # Find the words' probabilities in this probability distribution
---> 20 target_probabilities = {t: softmax[mask_index, tokenizer.vocab[t]].numpy()[0] for t in targets}
21 target_probabilities
<ipython-input-62-6f726220a108> in <dictcomp>(.0)
18
19 # Find the words' probabilities in this probability distribution
---> 20 target_probabilities = {t: softmax[mask_index, tokenizer.vocab[t]].numpy()[0] for t in targets}
21 target_probabilities
KeyError: ''
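This is because BERT's vocabulary does not contain an empty string, so we cannot look up the probability of something that does not exist in the model.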
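The failure is a plain dictionary miss: `tokenizer.vocab` maps single vocabulary tokens to ids, and `""` is not a key. The same miss would occur for any word that BERT's WordPiece tokenizer splits into several pieces. A minimal sketch that separates the candidates that can be scored this way from those that cannot, using toy data in place of the real vocabulary and softmax row (the names `lookup_target_probs`, `toy_vocab` and `toy_probs` are hypothetical):

```python
def lookup_target_probs(vocab, probs, targets):
    """Return (found, missing): probabilities for targets that are single vocabulary
    entries, and the targets with no entry (which would raise KeyError with vocab[t])."""
    found, missing = {}, []
    for t in targets:
        token_id = vocab.get(t)
        if token_id is None:
            missing.append(t)
        else:
            found[t] = probs[token_id]
    return found, missing


# Toy stand-ins (hypothetical): with BERT, `vocab` would be tokenizer.vocab and
# `probs` the softmax row at the [MASK] position.
toy_vocab = {"[PAD]": 0, "yellow": 1, "large": 2}
toy_probs = [0.01, 0.6, 0.3]
print(lookup_target_probs(toy_vocab, toy_probs, ["yellow", "large", ""]))
# prints ({'yellow': 0.6, 'large': 0.3}, [''])
```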
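How should we get the probability of no word filling the blank? Is this possible with the model? Would it make sense to use the padding token [PAD] instead of the empty string? (I have only seen [PAD] used at the end of sentences, to make a batch of sentences the same length.)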
1 answer
Have you tried adding the padding token '[PAD]' to the list?
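A different route, not taken in the thread: instead of asking what fills a single [MASK], score each complete candidate sentence (including the one with no adjective) and compare the scores. One such score for masked language models is the pseudo-log-likelihood: mask each token in turn and add up the log-probabilities of the original tokens. The sketch below assumes a caller-supplied `masked_log_prob` function (hypothetical) and uses whitespace splitting in place of BERT's WordPiece tokenizer, with a toy scorer so it runs without downloading a model:

```python
import math


def pseudo_log_likelihood(tokens, masked_log_prob):
    """Sum of log P(token_i | all other tokens), masking each position i in turn.

    `masked_log_prob(tokens, i)` is a caller-supplied function (hypothetical here);
    with BERT it would replace tokens[i] by [MASK], run the model, and return the
    log-softmax value at position i for the original token's id.
    """
    return sum(masked_log_prob(tokens, i) for i in range(len(tokens)))


def rank_candidates(sentences, masked_log_prob, tokenize=str.split):
    """Score each full sentence and return (sentence, score) pairs, best first."""
    scored = [(s, pseudo_log_likelihood(tokenize(s), masked_log_prob)) for s in sentences]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy scorer for illustration only: every token gets probability 0.5,
# regardless of context.
def toy_masked_log_prob(tokens, i):
    return math.log(0.5)


candidates = [
    "The yellow house is our meeting place.",
    "The large house is our meeting place.",
    "The house is our meeting place.",
]
ranking = rank_candidates(candidates, toy_masked_log_prob)
for sentence, score in ranking:
    print(f"{score:8.3f}  {sentence}")
```

The toy scorer also exposes a caveat of this approach: a sum of log-probabilities adds a negative term for every extra token, so the sentence without an adjective gets a head start. Dividing each score by its token count is one possible, imperfect correction.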