How do I get word indexes for GloVe embeddings in PyTorch

Posted by yzckvree on 2023-01-13 in Other


I am trying to use GloVe embeddings in a PyTorch model. I have the following code:

from torchtext.vocab import GloVe
import torch.nn
glove= GloVe()
my_embeddings = torch.nn.Embedding.from_pretrained(glove.vectors,freeze=True)

However, I don't understand how to get the embedding for a specific word from this. my_embeddings only accepts PyTorch indices, not text. I could just use:

from torchtext.data import get_tokenizer
tokenizer = get_tokenizer("basic_english")
glove.get_vecs_by_tokens(tokenizer("Hello, How are you?"))

But then I'm confused: why would I need torch.nn.Embedding at all, as most tutorials suggest?

pytorch

Source: https://stackoverflow.com/questions/73403476/how-do-i-get-word-indexes-for-glove-embeddings-in-pytorch

2 Answers


#1 mwg9r5ms

I believe this is done with glove.stoi, which maps each token string to its row index in glove.vectors:

sentence = "Hello, How are you?"
tokenized_sentence = tokenizer(sentence)
torch_tensor_first_word = torch.tensor(glove.stoi[tokenized_sentence[0]], dtype=torch.long)
embeddings_for_first_word = my_embeddings(torch_tensor_first_word)
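
The same lookup extends from the first word to the whole sentence. The sketch below is a minimal illustration, not the answerer's code: the small `stoi` dict and `vectors` tensor are hypothetical stand-ins for `glove.stoi` and `glove.vectors`, so that it runs without downloading GloVe.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for glove.stoi and glove.vectors (6 tokens, 4 dims).
stoi = {"hello": 0, ",": 1, "how": 2, "are": 3, "you": 4, "?": 5}
vectors = torch.arange(6 * 4, dtype=torch.float32).reshape(6, 4)

embedding = nn.Embedding.from_pretrained(vectors, freeze=True)

# Tokens as a basic_english-style tokenizer might produce them (assumed).
tokens = ["hello", ",", "how", "are", "you", "?"]

# Convert every token to its row index, then look up all rows at once.
indices = torch.tensor([stoi[t] for t in tokens], dtype=torch.long)
sentence_embeddings = embedding(indices)  # shape: [len(tokens), emb_dim]
```

With the real GloVe object, a token missing from `glove.stoi` would raise a `KeyError` in the list comprehension, so out-of-vocabulary tokens need to be handled separately.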

Answered 2023-01-13

#2 hfwmuf9z

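Good question!
Here are some reasons you might want the GloVe vectors in an nn.Embedding layer:
1. You want to fine-tune the pretrained embedding weights that GloVe provides;
2. You want to pass batches of token indices into your model and have them embedded there.
As mentioned above, word strings (tokens) are converted to indices with glove.stoi[word_str], and a list of those indices can then be passed to the layer.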
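
The two reasons above can be sketched as follows. This is a minimal illustration under stated assumptions: `pretrained` is a hypothetical random stand-in for `glove.vectors`, and the loss is a toy value chosen only to show that gradients reach the embedding table when `freeze=False`.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for glove.vectors: 5 "words", 3 dimensions.
pretrained = torch.randn(5, 3)

# freeze=False keeps the weights trainable so they can be fine-tuned.
# clone() keeps the original tensor untouched for comparison.
embedding = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)

# A batch of index sequences: 2 sentences of 4 tokens each.
batch = torch.tensor([[0, 1, 2, 3], [4, 3, 2, 1]], dtype=torch.long)
embedded = embedding(batch)  # shape: [2, 4, 3]

# Toy loss and a single SGD step (illustrative only).
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)
loss = embedded.pow(2).mean()
loss.backward()
optimizer.step()
```

After the step, `embedding.weight` differs from `pretrained`, which is what fine-tuning means here. With `freeze=True` the weight has `requires_grad=False` and would not be updated.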
But you set freeze=True, so if you don't plan to retrain the embedding layer, you might as well use the GloVe object directly:

Words -> vectors

from torchtext.vocab import GloVe
import torch.nn
glove= GloVe()
test_text = ["Hello", "world"]
embedded_text=glove.get_vecs_by_tokens(test_text)

Vectors -> indices

def emb2indices(vec_seq, vecs):
    # vec_seq: [sequence, emb_length]; vecs: [num_indices, emb_length]
    with torch.no_grad():
        # L1 distance between every query vector and every vocabulary vector;
        # broadcasting yields a [sequence, num_indices, emb_length] intermediate
        dists = torch.abs(vec_seq.unsqueeze(1) - vecs.unsqueeze(0)).sum(dim=2)
        # index of the nearest vocabulary vector for each query vector
        word_indices = torch.argmin(dists, dim=1)
    return word_indices
word_indices = emb2indices(embedded_text, glove.vectors)

Indices -> words

list_words=[glove.itos[word_indices[i]] for i in range(word_indices.size(0))]
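
The full round trip (words -> vectors -> indices -> words) can be checked on a tiny table. The sketch below is an illustration only: `itos`, `stoi`, and `vectors` are hypothetical stand-ins for the GloVe attributes, and it swaps in `torch.cdist` with `p=1` for the hand-written L1 distance, which avoids building the three-dimensional intermediate tensor explicitly.

```python
import torch

# Hypothetical stand-ins for glove.itos, glove.stoi and glove.vectors.
itos = ["hello", "world", "cat", "dog"]
stoi = {word: i for i, word in enumerate(itos)}
vectors = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]])

# Words -> vectors
words = ["world", "dog"]
embedded = vectors[[stoi[w] for w in words]]

# Perturb the vectors slightly so the nearest-neighbour search is not trivial.
noisy = embedded + 0.1

# Vectors -> indices: pairwise L1 distances, then nearest row per query.
dists = torch.cdist(noisy, vectors, p=1)  # shape: [len(words), len(itos)]
nearest = torch.argmin(dists, dim=1)

# Indices -> words
recovered = [itos[i] for i in nearest.tolist()]
```

When the vectors come straight from known words, `stoi` gives the indices directly; the nearest-neighbour search only matters for vectors that are not exact rows of the table, such as model outputs.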


Answered 2023-01-13
