Python: converting spaCy token vectors to text

r1zk6ea1  posted on 2023-08-02 in Python

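I am using spaCy to create vectors for a sentence. If the sentence is "I am working", it gives me a vector of shape (3, 300). Is there any way to get the text of the sentence back from these vectors?
Thanks in advance, Harathi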

8wigbo56 #1

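In fact, you can get the strings directly from the doc object with the .orth_ attribute, which returns the string representation of a token rather than a spaCy Token object.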

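import en_core_web_sm

# Assumes the en_core_web_sm model package is installed
nlp = en_core_web_sm.load()
text = 'I am working'
# nlp.tokenizer only tokenizes the text (create_tokenizer was removed in spaCy v3);
# token.orth_ is the verbatim text of each token
tokens = [token.orth_ for token in nlp.tokenizer(text)]
print(tokens)
['I', 'am', 'working']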


snz8szmq #2

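There is no way to translate directly from vector → word. However, you can build a second sequence that maps the token sequence to a sequence of integers, where each integer is the ID of that token in the spaCy model's vocabulary.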

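import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: any installed English pipeline works here
sentence = 'I am working'
document = nlp(sentence)
# token.orth is the integer ID of the token's verbatim text
id_sequence = [token.orth for token in document]
# Look each ID up in the vocabulary to recover the text
# (lists are used because map() returns a lazy iterator in Python 3)
text = [nlp.vocab[i].text for i in id_sequence]
print(text)
['I', 'am', 'working']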
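
To illustrate the ID round trip without downloading a trained model, here is a minimal sketch that assumes a blank English pipeline (`spacy.blank("en")`, tokenizer only) is good enough for your text. It uses `nlp.vocab.strings`, the string store behind the vocabulary, to map the hash IDs back to text. The variable names are only illustrative.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here
nlp = spacy.blank("en")
doc = nlp("I am working")

# token.orth is the 64-bit hash ID of the token's verbatim text
ids = [token.orth for token in doc]

# The string store maps each hash ID back to the original string
words = [nlp.vocab.strings[i] for i in ids]
print(words)
```

Keep in mind that this recovers text from the IDs, not from the 300-dimensional vectors themselves, so the IDs have to be stored alongside the vectors.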


jc3wubiy #3

Have you tried looking up the "most similar" words?

import numpy as np
import spacy

nlp = spacy.load("en_core_web_lg")
doc1 = nlp("I am working")
# Find the vocabulary entries most similar to the averaged document vector
keys, best_rows, scores = nlp.vocab.vectors.most_similar(
    np.asarray([
        doc1.vector,  # the query array has shape (1, 300)
    ]),
    n=20,
)
# keys, best_rows and scores each have shape (1, n)
for key, best_row, score in zip(keys[0], best_rows[0], scores[0]):
    print(f'text: {nlp.vocab[key].text}, score: {score}')

It returns the following:

text: Am, score: 0.8314999938011169
text: aM, score: 0.8314999938011169
text: am, score: 0.8314999938011169
text: AM, score: 0.8314999938011169
text: I, score: 0.8113999962806702
text: i, score: 0.8113999962806702
text: İ, score: 0.8113999962806702
text: 'M, score: 0.7860000133514404
text: 'm, score: 0.7860000133514404
text: MYSELF, score: 0.7333999872207642
text: Myself, score: 0.7333999872207642
text: myself, score: 0.7333999872207642
text: WORKING, score: 0.7249000072479248
text: WOrking, score: 0.7249000072479248
text: working, score: 0.7249000072479248
text: Working, score: 0.7249000072479248
text: knOw, score: 0.7063999772071838
text: know, score: 0.7063999772071838
text: Know, score: 0.7063999772071838
text: KNow, score: 0.7063999772071838
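
The query above uses the averaged `doc1.vector`, so the results mix neighbours of all three words. If you keep the per-token rows (the (3, 300) array from the question), you could instead look up the nearest vocabulary entry for each row separately. Below is a dependency-free sketch of that idea using a made-up 3-dimensional vector table. `TOY_VECTORS`, `cosine` and `nearest_word` are hypothetical names for illustration only; in practice the rows would be each `token.vector` and the table would come from the model's vocabulary.

```python
import math

# Hypothetical toy embedding table (real spaCy vectors have 300 dimensions)
TOY_VECTORS = {
    "I": [1.0, 0.1, 0.0],
    "am": [0.1, 1.0, 0.1],
    "working": [0.0, 0.2, 1.0],
    "know": [0.5, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_word(vector, table=TOY_VECTORS):
    """Return the word whose vector is most similar to `vector`."""
    return max(table, key=lambda word: cosine(vector, table[word]))


# One slightly perturbed row per token, standing in for the (3, 300) array
sentence_rows = [
    [0.9, 0.2, 0.0],
    [0.2, 0.9, 0.1],
    [0.1, 0.1, 0.8],
]
recovered = [nearest_word(row) for row in sentence_rows]
print(recovered)
```

This kind of lookup is approximate: it returns whichever entry happens to be closest, and words that share the same vector (such as the case variants in the output above) cannot be told apart.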
