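I am using spaCy to create vectors for sentences. If the sentence is "I am working", it gives me a vector of shape (3, 300). Is there any way to get the text of the sentence back from these vectors? Thanks in advance, Harathi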
8wigbo561#
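Actually, you can get the strings directly from the doc object with the .orth_ attribute, which returns the string representation of a token rather than a spaCy Token object.

    import en_core_web_sm

    nlp = en_core_web_sm.load()
    # nlp.tokenizer is available in spaCy v2 and v3;
    # nlp.Defaults.create_tokenizer(nlp) only exists in v2.
    tokenizer = nlp.tokenizer
    text = 'I am working'
    tokens = [token.orth_ for token in tokenizer(text)]
    print(tokens)
    # ['I', 'am', 'working']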
snz8szmq2#
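There is no way to translate from a vector back to a word. However, you can build a second sequence that maps the token sequence to a sequence of integers, where each integer is the id of that token in the spaCy model's vocabulary.

    import spacy

    # Assumption: any English pipeline works here; the model name is not given in the answer.
    nlp = spacy.load('en_core_web_sm')
    sentence = 'I am working'
    document = nlp(sentence)
    # token.orth is the integer id (hash) of the token text
    id_sequence = [token.orth for token in document]
    # Look each id up in the vocabulary to recover the text.
    # Lists are used instead of map() so that print shows the values in Python 3.
    text = [nlp.vocab[i].text for i in id_sequence]
    print(text)
    # ['I', 'am', 'working']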
jc3wubiy3#
Have you tried looking up the "most similar" words?

    import numpy as np
    import spacy

    nlp = spacy.load("en_core_web_lg")
    doc1 = nlp("I am working")

    # most similar words in the vocab
    keys, best_rows, scores = nlp.vocab.vectors.most_similar(
        np.asarray([
            doc1.vector,  # one query row, so the input has shape (1, 300)
        ]),
        n=20
    )

    # keys, best_rows and scores each have shape (1, n)
    for key, best_row, score in zip(keys[0, :], best_rows[0, :], scores[0, :]):
        print(f'text: {nlp.vocab[key].text}, score: {score}')  # key: {key}
It returns the following:

    text: Am, score: 0.8314999938011169
    text: aM, score: 0.8314999938011169
    text: am, score: 0.8314999938011169
    text: AM, score: 0.8314999938011169
    text: I, score: 0.8113999962806702
    text: i, score: 0.8113999962806702
    text: İ, score: 0.8113999962806702
    text: 'M, score: 0.7860000133514404
    text: 'm, score: 0.7860000133514404
    text: MYSELF, score: 0.7333999872207642
    text: Myself, score: 0.7333999872207642
    text: myself, score: 0.7333999872207642
    text: WORKING, score: 0.7249000072479248
    text: WOrking, score: 0.7249000072479248
    text: working, score: 0.7249000072479248
    text: Working, score: 0.7249000072479248
    text: knOw, score: 0.7063999772071838
    text: know, score: 0.7063999772071838
    text: Know, score: 0.7063999772071838
    text: KNow, score: 0.7063999772071838
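The same nearest-neighbour idea can also be applied per token, to each row of the (3, 300) matrix from the question, instead of to a single sentence vector. Below is a minimal sketch using NumPy only. The function name nearest_words and the random toy vocabulary are hypothetical stand-ins, not spaCy data. With a real pipeline the table would come from a model that ships static word vectors, and the lookup would still be approximate: it returns the word whose stored vector is closest, which is not guaranteed to be the original token.

```python
import numpy as np


def nearest_words(token_vectors, vocab_words, vocab_vectors):
    """For each row of token_vectors, return the vocab word with the
    highest cosine similarity (hypothetical helper, not a spaCy API)."""
    # Normalize rows so that a dot product equals cosine similarity.
    tv = token_vectors / np.linalg.norm(token_vectors, axis=1, keepdims=True)
    vv = vocab_vectors / np.linalg.norm(vocab_vectors, axis=1, keepdims=True)
    sims = tv @ vv.T              # shape (n_tokens, n_vocab)
    best = sims.argmax(axis=1)    # index of the closest vocab row per token
    return [vocab_words[i] for i in best]


# Toy vector table: random 300-dimensional vectors, NOT real word vectors.
rng = np.random.default_rng(0)
vocab_words = ["I", "am", "working", "know", "myself"]
vocab_vectors = rng.normal(size=(len(vocab_words), 300))

# Simulate the (3, 300) matrix from the question:
# the rows for "I", "am", "working" with a little noise added.
sentence_matrix = vocab_vectors[:3] + 0.01 * rng.normal(size=(3, 300))

decoded = nearest_words(sentence_matrix, vocab_words, vocab_vectors)
print(decoded)  # ['I', 'am', 'working']
```

Case variants such as "am" and "AM" share a score in the output above, so a per-token lookup against a real vector table can come back with a different casing than the input text.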