BERTopic RuntimeError: The size of tensor a (678) must match the size of tensor b (512) at non-singleton dimension 1

Posted by nukf8bse 3 months ago in Other


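Hi! I am using a pipeline to load the BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext model from Hugging Face, but something seems to go wrong. Below are my code and the error. Please help me solve it.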

from bertopic import BERTopic
from transformers import pipeline

embedding_model = pipeline("feature-extraction", model=r"D:\A-编程\生物医药\BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext")
topic_model = BERTopic(
    language='english',
    embedding_model=embedding_model,
    # verbose=True
)
topics, probabilities = topic_model.fit_transform(df_value['text'].tolist())

RuntimeError: The size of tensor a (678) must match the size of tensor b (512) at non-singleton dimension 1



Source: https://github.com/MaartenGr/BERTopic/issues/1852

3 Answers


pu82cl6c #1

Could you share your full error message? Without it, it is difficult to tell where the error comes from.

3 months ago

oaxa6hgo #2

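Judging from the error message, the problem arises in the model's embedding layer. "The size of tensor a (678) must match the size of tensor b (512) at non-singleton dimension 1" means that the input tensor has a sequence length of 678 tokens, while the embedding layer supports at most 512 positions. This may be because text lengths are not handled consistently during preprocessing: at least one document is longer than the model's limit and has not been truncated.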
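A rough way to check whether over-long documents are involved is to count the words in each document. The sketch below is only an approximation under stated assumptions: `find_long_documents` is a hypothetical helper, not part of BERTopic or transformers, and whitespace-separated words are a loose proxy for tokens, since a WordPiece tokenizer usually emits at least one token per word plus special tokens. Documents it flags are very likely too long; documents it does not flag may still exceed the limit.

```python
def find_long_documents(docs, max_tokens=512, specials=2):
    """Return (index, word_count) pairs for documents likely to exceed the token limit.

    `specials` accounts for special tokens such as [CLS] and [SEP].
    Word counts are a rough proxy; real token counts tend to be higher.
    """
    flagged = []
    for i, doc in enumerate(docs):
        n_words = len(doc.split())
        if n_words + specials > max_tokens:
            flagged.append((i, n_words))
    return flagged


# Toy data standing in for df_value['text'].tolist()
docs = ["short abstract", "word " * 700]
print(find_long_documents(docs))  # prints [(1, 700)]
```

For an exact count, the same loop could call the model's own tokenizer instead of `str.split`.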

To solve this problem, you can try the following:

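Make sure all texts have the same length at the preprocessing stage. You can use torch.nn.utils.rnn.pad_sequence to pad them to a common length. Note that pad_sequence works on tensors rather than raw strings, so the texts have to be tokenized (and truncated to the model's limit) first. For example:

import torch
from torch.nn.utils.rnn import pad_sequence
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenize and truncate each text to 512 tokens, then pad the ID tensors to a common length
token_ids = [torch.tensor(tokenizer.encode(t, truncation=True, max_length=512)) for t in df_value['text'].tolist()]
padded_ids = pad_sequence(token_ids, batch_first=True, padding_value=tokenizer.pad_token_id)

Before passing the texts to the model, make sure they are converted to tensors of the appropriate shape. For example, if you are using a BERT model, you need to convert the texts to token IDs and then convert the token IDs to embedding vectors. You can do this with BertTokenizer and BertModel, letting the tokenizer pad and truncate in the same call. For example:

from transformers import BertTokenizer, BertModel
import torch

# Initialize the tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Convert the texts to token IDs, padding the batch and truncating to 512 tokens
texts = df_value['text'].tolist()
inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=512)

# Convert the token IDs to embedding vectors; the attention mask tells the model to ignore padding
with torch.no_grad():
    embeddings = model(**inputs)[0]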

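Make sure to use an appropriate batch size during training. If the texts in some batches differ in length, you may need to adjust the batch size so that all texts are processed correctly.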
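Applied to the setup in the question, the suggestions above might look like the following sketch. It rests on an unconfirmed assumption: that the tokenizer loaded from the local model directory carries no 512-token limit, so any truncation requested when the pipeline is called has nothing to cut to. `build_embedding_pipeline` is a hypothetical helper name and the model directory in the usage comment is a placeholder; `model_max_length` and the `tokenizer` argument of `pipeline` are existing transformers parameters.

```python
def build_embedding_pipeline(model_path, max_length=512):
    """Build a feature-extraction pipeline whose tokenizer knows the model's length limit."""
    # Imported here so the helper can be defined without loading transformers up front
    from transformers import AutoTokenizer, pipeline

    # model_max_length gives truncation a concrete limit to cut to
    tokenizer = AutoTokenizer.from_pretrained(model_path, model_max_length=max_length)
    return pipeline("feature-extraction", model=model_path, tokenizer=tokenizer)


# Usage (model_dir is a placeholder for the local directory holding the model):
# from bertopic import BERTopic
# embedding_model = build_embedding_pipeline(model_dir)
# topic_model = BERTopic(language="english", embedding_model=embedding_model)
# topics, probabilities = topic_model.fit_transform(df_value["text"].tolist())
```

If the error persists, truncation is probably not being applied at all, and shortening the documents before calling fit_transform would be the next thing to try.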
Original message

From: Maarten, sent Sunday, March 3, 2024, 7:09 PM
Subject: Re: [MaartenGr/BERTopic] RuntimeError: The size of tensor a (678) must match the size of tensor b (512) at non-singleton dimension 1 (Issue #1852)

"Could you perhaps share your full error message? It is difficult to say where it originates from without it."

3 months ago

tcomlyy6 #3

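Thanks for sharing the error message, lucgyn, but it is difficult to read because it contains so many tags. Could you share the full error message as it appears in the console?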

3 months ago