BERTopic: errors when using OpenAI as a representation model

Posted by xmakbtuz 23 days ago in Other

In a dedicated environment with bertopic==0.16.0 (default settings), calling model.save() fails with the error below when OpenAI is used as the representation model.
Code:

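    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired, MaximalMarginalRelevance
    from sklearn.feature_extraction.text import CountVectorizer

    # docs, seed_topic_list, and rep_model_chatgpt (the OpenAI representation
    # model) are defined elsewhere and are not shown in the original post.

    # KeyBERT (created here but not used in the chain below)
    keybert = KeyBERTInspired()

    # MMR
    mmr = MaximalMarginalRelevance(diversity=0.3)

    # A list of representation models is applied as a chain: MMR, then OpenAI
    representation_models = [mmr, rep_model_chatgpt]

    topic_model = BERTopic(
        language="english",
        top_n_words=100,
        verbose=True,
        seed_topic_list=seed_topic_list,
        representation_model=representation_models,
        vectorizer_model=CountVectorizer(ngram_range=(1, 3), stop_words="english"),
    )

    topics, probs = topic_model.fit_transform(docs)
    topic_model.save("my_model")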

Error message:
2023-12-11 16:26:00,118 - BERTopic - WARNING: When you use `pickle` to save/load a BERTopic model, please make sure that the environments in which you save and load the model are exactly the same. The version of BERTopic, its dependencies, and python need to remain the same.
Traceback (most recent call last):
...
TypeError: cannot pickle '_thread.RLock' object
When I tried bertopic.fit_transform() with seed_topic_list for guided topic modeling (with or without a representation model), I got the following error.
Error:
topics, probs = topic_model.fit_transform(docs)
File ".conda\envs\bertopic2\lib\site-packages\bertopic\_bertopic.py", line 399, in fit_transform
y, embeddings = self._guided_topic_modeling(embeddings)
File ".conda\envs\bertopic2\lib\site-packages\bertopic\_bertopic.py", line 3617, in _guided_topic_modeling
embeddings[indices] = np.average([embeddings[indices], seed_topic_embeddings[seed_topic]], weights=[3, 1])
File ".conda\envs\bertopic2\lib\site-packages\numpy\lib\function_base.py", line 511, in average
a = np.asanyarray(a)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
Thanks!

Source: https://github.com/MaartenGr/BERTopic/issues/1684

7 answers

zrfyljdw1#

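I believe there is a known issue between pickle and OpenAI (see the related issues), but you can easily work around it by setting the following before saving:

topic_model.representation_model = None

Other than that, saving with safetensors or pytorch should also work fine.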
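
The failure and the workaround can be reproduced without BERTopic or an API key. The sketch below is only an illustration. It assumes the OpenAI client held by the representation model contains a thread lock, which is what the `_thread.RLock` in the traceback suggests. The `model` object and the helper `pickle_without_representation` are hypothetical stand-ins, not BERTopic code.

```python
import pickle
import threading
import types

# Hypothetical stand-in for a fitted topic model whose representation model
# wraps an API client that holds a thread lock.
model = types.SimpleNamespace(
    topics={0: ["clinical", "trials"]},
    representation_model=types.SimpleNamespace(client_lock=threading.RLock()),
)


def pickle_without_representation(obj):
    """Pickle obj with representation_model detached, then reattach it."""
    saved = obj.representation_model
    obj.representation_model = None
    try:
        return pickle.dumps(obj)
    finally:
        obj.representation_model = saved


# Pickling the full object fails because the lock cannot be serialized.
try:
    pickle.dumps(model)
    pickled_ok = True
except TypeError as err:
    pickled_ok = False
    print(f"TypeError: {err}")

# Detaching the representation model first makes the rest picklable.
payload = pickle_without_representation(model)
restored = pickle.loads(payload)
```

A model restored this way no longer carries its representation model, so it would need to be reattached after loading if new representations are required.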

23 days ago

5gfr0r5j2#

Thanks @MaartenGr,
any suggestions about the following error?
When I tried bertopic.fit_transform() with seed_topic_list for guided topic modeling (with or without a representation model), I got the following error.
Error:
topics, probs = topic_model.fit_transform(docs)
File ".conda\envs\bertopic2\lib\site-packages\bertopic\_bertopic.py", line 399, in fit_transform
y, embeddings = self._guided_topic_modeling(embeddings)
File ".conda\envs\bertopic2\lib\site-packages\bertopic\_bertopic.py", line 3617, in _guided_topic_modeling
embeddings[indices] = np.average([embeddings[indices], seed_topic_embeddings[seed_topic]], weights=[3, 1])
File ".conda\envs\bertopic2\lib\site-packages\numpy\lib\function_base.py", line 511, in average
a = np.asanyarray(a)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

23 days ago

wyyhbhjk3#

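I believe this is a known issue that has been raised before. I suggest searching both the open and the closed issues for it. I think you will find a solution there.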
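
For context, the traceback shows np.average receiving a Python list whose two items have different shapes: a block of document embeddings (2-D) and a single seed-topic embedding (1-D). Recent NumPy versions refuse to build an array from such a ragged list. The sketch below is not BERTopic's fix. It only illustrates the same 3:1 weighted average written so that the shapes agree. The array sizes and variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data: 4 document embeddings of dimension 5, one seed embedding.
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(4, 5))
seed_embedding = rng.normal(size=5)

# The failing pattern: a list of a (4, 5) array and a (5,) array is ragged.
# Depending on the NumPy version this raises ValueError or only warns,
# so the outcome is recorded rather than assumed.
try:
    np.average([doc_embeddings, seed_embedding], weights=[3, 1])
    ragged_failed = False
except ValueError:
    ragged_failed = True

# Broadcasting version: the seed embedding is blended into every document row.
weights = np.array([3.0, 1.0])
blended = (weights[0] * doc_embeddings + weights[1] * seed_embedding) / weights.sum()

# Alternative: give both inputs the same shape, then np.average works.
tiled_seed = np.broadcast_to(seed_embedding, doc_embeddings.shape)
blended_avg = np.average([doc_embeddings, tiled_seed], axis=0, weights=[3, 1])
```

Both variants produce one blended embedding per document, each pulled a quarter of the way toward the seed-topic embedding.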

23 days ago

cdmah0mi4#

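Hi @MaartenGr,

I am trying to merge some of the topics generated by the model above, but the resulting topic names do not use the representation. How can I use representation_models when calling topic_model.merge_topics(docs, topics_to_merge)?

Thanks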

23 days ago

ohtdti5x5#

Using .merge_topics should create updated representations for each topic. To access them, run topic_model.get_topic_info() or topic_model.get_topics(full=True).

23 days ago

c2e8gylq6#

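Thanks @MaartenGr,
I think my question was not clear, so let me rephrase it. With representation_models (gpt-3.5-turbo) I get meaningful topic names such as "12_Research team leading clinical trials with Biotech", but after merging topics I get "12_trials_clinical_trial_clinical trials", which is generated by the base model rather than by ChatGPT.
Also, after merging topics with BERTopic, can I regenerate the topic names using representation_models?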

23 days ago

qc6wkl3g7#

Sure, you can use .update_topics for that. Just make sure to specify the representation_model parameter, and you can use whatever representation you want.
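
Put together, the merge-then-relabel flow discussed in this thread might look like the sketch below. It assumes `topic_model` is a fitted BERTopic instance and `representation_model` is the same chain used at fit time. The helper name `merge_and_relabel` is hypothetical. Because the function only calls `merge_topics`, `update_topics`, and `get_topic_info`, it is exercised here with a small stand-in object rather than a real model.

```python
def merge_and_relabel(topic_model, docs, topics_to_merge, representation_model):
    """Merge topics, then recompute representations with the given model.

    Assumes topic_model exposes merge_topics(docs, topics_to_merge),
    update_topics(docs, representation_model=...), and get_topic_info(),
    as BERTopic does.
    """
    topic_model.merge_topics(docs, topics_to_merge)
    topic_model.update_topics(docs, representation_model=representation_model)
    return topic_model.get_topic_info()


class RecordingModel:
    """Hypothetical stand-in that records calls instead of fitting anything."""

    def __init__(self):
        self.calls = []

    def merge_topics(self, docs, topics_to_merge):
        self.calls.append(("merge_topics", topics_to_merge))

    def update_topics(self, docs, representation_model=None):
        self.calls.append(("update_topics", representation_model))

    def get_topic_info(self):
        return list(self.calls)


stub = RecordingModel()
info = merge_and_relabel(stub, ["doc a", "doc b"], [1, 2], "chatgpt-chain")
```

With a real model the call would be merge_and_relabel(topic_model, docs, topics_to_merge, representation_models). Note that recomputing representations with an OpenAI-based model sends new API requests.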

23 days ago
