BERTopic intertopic distance map changes every time I rerun

Posted by oalqel3c 23 days ago in Other


Hi,

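I'm very interested in BERTopic and would like to use it to explore my corpus. I'm not sure whether this is inherent to the code itself, but every time I rerun it, the intertopic distance map comes out looking different. This means I can't reproduce it, which is not ideal. I've attached the code I used below. Thanks!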

```python
from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer
from umap import UMAP

# df is a pandas DataFrame holding the corpus; drop rows with no text
df_clean = df.dropna(subset=['Policy_Content'])

# Fix UMAP's random_state so the dimensionality reduction is repeatable
umap = UMAP(n_neighbors=15,
            n_components=5,
            min_dist=0.0,
            metric='cosine',
            low_memory=False,
            random_state=123)
vectorizer_model = CountVectorizer(stop_words="english", min_df=2, ngram_range=(1, 2))
topic_model = BERTopic(umap_model=umap, vectorizer_model=vectorizer_model, verbose=True)
topics, probs = topic_model.fit_transform(df_clean['Policy_Content'])
# 227 topics in total at this point
topic_model.reduce_topics(df_clean['Policy_Content'], nr_topics=48)
topic_model.visualize_topics()
```
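A useful first check is whether the topics themselves change between runs, or only the 2D layout of the map. One way is to save the `topics` list returned by `fit_transform` from two runs and compare them. Topic IDs can be renumbered between runs even when the grouping is identical, so a plain `==` check is too strict. The helper below is a minimal sketch of a permutation-invariant comparison; `same_partition` is a hypothetical name and is not part of the BERTopic API.

```python
def same_partition(labels_a, labels_b):
    """Return True if two per-document topic-label lists group the documents
    identically, ignoring how the topic IDs themselves are numbered."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b = {}
    b_to_a = {}
    for a, b in zip(labels_a, labels_b):
        # Each label in run A must always pair with the same label in run B,
        # and vice versa; otherwise a topic was split or merged.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Usage with toy labels (-1 is BERTopic's outlier topic)
run_1 = [0, 0, 1, -1, 2]
run_2 = [1, 1, 0, -1, 2]               # same grouping, topics 0 and 1 swapped
print(same_partition(run_1, run_2))    # True
print(same_partition(run_1, [0, 1, 1, -1, 2]))  # False
```

If this returns `True` for two runs but the plot still differs, the variation is in how the topics are laid out in 2D rather than in the topic model itself.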

Best regards,
Yanith


Source: https://github.com/MaartenGr/BERTopic/issues/1822