BERTopic ValueError: setting an array element with a sequence, when attempting guided topic modeling

eivgtgni posted 6 months ago in Other

$x_1a_0b_1^x$

When running the following code (copied directly from the documentation):

$x_1a_1b_1^x$

Python 3.11

wkyowqbh #1

Try Python 3.10 with the latest BERTopic release (v0.15) in a completely fresh environment. It is also worth checking for possible duplicate issues, such as this one.

lsmd5eda #2

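I ran into the same error. After installing the numba version mentioned in #1309, my BERTopic model produced other errors, so I had to create a new environment and reinstall BERTopic and its dependencies. Is there a way to solve this?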

xvw2m8pv #3

Pinning numba to 0.56.4 or earlier usually resolves this issue. Which other errors did you get after installing the earlier numba version?
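A quick way to see whether an environment already satisfies that pin is to compare the installed numba version against 0.56.4. The snippet below is only a minimal sketch: the helper names `version_tuple` and `numba_is_at_most` are made up for illustration, and the parsing handles simple dotted versions only.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a string such as '0.56.4' into (0, 56, 4) for comparison."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def numba_is_at_most(limit="0.56.4"):
    """Return True/False for the installed numba, or None if it is not installed."""
    try:
        installed = metadata.version("numba")
    except metadata.PackageNotFoundError:
        return None
    return version_tuple(installed) <= version_tuple(limit)


print(numba_is_at_most())
```

If this prints False, the pin itself can be applied with pip install "numba<=0.56.4", ideally in a fresh environment.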

g6ll5ycj #4

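Because I want to use BERTopic with a GPU, I installed cuml-cu11. However, when I tried to install the numba version mentioned above, it led to the following dependency conflicts. I believe this is the cause of the error I am seeing.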

cudf-cu11 23.6.1 requires numba>=0.57, but you have numba 0.56.4 which is incompatible.
cuml-cu11 23.6.0 requires numba>=0.57, but you have numba 0.56.4 which is incompatible.
dask-cuda 23.6.0 requires numba>=0.57, but you have numba 0.56.4 which is incompatible.
raft-dask-cu11 23.6.2 requires numba>=0.57, but you have numba 0.56.4 which is incompatible.
rmm-cu11 23.6.0 requires numba>=0.57, but you have numba 0.56.4 which is incompatible.

After installing numba 0.56.4, I got the following error when training BERTopic with cuML:

Traceback (most recent call last):
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/cuml/internals/array.py", line 225, in __init__
    self._array_interface = data.__cuda_array_interface__
AttributeError: 'list' object has no attribute '__cuda_array_interface__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/cuml/internals/array.py", line 231, in __init__
    self._array_interface = data.__array_interface__
AttributeError: 'list' object has no attribute '__array_interface__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/cuml/internals/array.py", line 237, in __init__
    dtype = data.dtype
AttributeError: 'list' object has no attribute 'dtype'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/alireza.imani/projects/topic-modelling/train_bertopic.py", line 336, in <module>
    topic_model = topic_model.fit(docs_chunks, 
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/bertopic/_bertopic.py", line 306, in fit
    self.fit_transform(documents=documents, embeddings=embeddings, y=y, images=images)
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/bertopic/_bertopic.py", line 399, in fit_transform
    umap_embeddings = self._reduce_dimensionality(embeddings, y)
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/bertopic/_bertopic.py", line 3196, in _reduce_dimensionality
    self.umap_model.fit(embeddings, y=y)
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
    return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
    return func(*args, **kwargs)
  File "base.pyx", line 669, in cuml.internals.base.UniversalBase.dispatch_func
  File "umap.pyx", line 592, in cuml.manifold.umap.UMAP.fit
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/nvtx/nvtx.py", line 101, in inner
    result = func(*args, **kwargs)
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/cuml/internals/input_utils.py", line 369, in input_to_cuml_array
    arr = CumlArray.from_input(
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
    return func(*args, **kwargs)
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/nvtx/nvtx.py", line 101, in inner
    result = func(*args, **kwargs)
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/cuml/internals/array.py", line 1075, in from_input
    arr = cls(X, index=index, order=requested_order, validate=False)
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
    return func(*args, **kwargs)
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/nvtx/nvtx.py", line 101, in inner
    result = func(*args, **kwargs)
  File "/home/alireza.imani/software/miniconda3/envs/topic/lib/python3.10/site-packages/cuml/internals/array.py", line 239, in __init__
    raise ValueError(
ValueError: Must specify dtype when data is passed as a <class 'list'>

p5cysglq #5

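I managed to solve this without downgrading numba. I created a fresh environment and installed bertopic, cuml, and the other libraries needed for GPU training. Then I changed the np.average line in the _guided_topic_modeling() method to the following: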

embeddings[indices] = np.average([embeddings[indices], np.tile([seed_topic_embeddings[seed_topic]], (len(indices),1))], weights=[3,1],axis=0)
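To make the effect of that line easier to follow, here is a small standalone sketch with toy arrays. The values and shapes are hypothetical stand-ins, not BERTopic's real data; only the np.tile and np.average pattern is taken from the line above.

```python
import numpy as np

# Toy stand-ins (hypothetical values) for the arrays inside _guided_topic_modeling()
embeddings = np.arange(12, dtype=float).reshape(4, 3)  # 4 documents, 3 dimensions
seed_topic_embeddings = np.array([[1.0, 1.0, 1.0]])     # a single seed topic
seed_topic = 0
indices = [1, 3]                                        # documents matched to this seed topic

# Repeat the seed embedding once per selected document, so both inputs have shape (2, 3)
tiled_seed = np.tile([seed_topic_embeddings[seed_topic]], (len(indices), 1))

# Average along the stacking axis with weight 3 for the document and 1 for the seed topic
updated = np.average([embeddings[indices], tiled_seed], weights=[3, 1], axis=0)
embeddings[indices] = updated

print(updated)
```

Each selected row ends up as (3 * document embedding + seed embedding) / 4, so the documents are pulled a quarter of the way toward their seed topic.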

As for the ValueError: Must specify dtype when data is passed as a <class 'list'> raised earlier in cuML's UMAP fit method, the fix is to change the return statement of _guided_topic_modeling() to the following:

return np.array(y), embeddings
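The traceback above shows cuML probing the input for __cuda_array_interface__, then __array_interface__, then dtype, and failing on all three because y is a plain list. The sketch below, using made-up labels, shows why wrapping y in np.array() is enough to get past those checks:

```python
import numpy as np

y = [-1, 0, 0, 1]  # hypothetical labels held in a plain Python list

# A list exposes none of the attributes that cuML looks for
print(hasattr(y, "__array_interface__"), hasattr(y, "dtype"))

# A NumPy array carries both the array interface and an explicit dtype
y_arr = np.array(y)
print(hasattr(y_arr, "__array_interface__"), hasattr(y_arr, "dtype"))
```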

hm2xizp9 #6

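Glad to hear you managed to solve it. I believe the second fix you mentioned has already been pushed to the main branch, but it does not appear to have been officially released yet.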

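I remember seeing your solution somewhere before. I don't think I implemented it because it requires tiling the embeddings, which tends to increase memory usage significantly. Unfortunately, though, the weighted average is no longer compatible with numba.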
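On the memory concern, one possible alternative (not something the thread or BERTopic confirms, just a sketch) is to write the 3:1 weighted average out by hand and let NumPy broadcasting apply the seed embedding to every row, which avoids building the tiled copy. The data below is random and purely illustrative, and seed_embedding is a hypothetical name for a single seed topic vector.

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 4))  # hypothetical document embeddings
seed_embedding = rng.normal(size=4)   # hypothetical embedding of one seed topic
indices = [0, 2, 4]

# Tiled variant, following the pattern of the fix discussed in this thread
tiled = np.average(
    [embeddings[indices], np.tile([seed_embedding], (len(indices), 1))],
    weights=[3, 1],
    axis=0,
)

# Broadcast variant: the (4,) seed vector is applied to each (3, 4) row without tiling
broadcast = (3 * embeddings[indices] + seed_embedding) / 4

print(np.allclose(tiled, broadcast))
```

Both variants compute the same values; the broadcast form simply skips the intermediate tiled and stacked arrays.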
