Paddle environment setup error: [Hint: 'CUBLAS_STATUS_INVALID_VALUE'. An unsupported value or parameter was passed to the function (a negative vector size, for example). To correct: ensure that all the parameters being passed have valid values. ] (at ../paddle/phi/backends/gpu/gpu_context.cc:598)

vsnjm48y posted 5 months ago in Other

Describe the Bug

An earlier issue ( PaddlePaddle/PaddleDetection#5073 ) mentions this kind of error, but after checking I found it is not the same problem. The full error output is below:

>>> paddle.utils.run_check()
Running verify PaddlePaddle program ... 
I0724 16:12:45.847908 2871941 interpretercore.cc:237] New Executor is Running.
W0724 16:12:45.848285 2871941 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 12.0
W0724 16:12:45.849236 2871941 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/utils/install_check.py", line 249, in run_check
    _run_static_single(use_cuda, use_xpu)
  File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/utils/install_check.py", line 147, in _run_static_single
    exe.run(
  File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1392, in run
    res = self._run_impl(
  File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1618, in _run_impl
    ret = new_exe.run(
  File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/fluid/executor.py", line 654, in run
    tensors = self._new_exe.run(
OSError: In user code:

    File "<stdin>", line 1, in <module>
      
    File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/utils/install_check.py", line 249, in run_check
      _run_static_single(use_cuda, use_xpu)
    File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/utils/install_check.py", line 133, in _run_static_single
      input, out, weight = _simple_network()
    File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/utils/install_check.py", line 37, in _simple_network
      linear_out = paddle.nn.functional.linear(x=input, weight=weight, bias=bias)
    File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/nn/functional/common.py", line 1860, in linear
      helper.append_op(
    File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/fluid/layer_helper.py", line 45, in append_op
      return self.main_program.current_block().append_op(*args, **kwargs)
    File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/fluid/framework.py", line 4013, in append_op
      op = Operator(
    File "/mnt/datadisk0/data/miniconda/py3.7/lib/python3.8/site-packages/paddle/fluid/framework.py", line 2781, in __init__
      for frame in traceback.extract_stack():

    ExternalError: CUBLAS error(7). 
      [Hint: 'CUBLAS_STATUS_INVALID_VALUE'.  An unsupported value or parameter was passed to the function (a negative vector size, for example). To correct: ensure that all the parameters being passed have valid values. ] (at ../paddle/phi/backends/gpu/gpu_context.cc:598)
      [operator < matmul_v2 > error]

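As the log line above shows ("Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 12.0"), the driver and runtime API versions match. I hope someone who has run into a similar problem can help.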
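The version comparison described above can also be done mechanically. The following is a minimal sketch, not part of Paddle: the helper names `parse_gpu_notice` and `driver_covers_runtime` are hypothetical, and the regular expression assumes the notice keeps the exact wording shown in the log above. It relies on the general CUDA rule that the driver API version should be at least the runtime API version.

```python
import re

# Matches the gpu_resources.cc notice that Paddle prints at start-up.
# Assumption: the wording is exactly as in the log quoted in this thread.
_NOTICE = re.compile(
    r"GPU Compute Capability: (?P<cc>\d+\.\d+), "
    r"Driver API Version: (?P<driver>\d+\.\d+), "
    r"Runtime API Version: (?P<runtime>\d+\.\d+)"
)


def parse_gpu_notice(line):
    """Return (compute_capability, driver_api, runtime_api) as (major, minor)
    integer tuples, or None if the line does not contain the notice."""
    match = _NOTICE.search(line)
    if match is None:
        return None
    return tuple(
        tuple(int(part) for part in match.group(key).split("."))
        for key in ("cc", "driver", "runtime")
    )


def driver_covers_runtime(line):
    """True if the driver API version is at least the runtime API version,
    False if it is older, None if the line could not be parsed."""
    parsed = parse_gpu_notice(line)
    if parsed is None:
        return None
    _, driver, runtime = parsed
    return driver >= runtime


log_line = (
    "W0724 16:12:45.848285 2871941 gpu_resources.cc:119] Please NOTE: "
    "device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, "
    "Runtime API Version: 12.0"
)
print(parse_gpu_notice(log_line))
print(driver_covers_runtime(log_line))
```

A passing check only rules out a driver that is too old for the runtime. It says nothing about whether the cuBLAS or cuDNN libraries actually loaded at run time are the ones the Paddle wheel was built against.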

Additional Supplementary Information

cuda:12.0
cudnn:8.9.2.26
python:3.8.17
paddlepaddle-gpu:2.5.0.post120
Ubuntu:20.04.6 LTS
gcc:(conda-forge gcc 12.1.0-17) 12.1.0

wrrgggsh #1

I got the error message below when running paddle-ocr with GPU. (w/o GPU, it's ok.)

OSError: (External) CUBLAS error(7). 
  [Hint: 'CUBLAS_STATUS_INVALID_VALUE'.  An unsupported value or parameter was passed to the function (a negative vector size, for example). To correct: ensure that all the parameters being passed have valid values. ] (at ../paddle/phi/backends/gpu/gpu_context.cc:599)

Runtime Env

  • OS: Ubuntu 20.04
  • NVIDIA-SMI: 525.125.06 Driver Version: 525.125.06
  • CUDA Version: 11.3
  • paddlepaddle-gpu: 2.5.2

d8tt03nd #2

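Here is one possible approach.
The problem may be caused by the locally installed CUDA and cuDNN. My solution was to comment out the entries for the locally installed CUDA and cuDNN in the environment variables, including PATH, CUDA_PATH and LD_LIBRARY_PATH, and then install the cuDNN version that PaddlePaddle requires, which is given in the official documentation.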
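To try this suggestion without editing shell start-up files, one option is to build a cleaned copy of the environment and run the check in a child process. This is a minimal sketch under stated assumptions: `strip_cuda_entries` and `cleaned_env` are hypothetical helpers, and matching on the substrings "cuda" and "cudnn" is only a heuristic for finding local CUDA install paths. It may miss installs in unusual locations or drop unrelated directories that happen to contain those words.

```python
import os

# Assumption: local CUDA/cuDNN install directories contain one of these substrings.
CUDA_MARKERS = ("cuda", "cudnn")


def strip_cuda_entries(value, markers=CUDA_MARKERS):
    """Drop entries from a PATH-style string whose text mentions CUDA or cuDNN."""
    kept = [
        entry
        for entry in value.split(os.pathsep)
        if entry and not any(marker in entry.lower() for marker in markers)
    ]
    return os.pathsep.join(kept)


def cleaned_env(env=None):
    """Return a copy of the environment with local CUDA/cuDNN references removed
    from PATH and LD_LIBRARY_PATH, and with CUDA_PATH unset."""
    env = dict(os.environ if env is None else env)
    for name in ("PATH", "LD_LIBRARY_PATH"):
        if name in env:
            env[name] = strip_cuda_entries(env[name])
    env.pop("CUDA_PATH", None)
    return env


# Example use (left commented out because it needs paddlepaddle-gpu installed):
# import subprocess, sys
# subprocess.run(
#     [sys.executable, "-c", "import paddle; paddle.utils.run_check()"],
#     env=cleaned_env(),
# )
```

If the check passes in the cleaned environment but fails in the normal shell, that points toward a conflict with the locally installed libraries, as this answer suggests.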

k0pti3hp #3

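@MaddingRookie Hi, your local environment uses CUDA 12.0. That build was only compiled recently and may not be well adapted yet. I suggest reinstalling with 2.5.1.post120 and trying again.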

dzjeubhm #4

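@MaddingRookie Also, which GPU model is it? If the card's compute capability is too low, CUDA 12.0 may no longer support it, and you would need to use a lower CUDA version.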
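The compute-capability concern can be checked against the value Paddle prints in its start-up notice. This is a minimal sketch, not an official compatibility matrix: `toolkit_supports` is a hypothetical helper, and the minimums in the table (3.5 for CUDA 11.x, 5.0 for CUDA 12.x) are stated as an assumption based on the CUDA release notes and should be verified against NVIDIA's documentation.

```python
# Assumption (verify against the CUDA release notes): lowest compute
# capability supported by each CUDA toolkit major version.
MIN_COMPUTE_CAPABILITY = {
    11: (3, 5),
    12: (5, 0),
}


def toolkit_supports(cuda_major, compute_capability):
    """Return True or False if the table covers this CUDA major version,
    or None if the version is not in the table."""
    minimum = MIN_COMPUTE_CAPABILITY.get(cuda_major)
    if minimum is None:
        return None
    return tuple(compute_capability) >= minimum


# The log in the original post reports compute capability 7.0 with CUDA 12.0.
print(toolkit_supports(12, (7, 0)))
```

Under that assumption, a 7.0 device is within the range CUDA 12.0 supports. The check does not cover whether a particular Paddle wheel was compiled with kernels for that architecture.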

6za6bjd0 #5

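@MaddingRookie Hi, your local environment uses CUDA 12.0. That build was only compiled recently and may not be well adapted yet. I suggest reinstalling with 2.5.1.post120 and trying again.

Hi, I have now downgraded CUDA to 11.7 and downgraded the corresponding packages as well. After going through the steps again the error is gone, so it was probably a compatibility issue.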

q43xntqr #6

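@MaddingRookie Also, which GPU model is it? If the card's compute capability is too low, CUDA 12.0 may no longer support it, and you would need to use a lower CUDA version.

My GPU and driver at the time: V100 64G, Driver Version: 525.125.06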

t2a7ltrp #7

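@MaddingRookie Also, which GPU model is it? If the card's compute capability is too low, CUDA 12.0 may no longer support it, and you would need to use a lower CUDA version.

Hi, I have run into this problem too: paddle.utils.run_check() fails with the same error.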

W1011 22:26:24.336457 638519 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.0, Runtime API Version: 11.7
W1011 22:26:24.338999 638519 gpu_resources.cc:149] device: 0, cuDNN Version: 8.5.

The GPU is an A6000, Driver Version: 525.89.02

Do you know how to fix this?
