PyTorch: CUDA error: invalid configuration argument

Posted by edqdpe6u 12 months ago in Other

I'm trying to run some third-party PyTorch code and I get the following error:

File "D:\Testing\OFFLINE\PSFM\particle-sfm\motion_seg\core\network\traj_oa_depth.py", line 48, in extract_feature
    output_feat = self.transformer_model(input_traj, input_traj, \
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\transformer.py", line 146, in forward
    output = self.decoder(tgt, memory, tgt_mask=tgt_mask, memory_mask=memory_mask,
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\transformer.py", line 369, in forward
    output = mod(output, memory, tgt_mask=tgt_mask,
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\transformer.py", line 717, in forward
    x = self.norm2(x + self._mha_block(x, memory, memory_mask, memory_key_padding_mask, memory_is_causal))
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\transformer.py", line 735, in _mha_block
    x = self.multihead_attn(x, mem, mem,
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\activation.py", line 1205, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\functional.py", line 5373, in multi_head_attention_forward
    attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

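The notable line is RuntimeError: CUDA error: invalid configuration argument
Google turned up several similar issues. The error seems to be related either to the (thread) blocks sent to the GPU or to using an older PyTorch version. My version is 2.0.1+cu117.
My question is: how do I debug this in code I didn't write? What should I search for to find the line of Python code that sets up the GPU blocks?
The function that triggers the error is here: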
https://github.com/bytedance/particle-sfm/blob/main/motion_seg/core/network/traj_oa_depth.py
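For the debugging part of the question, a minimal sketch, assuming you can edit the script that builds and calls the model: set CUDA_LAUNCH_BLOCKING=1 before CUDA is initialised (as the error message itself suggests) so the traceback points at the kernel that actually failed, and attach forward pre-hooks to every nn.MultiheadAttention so you can see the tensor shapes reaching the attention call. The helper record_attention_shapes and the small nn.Transformer stand-in are hypothetical and not part of particle-sfm; to debug the real problem you would pass the third-party model and its real inputs instead.

```python
import os

# Must be set before CUDA is initialised (safest: before importing torch), so that
# kernel launches run synchronously and the traceback points at the failing call.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch
import torch.nn as nn


def record_attention_shapes(model: nn.Module):
    """Hypothetical helper: hook every nn.MultiheadAttention in `model` and record
    (module name, shapes of its positional tensor inputs) on each forward call."""
    records = []
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, nn.MultiheadAttention):
            # Bind `name` as a default argument so each hook keeps its own name.
            def hook(mod, args, name=name):
                shapes = [tuple(a.shape) for a in args if torch.is_tensor(a)]
                records.append((name, shapes))
                print(f"{name}: {shapes}")
            handles.append(module.register_forward_pre_hook(hook))
    return records, handles


# Stand-in model on CPU; replace with the real model (and its real inputs) to debug.
model = nn.Transformer(d_model=16, nhead=2, num_encoder_layers=1,
                       num_decoder_layers=1, dim_feedforward=32, batch_first=True)
records, handles = record_attention_shapes(model)
x = torch.randn(4, 10, 16)  # (batch, sequence, d_model)
model(x, x)
for h in handles:
    h.remove()  # detach the hooks once the shapes have been logged
```

Comparing the logged shapes for an input that fails with one that works may show whether an unusually large dimension reaches the attention call; treat that as a hypothesis to check rather than a confirmed cause.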

Answer by ee7vknir:

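This can be solved by adding the following lines to the Python code, so that they run before the model's forward pass:

import torch
# Turn off the fused memory-efficient and FlashAttention SDPA kernels; keep the math implementation
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_math_sdp(True)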
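As noted here, this makes PyTorch fall back to its plain (math) implementation of scaled dot-product attention and skips the custom CUDA kernels that cause this error.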
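A minimal sketch of what the workaround does, assuming PyTorch 2.0 or later: it applies the three flags from the answer, reads them back with torch.backends.cuda.flash_sdp_enabled, mem_efficient_sdp_enabled and math_sdp_enabled, and checks that F.scaled_dot_product_attention agrees with the plain softmax(Q Kᵀ / √d) V formula that the math path computes. The helper names force_math_sdpa and reference_attention are hypothetical. The example runs on CPU, so it illustrates the math path rather than reproducing the CUDA error.

```python
import math

import torch
import torch.nn.functional as F


def force_math_sdpa() -> dict:
    """Hypothetical helper applying the answer's workaround: disable the fused
    memory-efficient and FlashAttention kernels, keep the math implementation.
    The flags are process-wide, so call this before the model's forward pass."""
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    return {
        "flash": torch.backends.cuda.flash_sdp_enabled(),
        "mem_efficient": torch.backends.cuda.mem_efficient_sdp_enabled(),
        "math": torch.backends.cuda.math_sdp_enabled(),
    }


def reference_attention(q, k, v):
    """Unfused attention (no mask, no dropout): softmax(q k^T / sqrt(d)) v."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v


flags = force_math_sdpa()
torch.manual_seed(0)
q, k, v = (torch.randn(2, 4, 8, 16) for _ in range(3))  # (batch, heads, seq, head_dim)
out = F.scaled_dot_product_attention(q, k, v)
max_diff = (out - reference_attention(q, k, v)).abs().max().item()
print(flags, max_diff)
```

Because these switches are global, every later attention call in the process also uses the math path, which materialises the full attention matrix and is generally slower and more memory-hungry than the fused kernels; that is the cost of the workaround.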
