I'm trying to run some third-party PyTorch code and am seeing the following error:
  File "D:\Testing\OFFLINE\PSFM\particle-sfm\motion_seg\core\network\traj_oa_depth.py", line 48, in extract_feature
    output_feat = self.transformer_model(input_traj, input_traj, \
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\transformer.py", line 146, in forward
    output = self.decoder(tgt, memory, tgt_mask=tgt_mask, memory_mask=memory_mask,
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\transformer.py", line 369, in forward
    output = mod(output, memory, tgt_mask=tgt_mask,
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\transformer.py", line 717, in forward
    x = self.norm2(x + self._mha_block(x, memory, memory_mask, memory_key_padding_mask, memory_is_causal))
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\transformer.py", line 735, in _mha_block
    x = self.multihead_attn(x, mem, mem,
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\activation.py", line 1205, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "C:\Users\B\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\functional.py", line 5373, in multi_head_attention_forward
    attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
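The notable line is RuntimeError: CUDA error: invalid configuration argument
Googling turns up several similar issues. The error seems to be related either to the blocks sent to the GPU or to using an older PyTorch version. My version is 2.0.1+cu117.
My question is: how do I debug this in code I didn't write? What should I search for to find the line of Python code that sets the GPU blocks?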
The function that causes the error is here:
https://github.com/bytedance/particle-sfm/blob/main/motion_seg/core/network/traj_oa_depth.py
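On the debugging side, the error message itself suggests CUDA_LAUNCH_BLOCKING=1, which makes kernel launches synchronous so the traceback points at the call that actually failed. A minimal sketch of setting it from Python follows. The helper name enable_blocking_cuda_launches is hypothetical, and the sketch assumes the variable is set before CUDA is initialized (setting it before importing torch is the safe choice):

```python
import os


def enable_blocking_cuda_launches() -> None:
    """Ask CUDA to run kernel launches synchronously (hypothetical helper).

    With CUDA_LAUNCH_BLOCKING=1 an error is reported at the call that
    triggered it rather than at some later, unrelated API call.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


# Call this at the very top of the entry script, before torch is imported,
# so the setting is in place when the CUDA runtime starts.
enable_blocking_cuda_launches()
```

The same effect can be had by setting the variable in the shell before launching the script. This slows execution, so it is only meant for debugging runs.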
1 Answer
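This can be fixed by adding a setting to the Python code.
As noted here, it makes PyTorch use its fallback method and skips the custom CUDA kernels that cause this error.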
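The exact snippet is not preserved in this copy of the answer. As a sketch only, one setting that matches the description (fall back to the plain PyTorch implementation and skip the custom kernels) is the set of `torch.backends.cuda` toggles for scaled_dot_product_attention available in PyTorch 2.0. That these toggles are what the answer meant is an assumption:

```python
import torch

# Assumption: the "custom CUDA kernels" are the fused flash and
# memory-efficient backends of scaled_dot_product_attention.
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(False)

# Keep the plain PyTorch ("math") implementation enabled so attention
# still has a backend to fall back to.
torch.backends.cuda.enable_math_sdp(True)
```

These are global switches, so they would go once near the top of the script, before the model's forward pass runs. The fallback path is typically slower and uses more memory than the fused kernels.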