vllm [Bug]:

Posted by 332nm8kg 2 months ago in Other


Current environment information:

PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: 3.30.1
Libc version: glibc-2.35
Python version: 3.8.0 | packaged by conda-forge (default, Nov 22 2019, 19:11:38) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-6.5.0-45-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1050
Nvidia driver version: 535.183.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
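A report like the one above is normally produced by `python -m torch.utils.collect_env`. As a rough sketch only, a few of the same fields can be gathered by hand as shown below; the helper name `collect_basic_env` and the dictionary keys are made up for illustration, and the function falls back to the standard library when PyTorch is not installed.

```python
import platform
import sys


def collect_basic_env():
    """Gather a small subset of the environment report (illustrative helper)."""
    info = {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report that instead of failing.
        info["torch_version"] = None
        return info

    info["torch_version"] = torch.__version__
    info["cuda_build_version"] = torch.version.cuda  # None on CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu_name"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of GPU 0
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_basic_env().items():
        print(f"{key}: {value}")
```

Printing the compute capability alongside the GPU name can help when checking whether a given build supports the card.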
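This error message indicates that a problem occurred while loading the model weights. Specifically, the failure happens while executing self.model_runner.profile_run(). This may be caused by a corrupted or incompatible model file. Check that the model files are complete and compatible with the current environment.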
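One way to act on the suggestion to check the model files is a quick sanity pass over the local model directory. The sketch below assumes a Hugging Face style layout (a `config.json` plus `*.safetensors`, `*.bin`, or `*.pt` weight files); the helper `check_model_dir` and that layout are assumptions for illustration, not part of vLLM, and passing the check does not prove the weights are valid.

```python
import json
from pathlib import Path

# Assumed weight file patterns; adjust for the model format actually used.
WEIGHT_PATTERNS = ("*.safetensors", "*.bin", "*.pt")


def check_model_dir(model_dir):
    """Return a list of obvious problems; an empty list means none were found."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    config = root / "config.json"
    if not config.is_file():
        problems.append("config.json is missing")
    else:
        try:
            json.loads(config.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"config.json is not valid JSON: {exc}")

    weights = [p for pattern in WEIGHT_PATTERNS for p in root.glob(pattern)]
    if not weights:
        problems.append("no weight files found")
    for path in weights:
        # A zero-byte file usually means an interrupted download.
        if path.stat().st_size == 0:
            problems.append(f"{path.name} is empty")
    return problems
```

Calling `check_model_dir("/path/to/model")` returns human-readable findings that can be printed before starting the engine.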
The output of the code is not provided in the question. However, I can help you with a general approach to handling CUDA kernel errors in PyTorch.

When you encounter a CUDA kernel error in PyTorch, it is raised as a RuntimeError whose message contains the CUDA error string. PyTorch does not provide a torch.cuda.runtime.get_last_error() function, so read the details from the exception itself. Because CUDA kernels launch asynchronously, the reported location can be misleading; setting the environment variable CUDA_LAUNCH_BLOCKING=1 or calling torch.cuda.synchronize() makes the error surface at the failing call.

Here's an example of how you can handle CUDA kernel errors this way:

import torch

# Define your model and other necessary variables here

try:
    # Run your model on GPU here (placeholder call; self, input_ and bias come from your own code)
    output_parallel = self.linear_method.apply_weights(self, input_, bias)
    # Wait for pending kernels so that an asynchronous error is raised inside this try block
    torch.cuda.synchronize()
except RuntimeError as e:
    # The exception message carries the CUDA error details
    print("CUDA error occurred:", e)

By using this approach, you can get more detailed information about the error and potentially find a solution or workaround for it.
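A slightly more reusable variant is sketched below. It relies on the `CUDA_LAUNCH_BLOCKING` environment variable, which makes kernel launches synchronous so the exception points at the failing call, and on the fact that PyTorch reports CUDA failures as `RuntimeError`. The wrapper name `run_and_report` is hypothetical, and the PyTorch import is guarded so the helper also works where PyTorch is absent.

```python
import os

# Has to be set before CUDA is initialized to take effect.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def run_and_report(fn, *args, **kwargs):
    """Call fn and return (result, None), or (None, message) on RuntimeError."""
    try:
        result = fn(*args, **kwargs)
        try:
            import torch

            if torch.cuda.is_available():
                # Flush pending kernels so asynchronous errors are raised here.
                torch.cuda.synchronize()
        except ImportError:
            pass  # no PyTorch installed: nothing to synchronize
        return result, None
    except RuntimeError as exc:
        # PyTorch puts the CUDA error string into the exception message.
        return None, str(exc)
```

A typical call would be `output, error = run_and_report(model, inputs)`, where `model` and `inputs` are your own objects; `error` is `None` when the call succeeds.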
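There is a memory leak in the code that prevents the program from running normally. Specifically, the _PyObject_MakeTpCall function call on line 16 is passed an invalid pointer, 0x73b1e169813b. To resolve this, check the code for memory that is not released correctly and fix any such cases.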
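If you want to look for memory that is not being released, Python's standard `tracemalloc` module can show which source lines account for growth across a call. The sketch below is a general technique rather than something taken from the issue; the helper name `top_allocation_growth` is made up, and `tracemalloc` only sees allocations made through Python's allocator, not GPU memory or memory allocated directly by native extensions.

```python
import tracemalloc


def top_allocation_growth(fn, limit=3):
    """Run fn and return the source lines whose allocations changed the most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    fn()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Entries are sorted by the absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    leaked = []  # stand-in for objects that are kept alive by mistake

    def suspect():
        leaked.extend(bytearray(1024) for _ in range(1000))

    for stat in top_allocation_growth(suspect):
        print(stat)
```

Each returned entry exposes `size_diff` and `count_diff`, so steadily positive values across repeated runs point at the lines worth inspecting.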


Source: https://github.com/vllm-project/vllm/issues/7072