vLLM runtime error on ROCm

Posted by cig3rfwq 2 months ago in Other


Example command:
python benchmark_throughput.py --model gpt2 --input-len 256 --output-len 256
Output:

INFO 01-24 14:52:52 llm_engine.py:72] Initializing an LLM engine with config: model='gpt2', tokenizer='gpt2', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, enforce_eager=False, seed=0)
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.1+cu121 with CUDA 1201 (you have 2.3.0.dev20240123+rocm5.7)
    Python  3.10.13 (you have 3.10.13)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
INFO 01-24 14:52:55 weight_utils.py:164] Using model weights format ['*.safetensors']
Traceback (most recent call last):
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/benchmark_throughput.py", line 318, in <module>
    main(args)
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/benchmark_throughput.py", line 205, in main
    elapsed_time = run_vllm(requests, args.model, args.tokenizer,
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/benchmark_throughput.py", line 76, in run_vllm
    llm = LLM(
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/entrypoints/llm.py", line 106, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 350, in from_engine_args
    engine = cls(*engine_configs,
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 112, in __init__
    self._init_cache()
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 303, in _init_cache
    num_blocks = self._run_workers(
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 977, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/vllm-0.2.7+rocm573-py3.10-linux-x86_64.egg/vllm/worker/worker.py", line 116, in profile_num_available_blocks
    free_gpu_memory, total_gpu_memory = torch.cuda.mem_get_info()
  File "/scratch/project_465000670/danish-foundation-models/scripts/lumi/eval/.venv/lib/python3.10/site-packages/torch/cuda/memory.py", line 655, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: HIP error: invalid argument
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

Installed packages:

accelerate                0.26.1
aiohttp                   3.9.1
aioprometheus             23.12.0
aiosignal                 1.3.1
annotated-types           0.6.0
anyio                     4.2.0
async-timeout             4.0.3
attrs                     23.2.0
bert-score                0.3.13
bitsandbytes              0.42.0
certifi                   2022.12.7
charset-normalizer        2.1.1
chex                      0.1.85
click                     8.1.7
cmake                     3.28.1
contourpy                 1.2.0
cycler                    0.12.1
datasets                  2.16.1
demjson3                  3.0.6
dill                      0.3.7
einops                    0.7.0
etils                     1.6.0
evaluate                  0.4.1
exceptiongroup            1.2.0
fastapi                   0.109.0
filelock                  3.9.0
flash-attn                2.0.4
flax                      0.8.0
fonttools                 4.47.2
frozenlist                1.4.1
fsspec                    2023.10.0
h11                       0.14.0
httptools                 0.6.1
huggingface-hub           0.20.3
idna                      3.4
importlib-resources       6.1.1
interegular               0.3.3
jax                       0.4.23
jaxlib                    0.4.23
Jinja2                    3.1.2
joblib                    1.3.2
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
kiwisolver                1.4.5
Levenshtein               0.23.0
lm-format-enforcer        0.8.2
markdown-it-py            3.0.0
MarkupSafe                2.1.3
matplotlib                3.8.2
mdurl                     0.1.2
ml-dtypes                 0.3.2
mpmath                    1.2.1
msgpack                   1.0.7
multidict                 6.0.4
multiprocess              0.70.15
nest-asyncio              1.6.0
networkx                  3.0rc1
ninja                     1.11.1.1
nltk                      3.8.1
numpy                     1.26.3
openai                    0.28.1
opt-einsum                3.3.0
optax                     0.1.8
orbax-checkpoint          0.5.1
orjson                    3.9.12
packaging                 23.2
pandas                    1.5.3
Pillow                    9.3.0
pip                       23.3.2
protobuf                  3.20.3
psutil                    5.9.8
pyarrow                   14.0.2
pyarrow-hotfix            0.6
pydantic                  2.5.3
pydantic_core             2.14.6
Pygments                  2.17.2
pyinfer                   0.0.3
pyparsing                 3.1.1
python-dateutil           2.8.2
python-dotenv             0.21.1
pytorch-triton-rocm       2.2.0+dafe145982
pytz                      2023.3.post1
PyYAML                    6.0.1
quantile-python           1.1
rapidfuzz                 3.6.1
ray                       2.9.1
referencing               0.32.1
regex                     2023.12.25
requests                  2.31.0
responses                 0.18.0
rich                      13.7.0
rouge_score               0.1.2
rpds-py                   0.17.1
sacremoses                0.1.1
safetensors               0.4.1
scandeval                 9.2.0
scikit-learn              1.4.0
scipy                     1.12.0
sentencepiece             0.1.99
seqeval                   1.2.2
setuptools                65.5.0
six                       1.16.0
sniffio                   1.3.0
starlette                 0.35.1
sympy                     1.11.1
tabulate                  0.9.0
tensorstore               0.1.52
termcolor                 2.4.0
threadpoolctl             3.2.0
tiktoken                  0.5.2
tokenizers                0.15.1
toolz                     0.12.1
torch                     2.3.0.dev20240123+rocm5.7
torchaudio                2.2.0.dev20240123+rocm5.7
torchvision               0.18.0.dev20240123+rocm5.7
tqdm                      4.66.1
transformers              4.37.0
typing_extensions         4.9.0
urllib3                   1.26.13
uvicorn                   0.27.0
uvloop                    0.19.0
vllm                      0.2.7+rocm573
watchfiles                0.21.0
websockets                12.0
xformers                  0.0.23
xxhash                    3.4.1
yarl                      1.9.4
zipp                      3.17.0
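
The list mixes ROCm-specific builds with generic wheels. As a rough aid for spotting which is which, the sketch below extracts the local build tag (the part after `+`) from a version string. The function name is illustrative, and the sample versions are copied from the list above.

```python
from typing import Optional


def local_build_tag(version: str) -> Optional[str]:
    """Return the local version label after '+', or None if there is none."""
    _, sep, tag = version.partition("+")
    return tag if sep else None


# Versions copied from the package list above.
for name, version in [
    ("torch", "2.3.0.dev20240123+rocm5.7"),
    ("vllm", "0.2.7+rocm573"),
    ("xformers", "0.0.23"),
]:
    print(f"{name:10s} {local_build_tag(version)}")
```

xformers carries no ROCm tag here, which is consistent with the warning in the output that it was built for a CUDA build of PyTorch.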

This is running in the rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1 container, on a node with MI250X GPUs.


Source: https://github.com/vllm-project/vllm/issues/2580

7 answers


a0x5cqrl #1

I ran into this issue as well. I worked around it by manually modifying free_gpu_memory and total_gpu_memory.
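
A minimal sketch of that kind of workaround, not the actual patch (which the comment does not show): wrap the memory query so that a RuntimeError falls back to manually supplied values. The helper name `gpu_memory_info` and its fallback parameters are hypothetical. The query at the failing line of the traceback is `torch.cuda.mem_get_info`, which returns a (free, total) pair in bytes.

```python
from typing import Callable, Tuple


def gpu_memory_info(
    query: Callable[[], Tuple[int, int]],
    fallback_free_bytes: int,
    fallback_total_bytes: int,
) -> Tuple[int, int]:
    """Return (free, total) GPU memory in bytes.

    `query` is the real probe (e.g. torch.cuda.mem_get_info). If it raises
    RuntimeError, as in the traceback above, return the manually supplied
    numbers instead. This helper is hypothetical, not part of vLLM.
    """
    try:
        free_bytes, total_bytes = query()
    except RuntimeError:
        free_bytes, total_bytes = fallback_free_bytes, fallback_total_bytes
    return free_bytes, total_bytes
```

With PyTorch installed this would be called as `gpu_memory_info(torch.cuda.mem_get_info, free, total)`. The fallback numbers have to be chosen by hand for the actual GPU; values that overstate free memory can lead to out-of-memory errors later.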


aiqt4smr #2

This looks similar to this issue. Could you check whether any of these, or some combination of them, works?

export PYTORCH_ROCM_ARCH="gfx1031"
export HSA_OVERRIDE_GFX_VERSION=10.3.1
export HIP_VISIBLE_DEVICES=0
export ROCM_PATH=/opt/rocm
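
One way to work through "these or some combination of them" systematically is to enumerate every non-empty subset of the four variables. The sketch below only builds the combinations. The name `env_combinations` is illustrative, and each resulting dict would still need to be merged into the environment of a benchmark run.

```python
from itertools import combinations
from typing import Dict, Iterator

# Values copied from the suggestion above; gfx1031 / 10.3.1 may not match
# your GPU and are kept here only because they are what was proposed.
SUGGESTED_ENV = {
    "PYTORCH_ROCM_ARCH": "gfx1031",
    "HSA_OVERRIDE_GFX_VERSION": "10.3.1",
    "HIP_VISIBLE_DEVICES": "0",
    "ROCM_PATH": "/opt/rocm",
}


def env_combinations(variables: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every non-empty subset of `variables`, smallest subsets first."""
    names = list(variables)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            yield {name: variables[name] for name in subset}
```

Four variables give 15 combinations, so `for combo in env_combinations(SUGGESTED_ENV)` is a short enough loop to try each one in turn.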


mwg9r5ms #3

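After looking into this further, HSA_OVERRIDE_GFX_VERSION does affect what happens. Given that the MI250X is based on the gfx90a architecture, I tried HSA_OVERRIDE_GFX_VERSION=9.0.0, which at least produces a different error: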

HSA_OVERRIDE_GFX_VERSION=9.0.0 python benchmark_throughput.py --model EleutherAI/pythia-70m --input-len 256 --output-len 256 --num-prompts 100 --backend vllm
Namespace(backend='vllm', dataset=None, input_len=256, output_len=256, model='EleutherAI/pythia-70m', tokenizer='EleutherAI/pythia-70m', quantization=None, tensor_parallel_size=1, n=1, use_beam_search=False, num_prompts=100, seed=0, hf_max_batch_size=None, trust_remote_code=False, max_model_len=None, dtype='auto', enforce_eager=False)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 02-08 16:24:00 config.py:393] Disabled the custom all-reduce kernel because it is not supported on AMD GPUs.
INFO 02-08 16:24:00 llm_engine.py:73] Initializing an LLM engine with config: model='EleutherAI/pythia-70m', tokenizer='EleutherAI/pythia-70m', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.1+cu121 with CUDA 1201 (you have 2.3.0.dev20240207+rocm5.7)
    Python  3.10.13 (you have 3.10.13)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
Memory access fault by GPU node-4 (Agent handle: 0x907d170) on address 0x15236fcae000. Reason: Unknown.
Aborted

Alternatively, HSA_OVERRIDE_GFX_VERSION=9.0.2 seems to get a bit further.

HSA_OVERRIDE_GFX_VERSION=9.0.2 python benchmark_throughput.py --model EleutherAI/pythia-70m --input-len 256 --output-len 256 --num-prompts 100 --backend vllm
Namespace(backend='vllm', dataset=None, input_len=256, output_len=256, model='EleutherAI/pythia-70m', tokenizer='EleutherAI/pythia-70m', quantization=None, tensor_parallel_size=1, n=1, use_beam_search=False, num_prompts=100, seed=0, hf_max_batch_size=None, trust_remote_code=False, max_model_len=None, dtype='auto', enforce_eager=False)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 02-08 16:25:49 config.py:393] Disabled the custom all-reduce kernel because it is not supported on AMD GPUs.
INFO 02-08 16:25:49 llm_engine.py:73] Initializing an LLM engine with config: model='EleutherAI/pythia-70m', tokenizer='EleutherAI/pythia-70m', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2128, in all_reduce
[rank0]:     work = group.allreduce([tensor], opts)
[rank0]: torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2006, unhandled cuda error, NCCL version 2.17.1
[rank0]: ncclUnhandledCudaError: Call to CUDA function failed.
[rank0]: Last error:
[rank0]: Cuda failure 'invalid kernel file'

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/benchmark_throughput.py", line 318, in <module>
[rank0]:     main(args)
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/benchmark_throughput.py", line 205, in main
[rank0]:     elapsed_time = run_vllm(requests, args.model, args.tokenizer,
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/benchmark_throughput.py", line 76, in run_vllm
[rank0]:     llm = LLM(
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/vllm-0.3.0+rocm573-py3.10-linux-x86_64.egg/vllm/entrypoints/llm.py", line 109, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(engine_args)
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/vllm-0.3.0+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 361, in from_engine_args
[rank0]:     engine = cls(*engine_configs,
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/vllm-0.3.0+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 114, in __init__
[rank0]:     self._init_workers()
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/vllm-0.3.0+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 153, in _init_workers
[rank0]:     self._run_workers("init_model")
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/vllm-0.3.0+rocm573-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 989, in _run_workers
[rank0]:     driver_worker_output = getattr(self.driver_worker,
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/vllm-0.3.0+rocm573-py3.10-linux-x86_64.egg/vllm/worker/worker.py", line 90, in init_model
[rank0]:     init_distributed_environment(self.parallel_config, self.rank,
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/vllm-0.3.0+rocm573-py3.10-linux-x86_64.egg/vllm/worker/worker.py", line 259, in init_distributed_environment
[rank0]:     torch.distributed.all_reduce(torch.zeros(1).cuda())
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 77, in wrapper
[rank0]:     msg_dict = _get_msg_dict(func.__name__, *args, **kwargs)
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 50, in _get_msg_dict
[rank0]:     "args": f"{args}, {kwargs}",
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/torch/_tensor.py", line 463, in __repr__
[rank0]:     return torch._tensor_str._str(self, tensor_contents=tensor_contents)
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/torch/_tensor_str.py", line 677, in _str
[rank0]:     return _str_intern(self, tensor_contents=tensor_contents)
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/torch/_tensor_str.py", line 597, in _str_intern
[rank0]:     tensor_str = _tensor_str(self, indent)
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
[rank0]:     formatter = _Formatter(get_summarized_data(self) if summarize else self)
[rank0]:   File "/scratch/project_465000670/danish-foundation-models/evaluation/.venv/lib/python3.10/site-packages/torch/_tensor_str.py", line 138, in __init__
[rank0]:     tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0)
[rank0]: RuntimeError: HIP error: invalid device function
[rank0]: HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing AMD_SERIALIZE_KERNEL=3.
[rank0]: Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
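
For comparing the override values tried above against a target name, here is a small sketch under an assumption not stated in the thread: that a gfx name encodes major, minor, and stepping, with the last two characters being hexadecimal digits for minor and stepping, and that HSA_OVERRIDE_GFX_VERSION expects the same three numbers in decimal, separated by dots. The function name is illustrative; check the ROCm documentation before relying on the result.

```python
def gfx_to_override_version(gfx_name: str) -> str:
    """Convert e.g. 'gfx1031' to '10.3.1' under the assumed naming convention.

    Assumption: after the 'gfx' prefix, the last two characters are the minor
    and stepping numbers in hexadecimal; everything before them is the
    decimal major number.
    """
    if not gfx_name.startswith("gfx") or len(gfx_name) < 6:
        raise ValueError(f"unexpected gfx target name: {gfx_name!r}")
    digits = gfx_name[3:]
    major = int(digits[:-2])
    minor = int(digits[-2], 16)
    stepping = int(digits[-1], 16)
    return f"{major}.{minor}.{stepping}"
```

Under that assumption, `gfx1031` maps to 10.3.1 (the pair suggested in answer #2), `gfx900` to 9.0.0, and `gfx90a` to 9.0.10, so neither 9.0.0 nor 9.0.2 would correspond to gfx90a.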


mkshixfv #4

Progress! What happens if you set AMD_SERIALIZE_KERNEL=3? Maybe we'll get a more informative error.
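
A sketch of one way to rerun the benchmark with that variable set and capture the output for posting. It uses only the standard library; the helper name `run_with_env` and the constant `BENCHMARK_CMD` are illustrative.

```python
import os
import subprocess
import sys
from typing import Dict, List


def run_with_env(cmd: List[str], extra_env: Dict[str, str]) -> subprocess.CompletedProcess:
    """Run `cmd` with `extra_env` layered over the current environment."""
    env = {**os.environ, **extra_env}
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Example command taken from the thread; adjust the script path as needed.
BENCHMARK_CMD = [
    sys.executable, "benchmark_throughput.py",
    "--model", "EleutherAI/pythia-70m",
    "--input-len", "256", "--output-len", "256",
    "--num-prompts", "100", "--backend", "vllm",
]
```

Calling `run_with_env(BENCHMARK_CMD, {"AMD_SERIALIZE_KERNEL": "3"})` returns a result whose `stderr` attribute holds the traceback text, without changing the environment of the calling shell.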


qxsslcnc #5

@rlrs, has this issue been resolved?


ggazkfy8 #6

Has this issue been resolved?


pgpifvop #7

Sorry, I haven't tried this in months, so I don't know whether it has been fixed. I may get a chance to try again in a few weeks, but not before then.