vllm [Usage]: Deploying Llama 3.1 405B-Instruct-FP8 on 8 × H800 does not work

Posted by ct2axkht 2 months ago in Other

Current environment

  • 8 × H800
  • CUDA 11.8
  • vLLM 0.5.3.post1
  • Python 3.9
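The details above can also be gathered with a short script. The sketch below is an illustration, not part of vLLM: `collect_env` is a hypothetical helper that uses only the standard library plus an optional PyTorch install. It reports the CUDA version PyTorch was built against next to each GPU's compute capability, because that build version can differ from the system CUDA toolkit a user may have in mind when writing "CUDA 11.8".

```python
import platform
from importlib import metadata


def collect_env() -> dict:
    """Hypothetical helper: collect the version/GPU details listed above.

    Missing packages or GPUs are reported as None / [] instead of raising.
    """
    info = {"python": platform.python_version()}

    # Read installed versions from package metadata, so vllm is not imported.
    for pkg in ("torch", "vllm"):
        try:
            info[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            info[pkg] = None

    info["torch_cuda_build"] = None
    info["gpus"] = []
    try:
        import torch
    except ImportError:
        return info

    # CUDA version PyTorch was compiled against (None for CPU-only builds);
    # this may differ from the toolkit reported by `nvcc --version`.
    info["torch_cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            info["gpus"].append(
                {"name": torch.cuda.get_device_name(i),
                 "capability": f"sm_{major}{minor}"}
            )
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running it prints one `key: value` line per entry. On the machine described above, one would expect eight H800 entries with capability `sm_90` (Hopper).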

I am deploying Llama 3.1 405B-Instruct-FP8 with vLLM, but the deployment fails with the following error:

INFO 07-24 22:52:39 multiproc_worker_utils.py:136] Terminating local vLLM worker processes
All seven worker processes (pid 28010 to 28016) raise the same exception with an identical traceback. The traceback from pid 28011 is:

(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks: Expected a.dtype() == torch::kInt8 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.), Traceback (most recent call last):
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return func(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     self.model_runner.profile_run()
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return func(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 896, in profile_run
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     self.execute_model(model_input, kv_caches, intermediate_tensors)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return func(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1314, in execute_model
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     hidden_or_intermediate_states = model_executable(
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 422, in forward
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     model_output = self.model(input_ids, positions, kv_caches,
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 322, in forward
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     hidden_states, residual = layer(
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 322, in forward
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     hidden_states, residual = layer(
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 255, in forward
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     hidden_states = self.mlp(hidden_states)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 87, in forward
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     gate_up, _ = self.gate_up_proj(x)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self._call_impl(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return forward_call(*args, **kwargs)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 330, in forward
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     output_parallel = self.quant_method.apply(self, input_, bias)
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/fbgemm_fp8.py", line 175, in apply
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return apply_fp8_linear(input=x,
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 126, in apply_fp8_linear
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28011) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return ops.cutlass_scaled_mm(qinput,
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28013) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 34, in wrapper
(VllmWorkerProcess pid=28010) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28016) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28015) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28012) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/vllm/_custom_ops.py", line 251, in cutlass_scaled_mm
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     torch.ops._C.cutlass_scaled_mm(out, a, b, scale_a, scale_b, bias)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.9/site-packages/torch/_ops.py", line 854, in __call__
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]     return self_._op(*args, **(kwargs or {}))
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226] RuntimeError: Expected a.dtype() == torch::kInt8 to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
(VllmWorkerProcess pid=28014) ERROR 07-24 22:52:39 multiproc_worker_utils.py:226]
(VllmWorkerProcess pid=28011) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28015) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28016) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28010) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28012) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28014) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
(VllmWorkerProcess pid=28013) INFO 07-24 22:52:39 multiproc_worker_utils.py:237] Worker exiting
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/usr/local/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown

My config.json is:

{
    "model":"origin_model",
    "disable_log_requests": "true",
    "gpu_memory_utilization": 0.9,
    "tensor_parallel_size": 8,
    "trust_remote_code": true,
    "enable_chunked_prefill": false,
    "enable_prefix_caching": false,
    "max_model_len": 4096,
    "quantization": "fbgemm_fp8",
    "dtype": "bfloat16"
}

What should I do?

fumotvh3 (reply #1)

Does the H800 not support fbgemm_fp8?

tyky79it (reply #2)

Please read #6689 and raise the issue there if it hasn't been discussed yet.

mum43rcc (reply #3)

The CUDA version should be 12.x, not 11.8.
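One plausible reading of the traceback is consistent with this reply: `cutlass_scaled_mm` rejects the input because it only accepts int8, which suggests the FP8 path was not available in this CUDA 11.8 build. The H800 is a Hopper GPU (compute capability 9.0), so the hardware itself should not be the problem. Below is a minimal pre-flight sketch that checks both conditions before launching vLLM. It rests on two assumptions not stated in the thread: the thresholds `MIN_CUDA_MAJOR = 12` and `MIN_CAPABILITY = (8, 9)`, and the premise that vLLM's compiled kernels track the CUDA version reported by `torch.version.cuda`, which is the version the PyTorch wheel was built against, not the driver version. The function name `fp8_preflight` is hypothetical.

```python
"""Hypothetical pre-flight check before serving an FP8 (fbgemm_fp8) model with vLLM."""
from typing import List, Optional, Tuple

# Assumed thresholds, not quoted from vLLM docs:
MIN_CUDA_MAJOR = 12         # FP8 CUTLASS kernels assumed to need a CUDA 12.x build
MIN_CAPABILITY = (8, 9)     # FP8 tensor cores: Ada (8.9) and Hopper (9.0, e.g. H800)


def fp8_preflight(cuda_version: Optional[str], capability: Tuple[int, int]) -> List[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    problems: List[str] = []
    if cuda_version is None:
        # torch.version.cuda is None for CPU-only PyTorch builds
        problems.append("PyTorch was built without CUDA")
    else:
        major = int(cuda_version.split(".")[0])
        if major < MIN_CUDA_MAJOR:
            problems.append(
                f"CUDA {cuda_version} build; FP8 kernels assume CUDA {MIN_CUDA_MAJOR}.x or newer"
            )
    if tuple(capability) < MIN_CAPABILITY:
        problems.append(
            f"compute capability {capability[0]}.{capability[1]} is below "
            f"{MIN_CAPABILITY[0]}.{MIN_CAPABILITY[1]}"
        )
    return problems


if __name__ == "__main__":
    import torch  # imported here so the pure check above stays testable without a GPU

    if not torch.cuda.is_available():
        raise SystemExit("No CUDA device visible to PyTorch")
    issues = fp8_preflight(torch.version.cuda, torch.cuda.get_device_capability(0))
    for issue in issues:
        print("FP8 pre-flight:", issue)
    if not issues:
        print("FP8 pre-flight: OK")
```

On the environment described in this thread (CUDA 11.8, H800), the sketch would flag only the CUDA version. That matches the suggestion to move to a CUDA 12.x build of PyTorch and vLLM.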
