vllm [Bug]: DAG teardown error AttributeError: 'Worker' object has no attribute 'core_worker'

bqjvbblv  posted 2 months ago in Other

Current environment:

The output of `python collect_env.py`

🐛 Describe the bug
Command:
python benchmarks/benchmark_throughput.py --input-len 100 --output-len 100 --num-prompts 100 --model facebook/opt-125m -tp 2 --distributed-executor-backend ray
Error:

2024-07-28 22:30:36,078 INFO compiled_dag_node.py:1202 -- Tearing down compiled DAG
Exception ignored in: <function RayGPUExecutor.__del__ at 0x7ff2ee7048b0>
Traceback (most recent call last):
  File "/data/youkaichao/vllm/vllm/executor/ray_gpu_executor.py", line 396, in __del__
    self.forward_dag.teardown()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/dag/compiled_dag_node.py", line 1402, in teardown
    monitor.teardown(wait=True)
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/dag/compiled_dag_node.py", line 1204, in teardown
    outer._dag_submitter.close()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/experimental/channel/common.py", line 383, in close
    self._output_channel.close()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/experimental/channel/shared_memory_channel.py", line 629, in close
    channel.close()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/experimental/channel/shared_memory_channel.py", line 512, in close
    self._worker.core_worker.experimental_channel_set_error(self._writer_ref)
AttributeError: 'Worker' object has no attribute 'core_worker'
[1]    3100846 segmentation fault (core dumped)  python benchmarks/benchmark_throughput.py --input-len 100 --output-len 100
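
The "Exception ignored in: <function RayGPUExecutor.__del__ ...>" line in the log is standard Python behavior: an exception raised inside `__del__` is not propagated to the caller but reported through `sys.unraisablehook`. The following is a minimal sketch of that mechanism only. `FakeChannel`, `FakeExecutor`, and `run_demo` are hypothetical stand-ins, not vLLM or Ray classes, and the sketch does not reproduce or explain the segmentation fault.

```python
import sys


class FakeChannel:
    """Hypothetical stand-in for a channel whose close() fails."""

    def close(self):
        # Mimics the failure at the bottom of the traceback.
        raise AttributeError("'Worker' object has no attribute 'core_worker'")


class FakeExecutor:
    """Hypothetical stand-in for an executor that tears down in __del__."""

    def __init__(self):
        self.channel = FakeChannel()

    def __del__(self):
        # Any exception raised here is reported as "Exception ignored in".
        self.channel.close()


def run_demo():
    """Return (exc_type, message) pairs reported as unraisable."""
    seen = []

    def record(unraisable):
        # Keep only the type and message to avoid holding reference cycles.
        seen.append((unraisable.exc_type, str(unraisable.exc_value)))

    previous_hook = sys.unraisablehook
    sys.unraisablehook = record
    try:
        executor = FakeExecutor()
        # In CPython, dropping the last reference runs __del__ immediately.
        del executor
    finally:
        sys.unraisablehook = previous_hook
    return seen
```

Calling `run_demo()` returns the recorded exception instead of raising it, which is why the benchmark script itself never sees the `AttributeError` as a normal exception.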

cc @ruisearch42 @rkooo567 @stephanie-wang

sr4lhrrt  #1

These should only be called when ADAG is used.
Are these environment variables set? DISTRIBUTED_EXECUTOR_BACKEND=ray VLLM_USE_RAY_SPMD_WORKER=1 VLLM_USE_RAY_COMPILED_DAG=1
@youkaichao
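
To answer the question above quickly, one could print the three variables from the shell or Python session that launches the benchmark. This is a small sketch: the variable names are copied from the reply, while the helper name `report_adag_env` is hypothetical and not part of vLLM.

```python
import os

# Variable names taken from the reply above.
ADAG_ENV_VARS = (
    "DISTRIBUTED_EXECUTOR_BACKEND",
    "VLLM_USE_RAY_SPMD_WORKER",
    "VLLM_USE_RAY_COMPILED_DAG",
)


def report_adag_env(environ=None):
    """Map each variable name to its value, or None if it is unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in ADAG_ENV_VARS}


if __name__ == "__main__":
    for name, value in report_adag_env().items():
        print(f"{name}={value if value is not None else '<unset>'}")
```

Passing a plain dict instead of `os.environ` makes the helper easy to check without touching the real environment.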

uinbv5nw  #2

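Also, to be clear, we plan to fix this soon (as @ruisearch42 said above, this should only happen when the environment variables are set). If it happens without the environment variables, we will try to fix it immediately; otherwise we need a few more days to resolve it.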

chy5wohz  #3

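It happens when I use those environment variables, so it is not user-facing right now. I only ran into it while testing with the Ray DAG. Fixing it later should be fine.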
