Current environment:
The output of `python collect_env.py`
🐛 Describe the bug
Command: python benchmarks/benchmark_throughput.py --input-len 100 --output-len 100 --num-prompts 100 --model facebook/opt-125m -tp 2 --distributed-executor-backend ray
Error:
2024-07-28 22:30:36,078 INFO compiled_dag_node.py:1202 -- Tearing down compiled DAG
Exception ignored in: <function RayGPUExecutor.__del__ at 0x7ff2ee7048b0>
Traceback (most recent call last):
  File "/data/youkaichao/vllm/vllm/executor/ray_gpu_executor.py", line 396, in __del__
    self.forward_dag.teardown()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/dag/compiled_dag_node.py", line 1402, in teardown
    monitor.teardown(wait=True)
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/dag/compiled_dag_node.py", line 1204, in teardown
    outer._dag_submitter.close()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/experimental/channel/common.py", line 383, in close
    self._output_channel.close()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/experimental/channel/shared_memory_channel.py", line 629, in close
    channel.close()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/experimental/channel/shared_memory_channel.py", line 512, in close
    self._worker.core_worker.experimental_channel_set_error(self._writer_ref)
AttributeError: 'Worker' object has no attribute 'core_worker'
[1] 3100846 segmentation fault (core dumped) python benchmarks/benchmark_throughput.py --input-len 100 --output-len 100
cc @ruisearch42 @rkooo567 @stephanie-wang
3 answers
sr4lhrrt1#
These code paths should only be called when ADAG is in use.
Are these environment variables set? DISTRIBUTED_EXECUTOR_BACKEND=ray VLLM_USE_RAY_SPMD_WORKER=1 VLLM_USE_RAY_COMPILED_DAG=1
@youkaichao
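A quick way to answer the question above is to inspect the environment of the shell that launches the benchmark. The sketch below is only an illustration: the variable names and values are copied from this answer, while the helper names `env_status` and `all_set` are made up here and are not part of vLLM or Ray. It does not verify that vLLM actually reads each variable.

```python
import os

# Names and values copied from the answer above; nothing else is assumed.
EXPECTED = {
    "DISTRIBUTED_EXECUTOR_BACKEND": "ray",
    "VLLM_USE_RAY_SPMD_WORKER": "1",
    "VLLM_USE_RAY_COMPILED_DAG": "1",
}


def env_status(environ=None):
    """Map each variable name to (current value or None, matches expected)."""
    env = os.environ if environ is None else environ
    return {
        name: (env.get(name), env.get(name) == wanted)
        for name, wanted in EXPECTED.items()
    }


def all_set(environ=None):
    """True only if every listed variable has the value quoted in the thread."""
    return all(matches for _, matches in env_status(environ).values())
```

Calling `env_status()` with no argument reports on the current process environment; passing a plain dict makes the check easy to try without touching the real environment.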
uinbv5nw2#
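Also, to be clear, we plan to fix this soon (as @ruisearch42 said above, this should only happen when the environment variables are set). If it happens without the environment variables, we will try to fix it immediately; otherwise we will need a few more days to resolve it.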
chy5wohz3#
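It happens when I use these environment variables, so it is not user-facing right now. I only ran into it while testing with the Ray DAG. Fixing it later should be fine.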
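To compare the two cases discussed in this thread (with and without the environment variables), the benchmark can be launched from a small wrapper. This is a rough sketch under a few assumptions: it is run from the root of a vLLM checkout, the command is the one from the issue, and the variables are the three named earlier in the thread. The helpers `build_env` and `run_benchmark` are hypothetical names introduced here.

```python
import os
import subprocess
import sys

# Command copied from the issue; assumes the current directory is a vLLM checkout.
BENCHMARK_CMD = [
    sys.executable, "benchmarks/benchmark_throughput.py",
    "--input-len", "100", "--output-len", "100", "--num-prompts", "100",
    "--model", "facebook/opt-125m", "-tp", "2",
    "--distributed-executor-backend", "ray",
]

# Variables named earlier in the thread.
ADAG_ENV = {
    "DISTRIBUTED_EXECUTOR_BACKEND": "ray",
    "VLLM_USE_RAY_SPMD_WORKER": "1",
    "VLLM_USE_RAY_COMPILED_DAG": "1",
}


def build_env(use_adag, base=None):
    """Copy the base environment, then set or remove the thread's variables."""
    env = dict(os.environ if base is None else base)
    for name, value in ADAG_ENV.items():
        if use_adag:
            env[name] = value
        else:
            env.pop(name, None)
    return env


def run_benchmark(use_adag):
    """Run the benchmark once and return the child's exit code."""
    return subprocess.run(BENCHMARK_CMD, env=build_env(use_adag)).returncode
```

Running `run_benchmark(True)` and then `run_benchmark(False)` gives one exit code per configuration. On POSIX systems, `subprocess` reports a negative return code when the child was terminated by a signal, which is how a crash like the segmentation fault above would show up.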