Multi-node serving with vLLM - Problems with Ray

vecaoik1 · asked 2 months ago

I'm trying to run a distributed (multi-node) inference server using Ray and vLLM, but I keep getting the following ValueError:
Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.
I'm not sure how exactly to fix this. I suspect the problem is in this script, https://github.com/vllm-project/vllm/blob/main/vllm/engine/ray_utils.py, especially when a ray_address is passed. Is a specific ray_address argument passed at the ray.init() stage?
More specifically, the error seems to be caused by the driver_dummy_worker at line 182 of https://github.com/vllm-project/vllm/blob/main/vllm/engine/llm_engine.py. This code confuses me:
when the error is raised, it checks whether driver_dummy_worker is None, but didn't we just set it to None above, i.e. self.driver_dummy_worker: RayWorkerVllm = None?
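
For reference, here is a simplified paraphrase of the logic around that check (not the exact source; create_ray_worker is a hypothetical stand-in for the real worker-creation code). driver_dummy_worker does start as None, but the loop reassigns it when a GPU bundle lands on the driver's own IP, so the ValueError fires only when no GPU bundle was scheduled on the driver node:

```python
# Simplified paraphrase of the check in llm_engine.py, not the exact source.
self.driver_dummy_worker: RayWorkerVllm = None   # starts as None...
self.workers: List[RayWorkerVllm] = []

driver_ip = get_ip()
for bundle_id, bundle in enumerate(placement_group.bundle_specs):
    if not bundle.get("GPU", 0):
        continue  # only GPU bundles get a worker
    # create_ray_worker is a hypothetical stand-in for the actual
    # ray.remote(...).remote(...) call pinned to this bundle.
    worker = create_ray_worker(placement_group, bundle_id)
    worker_ip = ray.get(worker.get_node_ip.remote())
    if worker_ip == driver_ip and self.driver_dummy_worker is None:
        # ...and is reassigned here if a GPU bundle sits on the driver's node.
        self.driver_dummy_worker = worker
    else:
        self.workers.append(worker)

if self.driver_dummy_worker is None:
    # Reached only when no GPU bundle was placed on the driver's node.
    raise ValueError(
        "Ray does not allocate any GPUs on the driver node. Consider "
        "adjusting the Ray placement group or running the driver on a "
        "GPU node.")
```

So, as the message itself suggests, the driver process has to run on a node that owns at least one GPU bundle of the placement group.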

dauxcl2d

dauxcl2d2#

Did you solve this? I'm hitting the same error with multi-node inference.

nwo49xxi

nwo49xxi3#

@jens5588 did you solve it? Here is something that may be helpful for you.
Check the default spec of the placement group:
vllm/vllm/executor/ray_utils.py, line 110 at eefeb16:

```python
placement_group_specs = ([{"GPU": 1}] * parallel_config.world_size)
```
AFAIK, it works normally when the driver process and a GPU bundle are on the same node; when they are not, serving may fail at startup and Ray will retry the placement. You can print the driver and worker IPs to check whether that is your situation, and also inspect the placement group to see which node each bundle landed on; see the diagnostic sketch below.
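
A minimal diagnostic sketch along those lines, using plain Ray APIs (it assumes the Ray cluster is already up and you run this on the driver node):

```python
# Diagnostic sketch: compare the driver's IP against the nodes that hold
# GPU resources, and dump the placement groups. Plain Ray, no vLLM API.
import ray
from ray.util import get_node_ip_address, placement_group_table

ray.init(address="auto")  # attach to the existing cluster

print("driver ip:", get_node_ip_address())
for node in ray.nodes():
    print("node:", node["NodeManagerAddress"],
          "GPUs:", node["Resources"].get("GPU", 0))

# Each entry lists the bundle specs and the group's scheduling state.
for pg_id, info in placement_group_table().items():
    print(pg_id, info["state"], info["bundles"])
```

If the driver's IP never shows up among the nodes that report GPUs, you have reproduced the ValueError from the question.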
And there is another recent issue that may affect you if you're on a recent build: https://github.com/vllm-project/vllm/pull/2727/files

4dc9hkyq

4dc9hkyq4#

I see the same problem when tp > 1. Do we have a fix for this?

knpiaxh1

knpiaxh15#

Seeing the same problem. Is there a workaround? What is the root cause?

3yhwsihp

3yhwsihp6#

If a single vLLM instance uses no more GPUs than a single node provides, you can use the mp backend: --distributed-executor-backend mp. See https://docs.vllm.ai/en/stable/serving/distributed_serving.html for more details.
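
For example, a minimal sketch of a single-node launch through the Python API (the distributed_executor_backend argument mirrors the CLI flag; the model name is just a placeholder):

```python
from vllm import LLM

# Sketch: one node, 8 GPUs, multiprocessing backend instead of Ray.
llm = LLM(
    model="meta-llama/Llama-2-70b-hf",  # placeholder model
    tensor_parallel_size=8,
    distributed_executor_backend="mp",
)
outputs = llm.generate("Hello, my name is")
print(outputs[0].outputs[0].text)
```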

cclgggtu

cclgggtu7#

@youkaichao my case is that I used two GPU nodes (each with 8 GPUs), and my two containers (Ray head and Ray worker) each used all 8 GPUs. Is there a workaround for this as well?
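
For a setup like that, a rough sketch of the driver side, assuming the driver runs inside the head container (which owns GPUs) and the two-container Ray cluster is already formed (the model name is a placeholder):

```python
import ray
from vllm import LLM

# Attach to the existing head + worker cluster instead of starting a new one.
ray.init(address="auto")

# One vLLM instance spanning both 8-GPU nodes; the driver must sit on a
# node that owns GPUs, or the placement-group check above raises.
llm = LLM(
    model="meta-llama/Llama-2-70b-hf",  # placeholder model
    tensor_parallel_size=16,
    distributed_executor_backend="ray",
)
```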

prdp8dxp

prdp8dxp8#

Then why do you need to organize them into one Ray cluster? They can work independently of each other.

41zrol4v

41zrol4v9#

Is there a way to deploy a single vLLM instance across two Ray clusters?
