[Usage]: Can vllm use something like the device_map approach in transformers?

jtoj6r0c  posted 3 months ago in: Other

Current environment

How would you like to use vllm

I have three 4090 GPUs, 24 GB each (72 GB total), while the model I need to deploy requires at least 52 GB. The problem is that tensor-parallel deployment requires the number of attention heads (32) to be evenly divisible by the number of GPUs, and 3 does not divide 32, so this clearly won't work. Can vllm use something like device_map in transformers to specify how each layer is placed, to get around this?
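(For context, the transformers behavior being asked about looks roughly like this; a minimal sketch, assuming the gemma-2-27b-it path that appears later in this thread:)

from transformers import AutoModelForCausalLM
import torch

# device_map="auto" lets accelerate place individual layers across all
# visible GPUs (spilling to CPU if necessary) -- the behavior the question
# is asking vllm to reproduce.
model = AutoModelForCausalLM.from_pretrained(
    "/data/big_model/gemma-2-27b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)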


ecbunoof2#

You can use --pipeline-parallel-size 3; see https://docs.vllm.ai/en/latest/serving/distributed_serving.html
Thank you for the guidance. Should I set pipeline-parallel-size to 3? Should tensor_parallel_size also be set to 3?


pzfprimi3#

Specifically, it is --pipeline-parallel-size 3 --tensor-parallel-size 1; the latter can be omitted since it is the default.
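Put together, the launch command would look roughly like this (a sketch based on the distributed serving docs linked above; the model path is the one used later in this thread):

python -m vllm.entrypoints.openai.api_server \
    --model /data/big_model/gemma-2-27b-it \
    --pipeline-parallel-size 3 \
    --tensor-parallel-size 1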


bzzcjhmw4#

After setting the parameters, an error was reported:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/logs/drone-exec/envir6767/model_lib.py", line 263, in load_with_engine
    self.engine = LLMEngine.from_engine_args(EngineArgs.from_cli_args(self.args))
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 385, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 670, in create_engine_config
    parallel_config = ParallelConfig(
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 698, in __init__
    self._verify_args()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 704, in _verify_args
    raise NotImplementedError("Pipeline parallelism is not supported "
NotImplementedError: Pipeline parallelism is not supported yet with multiprocessing.
The relevant code:
if 'gemma' in self.name.lower():
    print("model is gemma")
    # use 3-way pipeline parallelism instead of tensor parallelism
    self.args.pipeline_parallel_size = 3
    self.args.tensor_parallel_size = 1
self.engine = LLMEngine.from_engine_args(EngineArgs.from_cli_args(self.args))
The code above lives in its own loading module.


eni9jsuy5#

Even when I follow the documentation exactly, it still does not work.

from vllm import LLM
llm = LLM('/data/big_model/gemma-2-27b-it', pipeline_parallel_size=3)
INFO 07-20 13:05:45 config.py:695] Defaulting to use mp for distributed inference
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 150, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 385, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 670, in create_engine_config
    parallel_config = ParallelConfig(
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 698, in __init__
    self._verify_args()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 704, in _verify_args
    raise NotImplementedError("Pipeline parallelism is not supported yet with multiprocessing.")
NotImplementedError: Pipeline parallelism is not supported yet with multiprocessing.


xesrikrc6#

This is a new feature; try installing the latest main branch following https://docs.vllm.ai/en/latest/getting_started/installation.html, or wait for the next release.
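(For reference, building from the main branch per that page is roughly the following; a sketch only, so check the linked installation page for the exact, current steps:)

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .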


kwvwclae7#

> This is a new feature; try installing the latest main branch following the steps linked above, or wait for the next release.

Is it possible to set the distributed_executor_backend parameter instead? After I set it,

from vllm import LLM
llm = LLM('/data/big_model/gemma-2-27b-it', distributed_executor_backend="ray", pipeline_parallel_size=3)

I got another error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 150, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 385, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 670, in create_engine_config
    parallel_config = ParallelConfig(
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 698, in __init__
    self._verify_args()
  File "/root/anaconda3/envs/guihun_doc_aigc/lib/python3.10/site-packages/vllm/config.py", line 704, in _verify_args
    raise NotImplementedError("Pipeline parallelism is not supported yet with multiprocessing.")
NotImplementedError: Pipeline parallelism is not supported yet with multiprocessing.

vybvopom8#

Please provide a minimal reproducible example with the full logs.


9rbhqvlz9#

Note that pipeline_parallel_size is not supported in LLM. You need to use it through the OpenAI API server. Please read the documentation carefully: https://docs.vllm.ai/en/latest/serving/distributed_serving.html
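Concretely: launch the server with the flags from #3, plus --distributed-executor-backend ray (since, per the errors above, pipeline parallelism does not work with the default multiprocessing backend), and then talk to it over the OpenAI-compatible HTTP API instead of constructing LLM in-process. A minimal client sketch, assuming the server is listening on the default port 8000:

from openai import OpenAI

# vllm's OpenAI-compatible server does not check the API key unless one is
# configured, so any placeholder string works here.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
completion = client.completions.create(
    model="/data/big_model/gemma-2-27b-it",
    prompt="Hello, my name is",
    max_tokens=16,
)
print(completion.choices[0].text)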
