Can DeepSpeed-MII run inference across multiple GPUs with only one replica?

dffbzjpn  posted 1 month ago in Other

I have two nodes, each with a single 16 GB GPU. I want to run the llama-2-13b-hf model across these two nodes as one replica, with one GPU per node.
Contents of /job/hostfile:

deepspeed-mii-inference-worker-0 slots=1
deepspeed-mii-inference-worker-1 slots=1

Server code:

import mii

client = mii.serve(
    "/data/Llama-2-13b-hf/",
    deployment_name="llama2-deployment",
    enable_restful_api=True,
    restful_api_port=28080,
    tensor_parallel=2,
    replica_num=1
)

Error:

[2024-03-08 09:16:45,938] [INFO] [multinode_runner.py:80:get_cmd] Running on the following workers: deepspeed-mii-inference-worker-0,deepspeed-mii-inference-worker-1
[2024-03-08 09:16:45,938] [INFO] [runner.py:568:main] cmd = pdsh -S -f 1024 -w deepspeed-mii-inference-worker-0,deepspeed-mii-inference-worker-1 export NCCL_VERSION=2.19.3-1; export PYTHONPATH=/data;  cd /data; /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJkZWVwc3BlZWQtbWlpLWluZmVyZW5jZS13b3JrZXItMCI6IFswXSwgImRlZXBzcGVlZC1taWktaW5mZXJlbmNlLXdvcmtlci0xIjogWzBdfQ== --node_rank=%n --master_addr=10.11.1.207 --master_port=29500 deepspeed-mii-server.py
deepspeed-mii-inference-worker-0: Warning: Permanently added 'deepspeed-mii-inference-worker-0' (ED25519) to the list of known hosts.
deepspeed-mii-inference-worker-1: Warning: Permanently added 'deepspeed-mii-inference-worker-1' (ED25519) to the list of known hosts.
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:49,525] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:49,682] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:50,274] [INFO] [launch.py:138:main] 1 NCCL_VERSION=2.19.3-1
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:50,274] [INFO] [launch.py:145:main] WORLD INFO DICT: {'deepspeed-mii-inference-worker-0': [0], 'deepspeed-mii-inference-worker-1': [0]}
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:50,274] [INFO] [launch.py:151:main] nnodes=2, num_local_procs=1, node_rank=1
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:50,274] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'deepspeed-mii-inference-worker-0': [0], 'deepspeed-mii-inference-worker-1': [1]})
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:50,275] [INFO] [launch.py:163:main] dist_world_size=2
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:50,275] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:50,275] [INFO] [launch.py:253:main] process 1208 spawned with command: ['/usr/bin/python3', '-u', 'deepspeed-mii-server.py', '--local_rank=0']
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:50,547] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.19.3-1
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:50,548] [INFO] [launch.py:145:main] WORLD INFO DICT: {'deepspeed-mii-inference-worker-0': [0], 'deepspeed-mii-inference-worker-1': [0]}
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:50,548] [INFO] [launch.py:151:main] nnodes=2, num_local_procs=1, node_rank=0
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:50,548] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'deepspeed-mii-inference-worker-0': [0], 'deepspeed-mii-inference-worker-1': [1]})
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:50,548] [INFO] [launch.py:163:main] dist_world_size=2
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:50,548] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:50,548] [INFO] [launch.py:253:main] process 1257 spawned with command: ['/usr/bin/python3', '-u', 'deepspeed-mii-server.py', '--local_rank=0']
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:53,103] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:53,717] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:53,973] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:53,973] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
deepspeed-mii-inference-worker-1: Traceback (most recent call last):
deepspeed-mii-inference-worker-1:   File "/data/deepspeed-mii-server.py", line 6, in <module>
deepspeed-mii-inference-worker-1:     client = mii.serve(
deepspeed-mii-inference-worker-1:   File "/usr/local/lib/python3.10/dist-packages/mii/api.py", line 124, in serve
deepspeed-mii-inference-worker-1:     import_score_file(mii_config.deployment_name, DeploymentType.LOCAL).init()
deepspeed-mii-inference-worker-1:   File "/tmp/mii_cache/llama2-deployment/score.py", line 33, in init
deepspeed-mii-inference-worker-1:     mii.backend.MIIServer(mii_config)
deepspeed-mii-inference-worker-1:   File "/usr/local/lib/python3.10/dist-packages/mii/backend/server.py", line 44, in __init__
deepspeed-mii-inference-worker-1:     mii_config.generate_replica_configs()
deepspeed-mii-inference-worker-1:   File "/usr/local/lib/python3.10/dist-packages/mii/config.py", line 302, in generate_replica_configs
deepspeed-mii-inference-worker-1:     replica_pool = _allocate_devices(self.hostfile,
deepspeed-mii-inference-worker-1:   File "/usr/local/lib/python3.10/dist-packages/mii/config.py", line 350, in _allocate_devices
deepspeed-mii-inference-worker-1:     raise ValueError(
deepspeed-mii-inference-worker-1: ValueError: Only able to place 0 replicas, but 1 replicas were requested.
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:54,706] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:54,706] [INFO] [server.py:38:__init__] Hostfile /job/hostfile not found, creating hostfile.
deepspeed-mii-inference-worker-0: Traceback (most recent call last):
deepspeed-mii-inference-worker-0:   File "/data/deepspeed-mii-server.py", line 6, in <module>
deepspeed-mii-inference-worker-0:     client = mii.serve(
deepspeed-mii-inference-worker-0:   File "/usr/local/lib/python3.10/dist-packages/mii/api.py", line 124, in serve
deepspeed-mii-inference-worker-0:     import_score_file(mii_config.deployment_name, DeploymentType.LOCAL).init()
deepspeed-mii-inference-worker-0:   File "/tmp/mii_cache/llama2-deployment/score.py", line 33, in init
deepspeed-mii-inference-worker-0:     mii.backend.MIIServer(mii_config)
deepspeed-mii-inference-worker-0:   File "/usr/local/lib/python3.10/dist-packages/mii/backend/server.py", line 44, in __init__
deepspeed-mii-inference-worker-0:     mii_config.generate_replica_configs()
deepspeed-mii-inference-worker-0:   File "/usr/local/lib/python3.10/dist-packages/mii/config.py", line 302, in generate_replica_configs
deepspeed-mii-inference-worker-0:     replica_pool = _allocate_devices(self.hostfile,
deepspeed-mii-inference-worker-0:   File "/usr/local/lib/python3.10/dist-packages/mii/config.py", line 350, in _allocate_devices
deepspeed-mii-inference-worker-0:     raise ValueError(
deepspeed-mii-inference-worker-0: ValueError: Only able to place 0 replicas, but 1 replicas were requested.
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:55,279] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 1208
deepspeed-mii-inference-worker-1: [2024-03-08 09:16:55,280] [ERROR] [launch.py:322:sigkill_handler] ['/usr/bin/python3', '-u', 'deepspeed-mii-server.py', '--local_rank=0'] exits with return code = 1
pdsh@deepspeed-mii-inference-launcher: deepspeed-mii-inference-worker-1: ssh exited with exit code 1
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:56,554] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 1257
deepspeed-mii-inference-worker-0: [2024-03-08 09:16:56,554] [ERROR] [launch.py:322:sigkill_handler] ['/usr/bin/python3', '-u', 'deepspeed-mii-server.py', '--local_rank=0'] exits with return code = 1
r1wp621o  #1

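Hi @gujingit, we currently do not support splitting a model across nodes. A model can only be split across the GPUs within a single node; replicas of that model can then be placed on different nodes.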
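To illustrate why the hostfile above yields "0 replicas", here is a minimal sketch of the placement rule described in this answer: every replica needs tensor_parallel GPUs on one host. This is a simplified model written for illustration, not MII's actual `_allocate_devices` code; the helper names `parse_hostfile` and `place_replicas` are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_hostfile(text: str) -> Dict[str, int]:
    """Parse lines of the form '<hostname> slots=<n>' into {hostname: n}."""
    pool: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, slots = line.split()
        pool[host] = int(slots.split("=", 1)[1])
    return pool


def place_replicas(pool: Dict[str, int], tensor_parallel: int,
                   replica_num: int) -> List[Tuple[str, List[int]]]:
    """Greedily place replicas; each needs tensor_parallel GPUs on ONE host.

    Illustrative assumption: GPUs are never pooled across hosts.
    """
    placed: List[Tuple[str, List[int]]] = []
    for host, slots in pool.items():
        next_gpu = 0
        while len(placed) < replica_num and slots - next_gpu >= tensor_parallel:
            placed.append((host, list(range(next_gpu, next_gpu + tensor_parallel))))
            next_gpu += tensor_parallel
    if len(placed) < replica_num:
        raise ValueError(
            f"Only able to place {len(placed)} replicas, "
            f"but {replica_num} replicas were requested.")
    return placed


HOSTFILE = """\
deepspeed-mii-inference-worker-0 slots=1
deepspeed-mii-inference-worker-1 slots=1
"""

pool = parse_hostfile(HOSTFILE)

# tensor_parallel=1 fits: a single-GPU replica lands on the first host.
print(place_replicas(pool, tensor_parallel=1, replica_num=1))

# tensor_parallel=2 does not fit: no single host offers 2 GPUs.
try:
    place_replicas(pool, tensor_parallel=2, replica_num=1)
except ValueError as err:
    print(err)
```

Under this model, the two-node hostfile only works with tensor_parallel=1, which would require the whole model to fit on one 16 GB GPU. A single host with at least two slots would be needed for tensor_parallel=2.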

xj3cbfub  #2

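@mrwyattii Hi! Does DeepSpeed-MII support multiple replicas on a single node? For example, I have one node with 8 A100 GPUs, and I set tensor_parallel to 4 and replica_num to 2. I found that only 4 GPUs are working at any given time, while the other 4 just sit idle. That seems odd!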
