When I start the server with deepspeed --num_gpus 2 xxx.py, it fails with an error. If I start it with python3 xxx.py instead, it runs fine. I want to deploy llama-70b (roughly 140 GB) on two A100 80GB GPUs, so I have to launch the server with deepspeed. Here is the output:
[2024-01-20 10:15:26,416] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:26,676] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:26,846] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2024-01-20 10:15:26,846] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-01-20 10:15:26,846] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-01-20 10:15:26,846] [INFO] [launch.py:163:main] dist_world_size=2
[2024-01-20 10:15:26,846] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2024-01-20 10:15:26,967] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:26,967] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:27,150] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:27,150] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:27,259] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2024-01-20 10:15:27,260] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-01-20 10:15:27,260] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-01-20 10:15:27,260] [INFO] [launch.py:163:main] dist_world_size=2
[2024-01-20 10:15:27,260] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2024-01-20 10:15:28,970] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,041] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,083] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,117] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,509] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,576] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,576] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-01-20 10:15:29,804] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,805] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[W socket.cpp:436] [c10d] The server socket has failed to bind to [::]:29700 (errno: 98 - Address already in use).
[W socket.cpp:436] [c10d] The server socket has failed to bind to 0.0.0.0:29700 (errno: 98 - Address already in use).
[E socket.cpp:472] [c10d] The server socket has failed to listen on any local network address.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/launch/multi_gpu_server.py", line 105, in <module>
main()
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/launch/multi_gpu_server.py", line 98, in main
inference_pipeline = async_pipeline(args.model_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/api.py", line 167, in async_pipeline
inference_engine = load_model(model_config)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/modeling/models.py", line 14, in load_model
init_distributed(model_config)
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/utils.py", line 187, in init_distributed
deepspeed.init_distributed(dist_backend="nccl", timeout=timedelta(seconds=1e9))
File "/home/infer/miniconda3/lib/python3.11/site-packages/deepspeed/comm/comm.py", line 670, in init_distributed
cdb = TorchBackend(dist_backend, timeout, init_method, rank, world_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/deepspeed/comm/torch.py", line 120, in __init__
self.init_process_group(backend, timeout, init_method, rank, world_size)
File "/home/infer/miniconda3/lib/python3.11/site-packages/deepspeed/comm/torch.py", line 146, in init_process_group
torch.distributed.init_process_group(backend,
File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 74, in wrapper
func_return = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1141, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/rendezvous.py", line 241, in _env_rendezvous_handler
store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/rendezvous.py", line 172, in _create_c10d_store
return TCPStore(
^^^^^^^^^
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29700 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:29700 (errno: 98 - Address already in use).
[2024-01-20 10:15:29,822] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,878] [INFO] [engine_v2.py:82:__init__] Building model...
[2024-01-20 10:15:29,944] [INFO] [engine_v2.py:82:__init__] Building model...
Using /home/infer/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Using /home/infer/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
[2024-01-20 10:15:30,593] [INFO] [engine_v2.py:82:__init__] Building model...
Using /home/infer/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
[2024-01-20 10:15:30,848] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648350
[2024-01-20 10:15:30,848] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648351
[2024-01-20 10:15:31,004] [ERROR] [launch.py:321:sigkill_handler] ['/home/infer/miniconda3/bin/python', '-m', 'mii.launch.multi_gpu_server', '--deployment-name', 'llama-deployment', '--load-balancer-port', '50050', '--restful-gateway-port', '28080', '--restful-gateway-host', 'localhost', '--restful-gateway-procs', '32', '--server-port', '50051', '--zmq-port', '25555', '--model-config', 'eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAiL21udC9MbGFtYS0yLTdiLWNoYXQtaGYiLCAidG9rZW5pemVyIjogIi9tbnQvTGxhbWEtMi03Yi1jaGF0LWhmIiwgInRhc2siOiAidGV4dC1nZW5lcmF0aW9uIiwgInRlbnNvcl9wYXJhbGxlbCI6IDIsICJpbmZlcmVuY2VfZW5naW5lX2NvbmZpZyI6IHsidGVuc29yX3BhcmFsbGVsIjogeyJ0cF9zaXplIjogMn0sICJzdGF0ZV9tYW5hZ2VyIjogeyJtYXhfdHJhY2tlZF9zZXF1ZW5jZXMiOiAyMDQ4LCAibWF4X3JhZ2dlZF9iYXRjaF9zaXplIjogNzY4LCAibWF4X3JhZ2dlZF9zZXF1ZW5jZV9jb3VudCI6IDUxMiwgIm1heF9jb250ZXh0IjogODE5MiwgIm1lbW9yeV9jb25maWciOiB7Im1vZGUiOiAicmVzZXJ2ZSIsICJzaXplIjogMTAwMDAwMDAwMH0sICJvZmZsb2FkIjogZmFsc2V9fSwgInRvcmNoX2Rpc3RfcG9ydCI6IDI5NzAwLCAiem1xX3BvcnRfbnVtYmVyIjogMjU1NTUsICJyZXBsaWNhX251bSI6IDEsICJyZXBsaWNhX2NvbmZpZ3MiOiBbeyJob3N0bmFtZSI6ICJsb2NhbGhvc3QiLCAidGVuc29yX3BhcmFsbGVsX3BvcnRzIjogWzUwMDUxLCA1MDA1Ml0sICJ0b3JjaF9kaXN0X3BvcnQiOiAyOTcwMCwgImdwdV9pbmRpY2VzIjogWzAsIDFdLCAiem1xX3BvcnQiOiAyNTU1NX1dLCAiZGV2aWNlX21hcCI6ICJhdXRvIiwgIm1heF9sZW5ndGgiOiBudWxsLCAiYWxsX3Jhbmtfb3V0cHV0IjogZmFsc2UsICJzeW5jX2RlYnVnIjogZmFsc2UsICJwcm9maWxlX21vZGVsX3RpbWUiOiBmYWxzZX0='] exits with return code = 1
[2024-01-20 10:15:31,968] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:31,968] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:32,151] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:32,151] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
Traceback (most recent call last):
File "/home/infer/deepspeed-fastgen/quest.py", line 26, in <module>
client = mii.serve("/mnt/Llama-2-7b-chat-hf", deployment_name="llama-deployment", replica_num=1, #replica_num=2 tensor_parallel=2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/api.py", line 124, in serve
import_score_file(mii_config.deployment_name, DeploymentType.LOCAL).init()
File "/tmp/mii_cache/llama-deployment/score.py", line 33, in init
mii.backend.MIIServer(mii_config)
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/backend/server.py", line 47, in __init__
self._wait_until_server_is_live(processes,
File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/backend/server.py", line 62, in _wait_until_server_is_live
raise RuntimeError(
RuntimeError: server crashed for some reason, unable to proceed
[2024-01-20 10:15:33,306] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1647573
[2024-01-20 10:15:33,306] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1647574
[2024-01-20 10:15:33,342] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648352
[2024-01-20 10:15:33,404] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648353
[2024-01-20 10:15:33,463] [INFO] [launch.py:324:sigkill_handler] Main process received SIGTERM, exiting
[2024-01-20 10:15:33,917] [ERROR] [launch.py:321:sigkill_handler] ['/home/infer/miniconda3/bin/python', '-u', 'quest.py', '--local_rank=1'] exits with return code = 1
At first I assumed some other process was occupying the port, so I changed it to 29700. But as you can see, that did not solve the problem. What should I do? The code is just like the example (but using llama-7b):
import mii
client = mii.serve("/mnt/Llama-2-7b-chat-hf", deployment_name="llama-deployment", tensor_parallel=2)
2 Answers

Answer 1:
If you launch the server with mii.serve, you do not need the deepspeed launcher to take advantage of tensor parallelism. mii.serve invokes the DeepSpeed launcher itself, so when you run your script with deepspeed --num_gpus 2 you are trying to start two inference servers (hence the address-already-in-use error).
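As a minimal sketch of that advice, kept close to the code in the question (the model path and deployment name come from the question; the client.generate call is assumed from the standard MII persistent-deployment usage and is not shown in this thread):

# serve.py -- start this with plain `python3 serve.py`, NOT `deepspeed --num_gpus 2 serve.py`.
# mii.serve() invokes the DeepSpeed launcher internally, so wrapping it in the
# deepspeed launcher spawns a second server and causes the address-already-in-use error.
import mii

client = mii.serve(
    "/mnt/Llama-2-7b-chat-hf",           # model path taken from the question
    deployment_name="llama-deployment",
    tensor_parallel=2,                    # shard the model across the two GPUs
)

# Query the deployment once it is live (assumed MII client API).
response = client.generate(["Hello, my name is"], max_new_tokens=64)
print(response)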
Answer 2:

This code hits the same problem:
from mii import pipeline
pipe = pipeline("mistralai/Mistral-7B-Instruct-v0.1")
output = pipe(["Hello, my name is", "DeepSpeed is"], max_new_tokens=128)
print(output)
Error message:
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29500 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:29500 (errno: 98 - Address already in use)
It only uses the pipeline; there is no additional call to mii.serve.