DeepSpeed-MII fails to load larger models such as Llama-2 70B for serving

368yc8dk · asked 3 months ago

Hi,
I'm trying to load Llama-2 70B for inference with the configuration below, but the model fails to load. However, when I serve Llama-2 7B and 13B with the same configuration, it works without any problems.

Hardware setup

GPUs: 8x NVIDIA A100
CPUs: 4
RAM: 1024 GB

Code

import mii

mii_configs = {"tensor_parallel": 8, "dtype": "fp16", "skip_model_check": True}

mii.deploy(
           task="text-generation",  # the launcher log and traceback below both show text-generation
           model="/local/2_llama/Llama-2-70b-chat-hf",  # using the downloaded version of HuggingFace Llama-2
           deployment_name="llama_2_deployment",
           mii_config=mii_configs
)
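
For reference, once the server comes up, this deployment would be queried through the legacy MII client API. A minimal sketch (the prompt string is illustrative):

import mii

# Connect to the deployment started by mii.deploy() above.
generator = mii.mii_query_handle("llama_2_deployment")

# Send a text-generation request; generate kwargs such as
# max_new_tokens are forwarded to the model.
result = generator.query({"query": ["DeepSpeed is"]}, do_sample=True, max_new_tokens=64)
print(result)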

Logs

[2023-08-10 22:20:37,383] [INFO] [deployment.py:87:deploy] ************* MII is using DeepSpeed Optimizations to accelerate your model *************
[2023-08-10 22:20:37,629] [INFO] [server_client.py:219:_initialize_service] MII using multi-gpu deepspeed launcher:
 ------------------------------------------------------------
 task-name .................... text-generation
 model ........................ /home/ray/dl/llm/2_llama/Llama-2-70b-hf
 model-path ................... /tmp/mii_models
 port ......................... 50050
 provider ..................... hugging-face
 ------------------------------------------------------------
[2023-08-10 22:20:39,765] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-08-10 22:20:41,285] [INFO] [runner.py:550:main] cmd = /home/ray/anaconda3/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --no_python --no_local_rank --enable_each_rank_log=None /home/ray/anaconda3/bin/python -m mii.launch.multi_gpu_server --task-name text-generation --model /home/ray/dl/llm/2_llama/Llama-2-70b-hf --model-path /tmp/mii_models --port 50050 --ds-optimize --provider hugging-face --config ey....
[2023-08-10 22:20:42,660] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:20:43,440] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.16.2-1+cuda11.8
[2023-08-10 22:20:43,440] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.16.2-1
[2023-08-10 22:20:43,440] [INFO] [launch.py:135:main] 0 NCCL_VERSION=2.16.2-1
[2023-08-10 22:20:43,440] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2023-08-10 22:20:43,440] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.16.2-1+cuda11.8
[2023-08-10 22:20:43,440] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2023-08-10 22:20:43,440] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.16.2-1
[2023-08-10 22:20:43,440] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-08-10 22:20:43,441] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-08-10 22:20:43,441] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-08-10 22:20:43,441] [INFO] [launch.py:162:main] dist_world_size=8
[2023-08-10 22:20:43,441] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-08-10 22:20:47,683] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...

...

[2023-08-10 22:57:44,970] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:57:49,974] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:57:54,979] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:57:59,983] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:04,988] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:09,992] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:14,994] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:19,999] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:25,003] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:30,008] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:35,013] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:36,906] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1294167
[2023-08-10 22:58:40,017] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:42,433] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1294168
[2023-08-10 22:58:45,022] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:47,153] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1294169
[2023-08-10 22:58:50,026] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:53,134] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1294170
[2023-08-10 22:58:55,031] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:58:56,716] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1294171
[2023-08-10 22:58:59,443] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1294173
[2023-08-10 22:58:59,443] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1294175
[2023-08-10 22:59:00,035] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:59:03,144] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1294177
[2023-08-10 22:59:05,040] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
[2023-08-10 22:59:07,126] [ERROR] [launch.py:324:sigkill_handler] ['/home/ray/anaconda3/bin/python', '-m', 'mii.launch.multi_gpu_server', '--task-name', 'text-generation', '--model', '/home/ray/dl/llm/2_llama/Llama-2-70b-hf', '--model-path', '/tmp/mii_models', '--port', '50050', '--ds-optimize', '--provider', 'hugging-face', '--config', 'ey...'] exits with return code = -9
[2023-08-10 22:59:10,044] [INFO] [server_client.py:117:_wait_until_server_is_live] waiting for server to start...
Traceback (most recent call last):
  File "deploy.py", line 5, in <module>
    mii.deploy(task="text-generation",
  File "/home/ray/anaconda3/lib/python3.8/site-packages/mii/deployment.py", line 114, in deploy
    return _deploy_local(deployment_name, model_path=model_path)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/mii/deployment.py", line 120, in _deploy_local
    mii.utils.import_score_file(deployment_name).init()
  File "/tmp/mii_cache/llama_2_deployment/score.py", line 30, in init
    model = mii.MIIServerClient(task,
  File "/home/ray/anaconda3/lib/python3.8/site-packages/mii/server_client.py", line 92, in __init__
    self._wait_until_server_is_live()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/mii/server_client.py", line 115, in _wait_until_server_is_live
    raise RuntimeError("server crashed for some reason, unable to proceed")
RuntimeError: server crashed for some reason, unable to proceed

It would be great if someone could help me figure this out.
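
A note on the failure mode: return code = -9 in the log means the worker processes were killed with SIGKILL, which on Linux typically comes from the kernel OOM killer rather than a Python exception. A rough back-of-the-envelope estimate (a sketch; the exact loading behavior depends on the MII/DeepSpeed version) shows that host RAM becomes the bottleneck if each of the 8 tensor-parallel ranks materializes a full fp16 copy of the checkpoint before sharding:

params = 70e9            # Llama-2 70B parameter count
bytes_per_param = 2      # fp16
ranks = 8                # tensor_parallel = 8 -> 8 loader processes

full_copy_gib = params * bytes_per_param / 1024**3   # ~130 GiB per rank
total_gib = full_copy_gib * ranks                    # ~1043 GiB across all ranks

print(f"per-rank checkpoint: {full_copy_gib:.0f} GiB")
print(f"all ranks together:  {total_gib:.0f} GiB (vs. ~954 GiB in 1024 GB of host RAM)")

The 7B and 13B models fit comfortably under this limit, which would explain why the same configuration works for them.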

oyt4ldly1#

We do not currently support the 70B llama-2 model (its model architecture differs from the smaller llama-2 variants). We are working on adding support as soon as possible!
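
The architecture difference referred to here is grouped-query attention: the 70B variant uses fewer key/value heads (8) than attention heads (64), while 7B and 13B have the two counts equal. This can be checked from the downloaded checkpoint's config; a sketch using transformers, reusing the model path from the question:

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("/local/2_llama/Llama-2-70b-chat-hf")
# For 70B this prints 64 and 8 (grouped-query attention); for 7B/13B
# the two values are equal, so no GQA-aware kernels are required.
print(cfg.num_attention_heads, cfg.num_key_value_heads)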

cxfofazt2#

Is the larger llama2 (e.g. 70B) model supported now?

vfhzx4xs3#

We will be merging llama2-70b support with kernel injection soon. Here is the PR: microsoft/DeepSpeed#4313
You can install that branch of DeepSpeed and try it with MII. We also already support tensor parallelism without kernel injection on the latest DeepSpeed release (set enable_deepspeed=False).
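
Building on that answer, a minimal sketch of the no-kernel-injection path. This assumes the legacy mii.deploy signature, where enable_deepspeed is a keyword of mii.deploy itself rather than part of mii_config:

import mii

# Shard the Hugging Face model across 8 GPUs via tensor parallelism,
# skipping DeepSpeed kernel injection (which lacked 70B/GQA support).
mii.deploy(
    task="text-generation",
    model="/local/2_llama/Llama-2-70b-chat-hf",
    deployment_name="llama_2_deployment",
    enable_deepspeed=False,
    mii_config={"tensor_parallel": 8, "dtype": "fp16"},
)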
