DeepSpeed-MII: calling a non-persistent pipeline in a for loop causes a deadlock

Posted by o8x7eapl 3 months ago in Other


I am currently running llama2-13b inference on 2×H800 GPUs. Here is my code:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # comma-separated device ids, no spaces
import torch
import mii
from transformers import LlamaTokenizer

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
print(f'world_size: {world_size}, local_rank: {local_rank}')

base_model = '/data/wlx/Llama-2-13b-chat-hf/'

input_text = [' DeepSpeed is a useful tools ' for _ in range(1000)]
input_text = ''.join(input_text) 
tokenizer = LlamaTokenizer.from_pretrained(base_model, use_fast=True)
# Tokenize once and keep the 1-D tensor of token ids so it can be sliced by token count.
input_ids = tokenizer(input_text, return_tensors='pt').input_ids[0]

pipe = mii.pipeline(base_model)

for i in range(1, 4):
    cur_text = input_ids[:i * 512]
    if local_rank == 0:
        print(f'\nToken Shape: {cur_text.shape}\n')
    cur_text = tokenizer.decode(cur_text, skip_special_tokens=True,
                                clean_up_tokenization_spaces=False)
    with torch.no_grad():
        output = pipe(cur_text,
                      max_length=8172,
                      min_new_tokens=1,
                      max_new_tokens=256,
                      ignore_eos=False,
                      do_sample=False,
                      return_full_text=False)
    if local_rank == 0:
        output = output[0]
        print(f'prompt_length: {output.prompt_length}, \
output_length: {output.generated_length}, \
finished_reason: {output.finish_reason}')

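To control the inference length precisely, I decode a specified number of tokens back into text and use that text as the input. The deadlock usually occurs after the for loop has run 2-3 iterations, as the following log shows: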

[2023-12-29 21:09:48,975] [INFO] [engine_v2.py:84:__init__] Model built.
[2023-12-29 21:09:52,020] [INFO] [engine_v2.py:84:__init__] Model built.
[2023-12-29 21:09:55,399] [INFO] [kv_cache.py:135:__init__] Allocating KV-cache 0 with shape: (40, 617, 64, 2, 20, 128) consisting of 617 blocks.
[2023-12-29 21:09:55,399] [INFO] [kv_cache.py:135:__init__] Allocating KV-cache 0 with shape: (40, 617, 64, 2, 20, 128) consisting of 617 blocks.

Token Shape: torch.Size([512])

prompt_length: 512,                 output_length: 256,                 finished_reason: length

Token Shape: torch.Size([1024])

prompt_length: 1024,                 output_length: 256,                 finished_reason: length

Token Shape: torch.Size([1536])

Deadlock detected. Resetting KV cache and recomputing requests. Consider limiting number of concurrent requests or decreasing max lengths of prompts/generations.
[2023-12-29 21:10:42,060] [INFO] [launch.py:347:main] Process 1872027 exits successfully.
[2023-12-29 21:11:26,107] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1872026
[2023-12-29 21:11:26,107] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1872027
[2023-12-29 21:11:26,108] [ERROR] [launch.py:321:sigkill_handler] ['/home/lxwei/miniconda3/envs/longlora/bin/python', '-u', 'inference_test/mii_issue.py', '--local_rank=1'] exits with return code = -15

However, when I pass an even longer prompt in a for loop that runs only once, it completes normally. The log is as follows:

[2023-12-29 21:14:49,428] [INFO] [engine_v2.py:84:__init__] Model built.
[2023-12-29 21:14:50,419] [INFO] [engine_v2.py:84:__init__] Model built.
[2023-12-29 21:14:53,906] [INFO] [kv_cache.py:135:__init__] Allocating KV-cache 0 with shape: (40, 617, 64, 2, 20, 128) consisting of 617 blocks.
[2023-12-29 21:14:53,908] [INFO] [kv_cache.py:135:__init__] Allocating KV-cache 0 with shape: (40, 617, 64, 2, 20, 128) consisting of 617 blocks.

Token Shape: torch.Size([2048])

prompt_length: 2048,                 output_length: 256,                 finished_reason: length
[2023-12-29 21:15:20,969] [INFO] [launch.py:347:main] Process 1873408 exits successfully.
[2023-12-29 21:15:20,969] [INFO] [launch.py:347:main] Process 1873409 exits successfully.
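
As a rough sanity check on both logs, the KV-cache shape can be turned into a token budget and compared with what each request needs. This is only a sketch of the arithmetic: it assumes the logged shape is ordered as (layers, blocks, tokens per block, K/V, heads per GPU, head dim), which is read off the log line and not confirmed against the DeepSpeed source, and the helper names kv_token_capacity and blocks_needed are made up for illustration.

```python
import math

# Shape copied from the "Allocating KV-cache" log line.
# Assumed ordering: (layers, blocks, tokens per block, K/V, heads per GPU, head dim).
KV_CACHE_SHAPE = (40, 617, 64, 2, 20, 128)


def kv_token_capacity(shape):
    """Total tokens the cache can hold per GPU under the assumed ordering."""
    _, num_blocks, block_size, *_ = shape
    return num_blocks * block_size


def blocks_needed(prompt_len, new_tokens, block_size):
    """Whole blocks one sequence occupies once prompt and generation are stored."""
    return math.ceil((prompt_len + new_tokens) / block_size)


capacity = kv_token_capacity(KV_CACHE_SHAPE)
block_size = KV_CACHE_SHAPE[2]

# (prompt_length, max_new_tokens) pairs taken from the two runs above.
loop_runs = [(512, 256), (1024, 256), (1536, 256)]
single_run = (2048, 256)

loop_blocks = [blocks_needed(p, n, block_size) for p, n in loop_runs]
single_blocks = blocks_needed(*single_run, block_size)

print(f"capacity: {capacity} tokens in {KV_CACHE_SHAPE[1]} blocks")
print(f"blocks per loop iteration: {loop_blocks}, total {sum(loop_blocks)}")
print(f"blocks for the single 2048-token run: {single_blocks}")
```

Under these assumptions, the third loop iteration needs fewer blocks than the single 2048-token run that succeeds. Even if no blocks were released between iterations, the three requests together would use only a small fraction of the 617 blocks. So the deadlock message does not look like plain cache exhaustion caused by prompt size.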

Source: https://github.com/microsoft/DeepSpeed-MII/issues/365