DeepSpeed-MII: inference is very slow when tp > 1

Posted by ivqmmu1c 3 months ago in Other


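I am using the latest main branch and testing with the llamav2-7b model. When I run single-sentence inference with tp=4, it takes 267.98 seconds, but with tp=1 the same single-sentence inference takes only 7 seconds. This result is very strange. Could you please take a look?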

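In addition, for concurrency testing I modified line 73 of DeepSpeed-MII/mii/backend/client.py. If my modification is wrong, could you provide an example that supports concurrent client testing? Thank you very much!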

DeepSpeed-MII

Source: https://github.com/microsoft/DeepSpeed-MII/issues/297

2 answers


2wnc66cl1#

Hi easonfzw, the TP=4 time does look really bad! I just tested on the latest main branch, and here is what I see on a 2xA6000 setup:

import mii
import time

client = mii.serve(
    "meta-llama/Llama-2-7b-hf",
    tensor_parallel=1,
)
start = time.time()
output = client.generate("DeepSpeed is", max_length=1024, ignore_eos=True)
end = time.time()
client.terminate_server()
tp1_time = end - start

client = mii.serve(
    "meta-llama/Llama-2-7b-hf",
    tensor_parallel=2,
)
start = time.time()
output = client.generate("DeepSpeed is", max_length=1024, ignore_eos=True)
end = time.time()
client.terminate_server()
tp2_time = end - start

print("TP1 time:", tp1_time)
print("TP2 time:", tp2_time)

Output:

TP1 time: 22.425052165985107
TP2 time: 13.85570764541626
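Each number above comes from a single timed call, so any one-time start-up cost is folded into the measurement. A minimal sketch of a repeated measurement with a warm-up call follows; `time_runs` is a hypothetical helper, not part of MII, and the choice of the median is an assumption about what makes a fair comparison.

```python
import statistics
import time


def time_runs(fn, repeats=3, warmup=1):
    """Call fn a few times and return (median_seconds, all_samples)."""
    # Untimed warm-up calls absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), samples


if __name__ == "__main__":
    # Stand-in workload. With a running server, the benchmark call would be
    # something like:
    #   lambda: client.generate("DeepSpeed is", max_length=1024, ignore_eos=True)
    median, samples = time_runs(lambda: sum(range(100_000)))
    print("median seconds:", median)
```

If the TP=2 slowdown shows up only in the first call, that would point at start-up cost rather than steady-state generation speed.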

Can you share your setup? Which GPUs are you using, and which versions of CUDA and PyTorch do you have installed?
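The setup details asked for here can be collected with a short script. This is a minimal sketch and not part of the original answer: `collect_env` is a hypothetical helper, and it assumes PyTorch is the only package needed to query the GPUs.

```python
import platform


def collect_env():
    """Gather the Python, PyTorch, CUDA, and GPU details for a bug report."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda"] = torch.version.cuda
    count = torch.cuda.device_count() if torch.cuda.is_available() else 0
    info["gpu_count"] = count
    info["gpus"] = [torch.cuda.get_device_name(i) for i in range(count)]
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Comparing `gpu_count` with the `tensor_parallel` value passed to `mii.serve` is a quick sanity check, since each tensor-parallel rank is normally expected to run on its own GPU.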
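I'm not sure about the modification you made. I would need to dig into the code to understand whether it could have any negative effect on performance. For multi-client testing, we spawn multiple processes. For example, you could do this: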

import subprocess
processes = []
for i in range(32):
    processes.append(
        subprocess.Popen(
            [
                "python",
                "-c",
                f"import mii; mii.client('meta-llama/Llama-2-7b-hf')('DeepSpeed is', ignore_eos=True, max_length=256)",
            ],
            stdout=subprocess.PIPE,
        )
    )

Are you trying to benchmark multiple clients from within a single process?

Posted 3 months ago

gudnpqoy2#

First of all, thank you for your reply.

Strangely, when I ran your example above (tp=1 and tp=2), tp=2 took a very long time. Looking forward to your reply :)

**TP1 time: 8.999823808670044 seconds**

**TP2 time: 337.3766210079193 seconds**

Environment:
H100(80GB) 1*gpu
NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.2
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
torch 2.1.0a0+32f93b1
transformers 4.34.0
flash-attn 2.3.2

**TP1 time: 15.307891845703125 seconds**

**TP2 time: 55.08188509941101 seconds**

Environment:
A100(40GB) 1*gpu
NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 12.1
Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
torch 2.0.0
transformers 4.34.0
flash-attn 2.3.2
DeepSpeed: commit 4388a605f854db91302c4f89053ee861eb31bacd
DeepSpeed-Kernels: commit b62777e8ba87d82689b40625067f58a683bf7788
DeepSpeed-MII: commit ddbc6fc

In addition, when I run the subprocess example you provided, it reports an error:

Traceback (most recent call last):
  File "/usr/lib/python3.10/logging/__init__.py", line 1104, in emit
    self.flush()
  File "/usr/lib/python3.10/logging/__init__.py", line 1084, in flush
    self.stream.flush()
BrokenPipeError: [Errno 32] Broken pipe
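One plausible cause of this BrokenPipeError, not confirmed anywhere in the thread: the launcher script exits without reading the children's stdout pipes or waiting for them, so a child that later logs to stdout writes into a closed pipe. A minimal sketch that keeps the parent alive and drains every pipe follows; `run_clients` is a hypothetical helper, and the `print` command is only a stand-in for the `mii.client(...)` one-liner from the first answer.

```python
import subprocess
import sys


def run_clients(code, num_clients):
    """Run `code` in num_clients Python subprocesses and collect their output."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", code],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so log output is captured too
            text=True,
        )
        for _ in range(num_clients)
    ]
    results = []
    for proc in procs:
        # communicate() reads the pipe until EOF and waits for the child to
        # exit, so the pipe stays open for as long as the child is writing.
        out, _ = proc.communicate()
        results.append((proc.returncode, out))
    return results


if __name__ == "__main__":
    # Stand-in workload; replace the string with the mii.client(...) call
    # to exercise a running server.
    for returncode, out in run_clients("print('client done')", 4):
        print(returncode, out.strip())
```

All children are started before any output is read, so they still run concurrently; only the collection of results is sequential.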

Posted 3 months ago
