DeepSpeed-MII: error when running inference on the llama2-7b-hf model with MII-Public

Posted by xam8gpfp 6 months ago in Other


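Judging from the error messages you provided, the problem is in layer_norm.cu. The errors arise because the CUDA code does arithmetic on half-precision floats (const __half) without handling these special cases correctly. To fix this, you can try the following: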

Add the following line at the top of layer_norm.cu to disable half-precision support:


# define __HALF_ENABLED__

Modify the operator overloads in reduction_utils.h so that they handle the half-precision special cases. For example, replace the following code:

return lhs + rhs;

with:

return __hadd(lhs, rhs);

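Likewise, every other operator that does half-precision arithmetic needs a corresponding change (for example, __hmul for multiplication, or __hadd2 when the operands are __half2).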

Recompile and rerun your code. If all goes well, these errors should be resolved.
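This error comes from a failure while building the transformer_inference extension. You can try the following:
Make sure your PyTorch and DeepSpeed versions are compatible; the official documentation has details.
Clear the previous build cache and rebuild. You can clear the cache with: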

rm -rf build/
rm -rf dist/
rm -rf torch_model_parallel.egg-info/

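If the problem persists, try upgrading or downgrading PyTorch and DeepSpeed until you find a compatible combination.
If none of the above works, open an issue on GitHub that describes the problem in detail and includes the relevant error messages and code snippets, so that the DeepSpeed developers and other users can help you find a solution.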
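The cache-clearing step above can also be scripted. The sketch below is a minimal Python equivalent of the rm -rf commands, assuming it is run from the directory where the build happened; it only lists what it would delete unless --yes is passed. It also prints where PyTorch is assumed to keep JIT-compiled extensions (the TORCH_EXTENSIONS_DIR override and the ~/.cache/torch_extensions default are assumptions about a typical Linux setup). The helper names are hypothetical.

```python
import os
import shutil
import sys
from pathlib import Path

# Same targets as the rm -rf commands above.
BUILD_ARTIFACTS = ("build", "dist", "torch_model_parallel.egg-info")


def clear_build_dirs(root=".", names=BUILD_ARTIFACTS, dry_run=False):
    """Delete each named directory under `root` if it exists; return the names found."""
    removed = []
    for name in names:
        path = Path(root) / name
        if path.is_dir():
            if not dry_run:
                shutil.rmtree(path)
            removed.append(name)
    return removed


def jit_extension_cache_dir():
    """Assumed JIT extension cache: TORCH_EXTENSIONS_DIR if set, else ~/.cache/torch_extensions."""
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    if override:
        return Path(override)
    # Simplified default for Linux; ignores XDG_CACHE_HOME.
    return Path.home() / ".cache" / "torch_extensions"


if __name__ == "__main__":
    # Pass --yes to actually delete; otherwise only report what would be removed.
    dry = "--yes" not in sys.argv[1:]
    print("would remove:" if dry else "removed:", clear_build_dirs(dry_run=dry))
    print("JIT extension cache (inspect before deleting):", jit_extension_cache_dir())
```

The dry-run default is a deliberate choice: rmtree is irreversible, so the script shows its targets first.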
  File "/home/ai_group/anaconda3/envs/liuy/lib/python3.10/site-packages/mii/deployment.py", line 120, in _deploy_local
    mii.utils.import_score_file(deployment_name).init()
  File "/tmp/mii_cache/llama2_deployment/score.py", line 30, in init
    model = mii.MIIServerClient(task,
  File "/home/ai_group/anaconda3/envs/liuy/lib/python3.10/site-packages/mii/server_client.py", line 92, in __init__
    self._wait_until_server_is_live()
  File "/home/ai_group/anaconda3/envs/liuy/lib/python3.10/site-packages/mii/server_client.py", line 115, in _wait_until_server_is_live
    raise RuntimeError("server crashed for some reason, unable to proceed")
RuntimeError: server crashed for some reason, unable to proceed

DeepSpeed-MII

Source: https://github.com/microsoft/DeepSpeed-MII/issues/226

4 answers


deyfvvtc1#

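Hi @ly19970621, this looks like a problem with compiling the DeepSpeed-Inference kernels. Could you share the output of ds_report? Also, please try precompiling the kernels with DS_BUILD_TRANSFORMER_INFERENCE=1 pip install deepspeed and share the result. Thanks!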

6 months ago

wlzqhblo2#


Thanks for the reply. I ran python -m deepspeed.env_report, and here is the output:

[2023-08-18 01:26:20,378] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
 [WARNING]  using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ai_group/anaconda3/envs/liuy026/lib/python3.10/site-packages/torch']
torch version .................... 2.0.1+cu118
deepspeed install path ........... ['/home/ai_group/anaconda3/envs/liuy026/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.10.0, unknown, unknown
torch cuda version ............... 11.8
torch hip version ................ None
nvcc version ..................... 12.2
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.8
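When comparing reports like the one above, it can help to pull out the fields relevant to this error. The sketch below is a hypothetical parser that assumes the exact "name ..... value" layout shown above: it reads the installed/compatible flags per op and flags a CUDA major-version difference between "torch cuda version" and "nvcc version" (11.8 vs 12.2 here). Whether that mismatch is what breaks the transformer_inference JIT build in this thread is an assumption worth checking, not a confirmed diagnosis; the script name check_report.py is also hypothetical.

```python
import re
import sys

# Op rows look like: "transformer_inference .. [NO] ....... [OKAY]"
OP_LINE = re.compile(r"^(\w+) \.+ \[(YES|NO)\] \.+ \[(OKAY|NO)\]$")
# Info rows look like: "torch cuda version ............... 11.8"
INFO_LINE = re.compile(r"^([a-z][\w .]*?) \.{3,} (.+)$")


def parse_ds_report(text):
    """Return ({op: (installed, compatible)}, {field: value}) from ds_report text."""
    ops, info = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        m = OP_LINE.match(line)
        if m:
            name, installed, compatible = m.groups()
            ops[name] = (installed == "YES", compatible == "OKAY")
            continue
        m = INFO_LINE.match(line)
        if m:
            info[m.group(1)] = m.group(2)
    return ops, info


def report_problems(text):
    """List the (assumed) red flags for a transformer_inference build failure."""
    ops, info = parse_ds_report(text)
    problems = []
    installed, compatible = ops.get("transformer_inference", (False, False))
    if not compatible:
        problems.append("transformer_inference is not marked compatible")
    elif not installed:
        problems.append("transformer_inference is not prebuilt and will be JIT-compiled at runtime")
    torch_cuda, nvcc = info.get("torch cuda version"), info.get("nvcc version")
    if torch_cuda and nvcc and torch_cuda.split(".")[0] != nvcc.split(".")[0]:
        problems.append(f"CUDA major version differs: torch built with {torch_cuda}, nvcc {nvcc}")
    return problems


if __name__ == "__main__":
    # Usage: python -m deepspeed.env_report > report.txt; python check_report.py report.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            for problem in report_problems(f.read()):
                print("-", problem)
```

Matching on text is brittle by design here; if the report format changes, the regexes are the only part to adjust.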

I ran DS_BUILD_TRANSFORMER_INFERENCE=1 pip install deepspeed, but the environment did not change, and rerunning still produces the error above.
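One possible reason the environment did not change is that pip treats an already-installed deepspeed as satisfied and skips the build, so DS_BUILD_TRANSFORMER_INFERENCE=1 never reaches a compile step. The sketch below (an assumption, not the maintainers' instruction) builds a pip command that forces a reinstall with the variable set. --force-reinstall, --no-deps and --no-cache-dir are standard pip flags; --no-deps is there so the existing torch 2.0.1+cu118 is left alone. Running pip via sys.executable targets the same environment as the current interpreter. The helper name is hypothetical.

```python
import os
import shlex
import subprocess
import sys

DEFAULT_OPS = ("TRANSFORMER_INFERENCE",)


def prebuild_command(ops=DEFAULT_OPS, package="deepspeed"):
    """Return (argv, env) for reinstalling `package` with DS_BUILD_<op>=1 set."""
    env = dict(os.environ)
    for op in ops:
        # e.g. DS_BUILD_TRANSFORMER_INFERENCE=1, as suggested in the thread
        env[f"DS_BUILD_{op}"] = "1"
    argv = [
        sys.executable, "-m", "pip", "install", package,
        "--force-reinstall",  # rebuild even though the package is already installed
        "--no-deps",          # leave torch and other dependencies untouched
        "--no-cache-dir",     # avoid reusing a cached wheel built without the op
    ]
    return argv, env


if __name__ == "__main__":
    argv, env = prebuild_command()
    prefix = " ".join(f"DS_BUILD_{op}=1" for op in DEFAULT_OPS)
    print(prefix, shlex.join(argv))
    # Only actually run pip when explicitly asked to.
    if "--run" in sys.argv[1:]:
        subprocess.run(argv, env=env, check=True)
```

If the build succeeds, ds_report would be expected to list transformer_inference as installed rather than [NO].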


nhn9ugyo3#

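Sorry for the late reply. If you are still seeing this error, could you try running the following script with DeepSpeed alone (without MII) and share the result?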

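import torch
import deepspeed
import os
from transformers import pipeline

# LOCAL_RANK and WORLD_SIZE are set by the deepspeed launcher; the defaults allow a single-GPU run.
local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))

task_name = "text-generation"
model_name_or_path = "./llama2_7b_hf"  # local Llama 2 7B checkpoint in Hugging Face format
input_strs = ["DeepSpeed is", "Microsoft is"]

def run():
    # Load the Hugging Face pipeline in fp16 on this process's GPU.
    pipe = pipeline(task_name, model_name_or_path, torch_dtype=torch.float16, device=local_rank)

    # Inject DeepSpeed-Inference kernels (these rely on the transformer_inference op,
    # which is JIT-compiled if it was not prebuilt) and shard across world_size GPUs.
    pipe.model = deepspeed.init_inference(
        pipe.model,
        replace_with_kernel_inject=True,
        mp_size=world_size,
        dtype=torch.float16,
    )

    # Generate continuations for both prompts and print them.
    output = pipe(input_strs)
    print(output)

if __name__ == "__main__":
    run()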

Save it as llama-example.py and run it with: deepspeed llama-example.py


2w2cym1i4#

Any update on this, sir? I tried running llama-2-13b-chat-hf on T4 GPUs (4 x 16 GB) and still hit the same problem.
Cc: @mrwyattii
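For a rough sense of scale on that setup: in fp16 each parameter takes 2 bytes, so with the model sharded across 4 GPUs (mp_size=4 in the script above) a nominal 13B-parameter model needs about 6.5 GB of weights per GPU, before activations and the KV cache. The sketch below does that arithmetic; the nominal parameter counts are approximations, and an even split of weights across GPUs is an assumption. The function name is hypothetical.

```python
GB = 10**9


def weight_bytes_per_gpu(num_params, bytes_per_param=2, mp_size=1):
    """Approximate weight memory per GPU, assuming an even split across mp_size GPUs."""
    return num_params * bytes_per_param // mp_size


if __name__ == "__main__":
    # Nominal sizes; real checkpoints differ slightly.
    for name, params, gpus in [("llama-2-7b", 7 * 10**9, 1), ("llama-2-13b", 13 * 10**9, 4)]:
        per_gpu = weight_bytes_per_gpu(params, 2, gpus)
        print(f"{name}, fp16, mp_size={gpus}: {per_gpu / GB:.1f} GB per GPU")
```

By the same estimate, the 7B model from the title needs roughly 14 GB for fp16 weights on a single GPU.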
