DeepSpeed-MII [BUG] Issue serving Mixtral 8x7B on H100

3npbholx posted 6 months ago in Other

I ran into a problem serving Mixtral 8x7B with deepspeed-mii v0.2.3 on 4 x H100 (TP=4). All other parameters are defaults, and the base image is NVIDIA's nvidia/cuda:12.3.1-devel-ubuntu22.04. The traceback shows:

undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii

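There is also a warning, "FP6 quantization kernel is only supported on Ampere architectures", even though I did not specify quantization when launching the server. It seems that an unused kernel is imported but is not registered on Grace Hopper devices.
When I downgraded to v0.2.2, I got the following error instead: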

Arch unsupported for MoE GEMM
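
The undefined symbol in the traceback above is an Itanium-ABI mangled C++ name, so the function that failed to link can be read straight out of it. The following is a minimal sketch that handles only the simple `_Z<length><name>` form. The helper name `mangled_function_name` is made up for illustration and is not part of DeepSpeed or MII.

```python
import re


def mangled_function_name(symbol: str) -> str:
    """Return the unqualified function name from a simple mangled symbol.

    Only the `_Z<length><name>...` form (a non-nested function name) is
    handled; nested names such as `_ZN...E` are out of scope for this sketch.
    """
    match = re.match(r"_Z(\d+)", symbol)
    if match is None:
        raise ValueError(f"not a simple mangled name: {symbol!r}")
    length = int(match.group(1))
    start = match.end()
    name = symbol[start:start + length]
    if len(name) != length:
        raise ValueError("symbol is shorter than its declared name length")
    return name


SYMBOL = "_Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii"
print(mangled_function_name(SYMBOL))  # cuda_wf6af16_linear
```

The extracted name, `cuda_wf6af16_linear`, can then be searched for in the DeepSpeed source tree. The `f6` in the name fits the FP6 warning quoted above, which suggests the missing symbol belongs to the FP6 kernel rather than to anything Mixtral-specific.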

rggaifut #1

I hit the same problem on V100s, with exactly the same error output. Switching to A100s resolved it.

qzwqbdag #2

> I hit the same problem on V100s, with exactly the same error output. Switching to A100s resolved it.
Yes, I can confirm that it works on A100 but not on H100.

vulvrdjw #3

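Thanks for the feedback. It looks like a bug was introduced in the recent release, when we added FP6 quantization support. I will investigate and fix it. Thanks!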

b4wnujal #4

@JamesTheZ may know about this.

toiithl6 #5

> @JamesTheZ may know about this.
This appears to be because the current implementation compiles cuda_linear_kernels.cpp only on Ampere: https://github.com/microsoft/DeepSpeed/blob/330d36bb39b8dd33b5603ee0024705db38aab534/op_builder/inference_core_ops.py#L75-L81
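
The failure mode described here can be modeled in a few lines. This is a toy sketch, not DeepSpeed's actual build logic: it assumes the FP6 source file is compiled only for compute capability 8.x while the bindings reference its symbol unconditionally. The `core_ops` placeholder and both function names are hypothetical.

```python
# Compute capabilities of the GPUs mentioned in this thread.
GPU_COMPUTE_CAPABILITY = {"V100": (7, 0), "A100": (8, 0), "H100": (9, 0)}
FP6_SYMBOL = "cuda_wf6af16_linear"


def compiled_symbols(capability):
    """Symbols that a hypothetical Ampere-only build would compile in."""
    symbols = {"core_ops"}  # placeholder for everything that is always built
    major, _minor = capability
    if major == 8:  # assumed gate: FP6 kernels built only for capability 8.x
        symbols.add(FP6_SYMBOL)
    return symbols


def missing_symbols(capability):
    """Symbols the bindings reference but the build left out."""
    # Assumption: the bindings always reference the FP6 op, whatever the GPU.
    referenced = {"core_ops", FP6_SYMBOL}
    return referenced - compiled_symbols(capability)


for gpu, capability in GPU_COMPUTE_CAPABILITY.items():
    print(gpu, sorted(missing_symbols(capability)))
```

Under these assumptions the model leaves `cuda_wf6af16_linear` unresolved on V100 and H100 but not on A100, which matches the reports earlier in the thread.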

zengzsys #6

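I ran into a problem where meta-llama/Llama-2-7b-chat-hf does not work on H100 because of the undefined symbol _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii. I see the same problem with mistralai/Mistral-7B-v0.1. Neither model works in my setup.
I tried several versions of deepspeed-mii (0.2.1, 0.2.2, and 0.2.3) in combination with different versions of PyTorch (2.2.1, 2.1.2, and 2.1.0), but none of these combinations worked. I also tried compiling from source, without success.
Has anyone run into the same problem, or does anyone have suggestions for how to fix it?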

import mii
pipe = mii.pipeline("meta-llama/Llama-2-7b-chat-hf")
NVIDIA H100 80GB
Driver Version: 535.104.12
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
 [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/taishi/workplace/mii/venv/lib/python3.10/site-packages/torch']
torch version .................... 2.1.0+cu121
deepspeed install path ........... ['/home/taishi/workplace/mii/venv/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.0, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.1
shared memory (/dev/shm) size .... 999.98 GB
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.6 LTS
Release:	20.04
Codename:	focal

to94eoyn #7

Downgrading to these versions works:
deepspeed 0.13.5
deepspeed-mii 0.2.2
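
To check whether an environment matches the pair reported above, something like the following sketch can be used. It relies only on the standard library's `importlib.metadata`. The two helper functions are illustrative names, and the pinned versions are simply those from this reply, not an official compatibility matrix.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pair reported as working in this thread.
REPORTED_WORKING = {"deepspeed": "0.13.5", "deepspeed-mii": "0.2.2"}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(installed):
    """Return {package: (installed, expected)} for every package that differs."""
    return {
        name: (installed.get(name), expected)
        for name, expected in REPORTED_WORKING.items()
        if installed.get(name) != expected
    }


if __name__ == "__main__":
    print(version_mismatches(installed_versions(REPORTED_WORKING)))
```

An empty dictionary means both packages match. The environment report earlier in the thread shows deepspeed 0.14.0, which this check would flag, so the deepspeed version may matter as much as the deepspeed-mii version.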

o2gm4chl #8

Is there any update on this issue?

5uzkadbs #9

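I found that this is an upstream FasterTransformer issue; please check these lines. However, FasterTransformer has since migrated to TensorRT-LLM, which does have an implementation under sm_90. Do you have a plan to fix this, or would a PR be welcome?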
