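I'm running into a problem: meta-llama/Llama-2-7b-chat-hf fails on an H100 with the error "undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii". I get the same failure with mistralai/Mistral-7B-v0.1. Neither model works in my setup.
I tried several versions of deepspeed-mii (0.2.1, 0.2.2, and 0.2.3) in combination with different PyTorch versions (2.2.1, 2.1.2, and 2.1.0), but none of the combinations worked. I also tried building from source, without success.
Has anyone hit the same issue, or does anyone have a suggestion for how to fix it?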

import mii
pipe = mii.pipeline("meta-llama/Llama-2-7b-chat-hf")

NVIDIA H100 80GB
Driver Version: 535.104.12

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
 [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/taishi/workplace/mii/venv/lib/python3.10/site-packages/torch']
torch version .................... 2.1.0+cu121
deepspeed install path ........... ['/home/taishi/workplace/mii/venv/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.0, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.1
shared memory (/dev/shm) size .... 999.98 GB

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.6 LTS
Release:	20.04
Codename:	focal

9 answers

rggaifut1#

I hit the same problem: the error output on V100s is exactly the same. Switching to A100s resolved it.

qzwqbdag2#

"I hit the same problem: the error output on V100 is exactly the same. Switching to A100 resolved it."
Yes, I can confirm that it works on A100 but not on H100.
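
For anyone comparing machines, here is a small sketch for checking which architecture a GPU reports. The capability values for V100, A100, and H100 are the published ones; the helper names (describe_capability, local_capability) are made up for illustration, and the torch query is skipped if torch or CUDA is not available.

```python
from typing import Optional, Tuple

# Compute capabilities of the three GPUs mentioned in this thread.
KNOWN_GPUS = {
    (7, 0): "Volta (V100)",
    (8, 0): "Ampere (A100)",
    (9, 0): "Hopper (H100)",
}


def describe_capability(capability: Tuple[int, int]) -> str:
    """Return a readable architecture label for a (major, minor) pair."""
    major, minor = capability
    label = KNOWN_GPUS.get((major, minor), "other")
    return f"sm_{major}{minor}: {label}"


def local_capability() -> Optional[Tuple[int, int]]:
    """Query the first visible GPU, or return None if torch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = local_capability()
    print(describe_capability(cap) if cap else "no CUDA device visible")
```

The reports so far match a pattern where only capability 8.x works, which is worth including when you post your own result.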

vulvrdjw3#

Thanks for the feedback. It looks like a bug was introduced in a recent release when we added FP6 quantization support. I'll investigate and fix it. Thanks!
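
The missing symbol in the question is consistent with this: its mangled name embeds the function name cuda_wf6af16_linear. Reading "wf6af16" as "FP6 weights, FP16 activations" is my assumption, not something stated in the thread. A minimal sketch for pulling the function name out of a simple Itanium-mangled symbol, where mangled_function_name is a hypothetical helper that handles only the plain _Z<length><name> form:

```python
import re

SYMBOL = "_Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii"


def mangled_function_name(symbol: str) -> str:
    """Extract the unqualified function name from a simple mangled symbol.

    Only the plain `_Z<length><name>...` form is handled; nested names
    (namespaces, classes) and templates are out of scope for this sketch.
    """
    match = re.match(r"_Z(\d+)", symbol)
    if match is None:
        raise ValueError(f"not a simple Itanium-mangled symbol: {symbol!r}")
    length = int(match.group(1))
    start = match.end()
    name = symbol[start:start + length]
    if len(name) != length:
        raise ValueError("symbol is shorter than its length prefix claims")
    return name


if __name__ == "__main__":
    print(mangled_function_name(SYMBOL))
```

For the full signature, passing the symbol to c++filt should show a function taking six at::Tensor& arguments followed by four int arguments.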

b4wnujal4#

@JamesTheZ might know about this.

toiithl65#

"@JamesTheZ might know about this."
This seems to be because the current implementation compiles cuda_linear_kernels.cpp only on Ampere: https://github.com/microsoft/DeepSpeed/blob/330d36bb39b8dd33b5603ee0024705db38aab534/op_builder/inference_core_ops.py#L75-L81

zengzsys6#