DeepSpeed-MII: "channels must be divisible by 8" when new special tokens are added

4nkexdtk  posted 3 months ago  in Other

I can run the original LLaMA-2-7B model itself, as well as fine-tuned versions of it, without any problems. However, if special tokens were added during fine-tuning, the model cannot be loaded with MII. The same model works fine with vLLM, HuggingFace transformers, and TGI.

The same problem occurs when testing with Mistral-7B.

The shortest code that reproduces the error:

import mii
pipeline = mii.pipeline("stanford-oval/Llama-2-7b-WikiChat")
Traceback (most recent call last):
  File "/home/user1/llama/test.py", line 3, in <module>
    pipeline = mii.pipeline("./workdir/earlycombine_gpt4_fused_v3")
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/mii/api.py", line 156, in pipeline
    inference_engine = load_model(model_config)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/mii/modeling/models.py", line 17, in load_model
    inference_engine = build_hf_engine(
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/engine_factory.py", line 126, in build_hf_engine
    return InferenceEngineV2(policy, engine_config)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
    self._model = self._policy.build_model(self._config, self._base_mp_group)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 156, in build_model
    self.model = self.instantiate_model(engine_config, mp_group)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/llama_v2/policy.py", line 17, in instantiate_model
    return Llama2InferenceModel(config=self._model_config, engine_config=engine_config, base_mp_group=mp_group)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 222, in __init__
    self.make_unembedding_layer()
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 265, in make_unembedding_layer
    self.unembed = heuristics.instantiate_unembed(unembed_config, self._engine_config)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/modules/heuristics.py", line 179, in instantiate_unembed
    return DSUnembedRegistry.instantiate_config(config)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/modules/module_registry.py", line 39, in instantiate_config
    return cls.registry[config_bundle.name](config_bundle.config, config_bundle.implementation_config)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/modules/implementations/unembed/ragged_unembed.py", line 69, in __init__
    self._act_fn = CUDABiasActivation(self._config.vocab_size, self._config.dtype, ActivationType.IDENTITY)
  File "/home/user1/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation.py", line 36, in __init__
    raise ValueError("channels must be divisible by 8")
ValueError: channels must be divisible by 8
GPU: NVIDIA A100 
Python: 3.10.13
deepspeed==0.13.0
deepspeed-kernels==0.0.1.dev1698255861
deepspeed-mii==0.2.0
torch==2.1.2+cu118
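
To confirm that the vocabulary size is what trips this check, here is a minimal sketch (assuming the checkpoint is a standard HuggingFace model whose config.json carries vocab_size):

from transformers import AutoConfig

# The unembedding layer's channel count equals the vocabulary size, so any
# vocab_size with a non-zero remainder mod 8 triggers the ValueError above.
config = AutoConfig.from_pretrained("stanford-oval/Llama-2-7b-WikiChat")
print(config.vocab_size, config.vocab_size % 8)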

xytpbqjk 1#

Hi! Have you solved this yet?
No, I am still getting the same error.


7fyelxc5 2#

Thanks for reporting this issue! Currently, the DeepSpeed-FastGen fused bias and activation kernel requires the channel count to be a multiple of 8, because it uses vectorized instructions for better performance!
The currently supported Llama models have a vocabulary size of 32000 (any vocabulary size divisible by 8 should work!). "stanford-oval/Llama-2-7b-WikiChat" (and any model with new special tokens added) has 32001 or more, which breaks our fused bias and activation kernel in the unembedding layer.
We will generalize this kernel to handle arbitrary channel sizes soon and will keep you posted! Thanks!
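
Until the kernel is generalized, one possible interim workaround (a sketch only, not confirmed against MII, and it only sidesteps the divisibility check) is to pad the embedding matrix up to a multiple of 8 and re-save the checkpoint; recent versions of transformers support this directly via the pad_to_multiple_of argument:

from transformers import AutoModelForCausalLM, AutoTokenizer

src = "stanford-oval/Llama-2-7b-WikiChat"  # or your local fine-tuned checkpoint
dst = "./llama-2-7b-wikichat-padded"       # hypothetical output directory

model = AutoModelForCausalLM.from_pretrained(src)
tokenizer = AutoTokenizer.from_pretrained(src)

# Rounds the input and output embeddings (and config.vocab_size) up to the
# next multiple of 8; the padding rows are never emitted by the tokenizer.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)

model.save_pretrained(dst)
tokenizer.save_pretrained(dst)

Then point MII at the padded copy: mii.pipeline(dst).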


cqoc49vn 3#

Any updates on this issue?
