text-generation-inference runtime error: "weight lm_head.weight does not exist" when loading the qwen2-0.5B-Instruct model

oyxsuwqo · posted 9 months ago in Other

I ran into a problem while loading the qwen2-0.5B-Instruct model with the TGI library. The error raised is "RuntimeError: weight lm_head.weight does not exist".

I suspect this is caused by the safetensors file not preserving the tied parameter. It seems the problem can be avoided by untying lm_head from embed_tokens before calling model.save_pretrained().
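For illustration, here is a minimal sketch of that untying step (the output directory name is hypothetical, not from the original report):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Untie the head: give lm_head its own copy of the embedding weights so that
# save_pretrained() serializes an explicit lm_head.weight tensor.
model.config.tie_word_embeddings = False
model.lm_head.weight = torch.nn.Parameter(model.model.embed_tokens.weight.clone())

model.save_pretrained("qwen2-0.5B-instruct-untied")  # hypothetical output path
```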
Interestingly, the problem does not occur with the larger Qwen models, e.g. the 7B or 72B versions. I'm wondering whether this is expected behavior or an unintended bug.
I'm using the latest official image: ghcr.io/huggingface/text-generation-inference:2.2.0.

Traceback:

```
2024-08-07T16:32:00.053201Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
      return loop.run_until_complete(main)
    File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
      self.run_forever()
    File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
      self._run_once()
    File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
      handle._run()
    File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
      self._context.run(self._callback, *self._args)
  > File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
      model = get_model(
    File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 953, in get_model
      return FlashCausalLM(
    File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 898, in __init__
      model = model_class(prefix, config, weights)
    File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_qwen2_modeling.py", line 344, in __init__
      self.lm_head = SpeculativeHead.load(
    File "/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/speculative.py", line 40, in load
      lm_head = TensorParallelHead.load(config, prefix, weights)
    File "/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/tensor_parallel.py", line 67, in load
      weight = weights.get_tensor(f"{prefix}.weight")
    File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 212, in get_tensor
      filename, tensor_name = self.get_filename(tensor_name)
    File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 193, in get_filename
      raise RuntimeError(f"weight {tensor_name} does not exist")
  RuntimeError: weight lm_head.weight does not exist

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 953, in get_model
    return FlashCausalLM(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 898, in __init__
    model = model_class(prefix, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_qwen2_modeling.py", line 344, in __init__
    self.lm_head = SpeculativeHead.load(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/speculative.py", line 40, in load
    lm_head = TensorParallelHead.load(config, prefix, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/tensor_parallel.py", line 67, in load
    weight = weights.get_tensor(f"{prefix}.weight")
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 212, in get_tensor
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 193, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist
rank=0
2024-08-07T16:32:05.257583Z ERROR text_generation_launcher: Shard 0 failed to start
2024-08-07T16:32:05.257616Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
```
yi0zb3m4 #1

Hi @boyang-nlp 👋
Thanks for reporting this issue!
I think we're a bit bandwidth-constrained at the moment, but I hope to take a look next week. If you have time to dig into possible solutions in the meantime, feel free to post them here 👍

jgzswidk #2

Sure, no worries, take the time you need. I'll keep looking for a solution and will keep you updated. Thanks! 👍

dffbzjpn #3

Hi @boyang-nlp and @ErikKaum,
We also ran into this issue with Qwen2-1.5B. Here is a temporary workaround (it should also work for Qwen2-0.5B):
Start the Hugging Face Docker image (ghcr.io/huggingface/text-generation-inference:2.2.0) and open the speculative.py file inside it:

```
vi /opt/conda/lib/python3.10/site-packages/text_generation_server/layers/speculative.py
```

In that file, add the following lines around line 40 (between the "FIX START" and "FIX END" comments):

```python
import torch
...
class SpeculativeHead(torch.nn.Module):
    ...
    @staticmethod
    def load(config, prefix: str, weights):
        speculator = config.speculator
        if speculator:
            ...
        else:
            # FIX START
            if config._name_or_path == "Qwen/Qwen2-1.5B":
                if prefix == "lm_head":
                    prefix = "model.embed_tokens"
            # FIX END
            lm_head = TensorParallelHead.load(config, prefix, weights)
            speculator = None
        return SpeculativeHead(lm_head, speculator)
    ...
```
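A more general variant of the same workaround (a sketch, not an upstream fix; it assumes the model config exposes tie_word_embeddings, as the Qwen2 configs do) would key off that flag instead of hard-coding the model name, so it also covers Qwen2-0.5B:

```python
# FIX START (generalized sketch: fall back to the tied embedding weights
# whenever the config declares tied word embeddings)
if prefix == "lm_head" and getattr(config, "tie_word_embeddings", False):
    prefix = "model.embed_tokens"
# FIX END
```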

The workaround works because, for Qwen2-1.5B, the embeddings are tied: lm_head reuses the model.embed_tokens weights.
To verify this, we ran the following script:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda"  # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B",
    torch_dtype="auto",
    device_map="auto",
)
torch.all(model.model.embed_tokens.weight == model.lm_head.weight)
```

The output:

```
tensor(True, device='cuda:0')
```

This means the lm_head weights are identical to the model.embed_tokens weights.
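One way to confirm the other half of the diagnosis (a sketch; the shard filename below is illustrative) is to list the tensors actually stored in the checkpoint's safetensors file; for the tied Qwen2 checkpoints, lm_head.weight is simply absent:

```python
from safetensors import safe_open

# List the tensor names stored in a checkpoint shard (illustrative path).
with safe_open("model.safetensors", framework="pt") as f:
    keys = set(f.keys())

print("lm_head.weight" in keys)             # expected: False for tied models
print("model.embed_tokens.weight" in keys)  # expected: True
```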

