I ran into a problem when loading the qwen2-0.5B-Instruct model with TGI. The error thrown is "RuntimeError: weight lm_head.weight does not exist".
I suspect this is caused by the safetensors file not preserving the tied parameter. It seems the problem can be avoided by untying lm_head and embed_tokens before calling model.save_pretrained(), as sketched below.
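A minimal sketch of that untying workaround, assuming the standard transformers API (the output directory name is made up):

```python
# Hypothetical workaround: clone the tied weight so that save_pretrained()
# writes an explicit lm_head.weight entry into the safetensors file.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Break the tie: give lm_head its own storage and record the change in the config.
model.config.tie_word_embeddings = False
model.lm_head.weight = torch.nn.Parameter(model.lm_head.weight.detach().clone())

model.save_pretrained("qwen2-0.5b-untied")  # hypothetical output path
```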
Interestingly, the problem does not occur with the larger Qwen models, e.g. the 7B or 72B versions. I'm wondering whether this is expected behavior or an unintended bug.
I'm using the latest official image: ghcr.io/huggingface/text-generation-inference:2.2.0.
Traceback:
```
2024-08-07T16:32:00.053201Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 953, in get_model
return FlashCausalLM(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 898, in __init__
model = model_class(prefix, config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_qwen2_modeling.py", line 344, in __init__
self.lm_head = SpeculativeHead.load(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/speculative.py", line 40, in load
lm_head = TensorParallelHead.load(config, prefix, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/tensor_parallel.py", line 67, in load
weight = weights.get_tensor(f"{prefix}.weight")
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 212, in get_tensor
filename, tensor_name = self.get_filename(tensor_name)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 193, in get_filename
raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 953, in get_model
return FlashCausalLM(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 898, in __init__
model = model_class(prefix, config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_qwen2_modeling.py", line 344, in __init__
self.lm_head = SpeculativeHead.load(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/speculative.py", line 40, in load
lm_head = TensorParallelHead.load(config, prefix, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/tensor_parallel.py", line 67, in load
weight = weights.get_tensor(f"{prefix}.weight")
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 212, in get_tensor
filename, tensor_name = self.get_filename(tensor_name)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 193, in get_filename
raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist
rank=0
2024-08-07T16:32:05.257583Z ERROR text_generation_launcher: Shard 0 failed to start
2024-08-07T16:32:05.257616Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
```
3 Answers
#1
Hi @boyang-nlp 👋
Thanks for reporting this issue!
I think we're a bit bandwidth-constrained at the moment, but I hope to take a look next week. If you have time to dig into possible solutions in the meantime, feel free to post them here 👍
#2
Sure! No worries, take whatever time you need. I'll keep looking for a solution and keep you updated. Thanks! 👍
#3
Hi @boyang-nlp and @ErikKaum,
We also hit this issue with Qwen2-1.5B. Here is a temporary workaround (it should also work for Qwen2-0.5B):
Open the Hugging Face Docker image (ghcr.io/huggingface/text-generation-inference:2.2.0) and, inside the container, open the `speculative.py` file. In that file, at line 40, add a fallback between "FIX START" and "FIX END" comments, along the lines of the sketch below. The fix works because, for Qwen2-1.5B, the embeddings are tied.
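A sketch of that kind of fallback, assuming TGI 2.2.0's `SpeculativeHead.load` in `text_generation_server/layers/speculative.py` (line 40 is the `lm_head = TensorParallelHead.load(...)` call visible in the traceback above); falling back to the prefix `model.embed_tokens` is an assumption based on Qwen2's tensor names:

```python
# FIX START -- hypothetical patch inside SpeculativeHead.load():
# fall back to the tied embedding weights when the checkpoint
# contains no explicit lm_head.weight tensor.
try:
    lm_head = TensorParallelHead.load(config, prefix, weights)
except RuntimeError:
    # Tied checkpoints (e.g. Qwen2-0.5B/1.5B) store only
    # model.embed_tokens.weight, so load that as the output head.
    lm_head = TensorParallelHead.load(config, "model.embed_tokens", weights)
# FIX END
```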
To verify this, we ran the following script:
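A minimal version of such a tied-weights check, assuming the Hugging Face transformers API and the Qwen/Qwen2-1.5B-Instruct checkpoint:

```python
# Hypothetical verification: check whether Qwen2-1.5B's output head
# shares its weights with the input embeddings.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

print(model.config.tie_word_embeddings)  # True for tied checkpoints
print(torch.equal(model.lm_head.weight, model.model.embed_tokens.weight))
# Same storage pointer means the two modules literally share one tensor.
print(model.lm_head.weight.data_ptr() == model.model.embed_tokens.weight.data_ptr())
```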
The output confirmed that the weight of `lm_head` is equal to the weight of `model.embed_tokens`.