System Info
This is the current version installed from GitHub: text-generation-launcher 2.1.1-dev0
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
- Clone the current GitHub repository and install it
- Run
text-generation-launcher --max-input-tokens 1024 --max-total-tokens 2048 --max-batch-size 12 -p 3409 --model-id google/gemma-2-9b --master_port 29501
- It fails with the following error:
File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/rscratch/nehrdich/miniconda3/envs/lm2/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/server.py", line 231, in serve_inner
model = get_model(
File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/models/__init__.py", line 645, in get_model
return FlashGemma2(
File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/models/flash_gemma2.py", line 69, in __init__
model = FlashGemma2ForCausalLM(prefix, config, weights, causal=True)
File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/models/custom_modeling/flash_gemma2_modeling.py", line 454, in __init__
self.embed_tokens = TensorParallelEmbedding(
File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/layers/tensor_parallel.py", line 230, in __init__
weight = weights.get_partial_sharded(f"{prefix}.weight", dim=0)
File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/utils/weights.py", line 89, in get_partial_sharded
filename, tensor_name = self.get_filename(tensor_name)
File "/rscratch/nehrdich/tgi/text-generation-inference/server/text_generation_server/utils/weights.py", line 64, in get_filename
raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight model.embed_tokens.weight does not exist
2024-07-02T07:21:44.507020Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
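The error means the shard could not resolve the tensor name model.embed_tokens.weight in any of the checkpoint's safetensors files. As a diagnostic aid, a minimal sketch along these lines (plain huggingface_hub/safetensors calls, not TGI code; it assumes access to the gated google/gemma-2-9b repo) lists which tensor names the downloaded shards actually contain:

import glob
import os

from huggingface_hub import snapshot_download
from safetensors import safe_open

# Download (or reuse from the local cache) only the safetensors shards.
# Access to the gated google/gemma-2-9b repo is assumed.
local_dir = snapshot_download("google/gemma-2-9b", allow_patterns=["*.safetensors"])

# Collect every tensor name stored across all shards.
names = set()
for shard in glob.glob(os.path.join(local_dir, "*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        names.update(f.keys())

print("total tensors:", len(names))
print("model.embed_tokens.weight present:", "model.embed_tokens.weight" in names)
# Show how embedding/lm_head tensors are actually prefixed in this checkpoint.
print(sorted(n for n in names if "embed" in n or "lm_head" in n))

If model.embed_tokens.weight shows up here but TGI still fails, the mismatch is more likely in how the prefix is built on the TGI side than in the checkpoint itself.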
Expected behavior
The model should start without an error message.
1 Answer
Hi @sebastian-nehrdich 👋
Thanks for reporting this issue. After the log message "Shard complete standard error output", is there an exit code from the process logged as the shard?