inference: Inference fails with the latest GLM-4 chat 9B model

bsxbgnwa · posted 3 months ago in Other

System Info

python --version
Python 3.11.0

Are you running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version Info

xinference -v
xinference, version 0.13.1

Command used to start xinference

XINFERENCE_MODEL_SRC=modelscope xinference-local --host 0.0.0.0 --port 9997

Steps to reproduce

Launch the model:

xinference launch -e http://0.0.0.0:9997 -n glm4-chat -s 9 -f pytorch -q none --model-engine transformers
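
If the launch succeeds, the model should be visible on the server. A quick sanity check via the Python client (a sketch, assuming the server address above):

from xinference.client import Client

client = Client("http://0.0.0.0:9997")
print(client.list_models())  # should contain an entry for glm4-chat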

Then run inference:
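
The exact request is not shown in the post; any streaming chat completion takes the failing code path. A minimal reproduction sketch, assuming xinference's OpenAI-compatible endpoint and the default model UID glm4-chat:

from openai import OpenAI

# Hypothetical reproduction; xinference serves an OpenAI-compatible API
# under /v1, and the model UID defaults to the name used at launch.
client = OpenAI(base_url="http://0.0.0.0:9997/v1", api_key="not-needed")

# stream=True routes the request through the chat stream handler,
# which is where the AttributeError below is raised.
stream = client.chat.completions.create(
    model="glm4-chat",
    messages=[{"role": "user", "content": "你好"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)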

Detailed xinference server logs

2024-07-17 10:36:26,322 transformers.configuration_utils 235951 INFO loading configuration file /home/jason/.xinference/cache/glm4-chat-pytorch-9b/config.json
2024-07-17 10:36:26,323 transformers.configuration_utils 235951 INFO loading configuration file /home/jason/.xinference/cache/glm4-chat-pytorch-9b/config.json
2024-07-17 10:36:26,323 transformers.configuration_utils 235951 INFO Model config ChatGLMConfig {
"_name_or_path": "/home/jason/.xinference/cache/glm4-chat-pytorch-9b",
"add_bias_linear": false,
"add_qkv_bias": true,
"apply_query_key_layer_scaling": true,
"apply_residual_connection_post_layernorm": false,
"architectures": [
"ChatGLMModel"
],
"attention_dropout": 0.0,
"attention_softmax_in_fp32": true,
"auto_map": {
"AutoConfig": "configuration_chatglm.ChatGLMConfig",
"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification"
},
"bias_dropout_fusion": true,
"classifier_dropout": null,
"eos_token_id": [
151329,
151336,
151338
],
"ffn_hidden_size": 13696,
"fp32_residual_connection": false,
"hidden_dropout": 0.0,
"hidden_size": 4096,
"kv_channels": 128,
"layernorm_epsilon": 1.5625e-07,
"model_type": "chatglm",
"multi_query_attention": true,
"multi_query_group_num": 2,
"num_attention_heads": 32,
"num_hidden_layers": 40,
"num_layers": 40,
"original_rope": true,
"pad_token_id": 151329,
"padded_vocab_size": 151552,
"post_layer_norm": true,
"rmsnorm": true,
"rope_ratio": 500,
"seq_length": 131072,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.42.4",
"use_cache": true,
"vocab_size": 151552
}

2024-07-17 10:36:26,688 transformers.modeling_utils 235951 INFO loading weights file /home/jason/.xinference/cache/glm4-chat-pytorch-9b/model.safetensors.index.json
2024-07-17 10:36:26,688 transformers.modeling_utils 235951 INFO Instantiating ChatGLMForConditionalGeneration model under default dtype torch.float32.
2024-07-17 10:36:26,689 transformers.generation.configuration_utils 235951 INFO Generate config GenerationConfig {
"eos_token_id": [
151329,
151336,
151338
],
"pad_token_id": 151329
}

Loading checkpoint shards: 100%|██████████████████████| 10/10 [00:02<00:00, 4.06it/s]
2024-07-17 10:36:29,209 transformers.modeling_utils 235951 INFO All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

2024-07-17 10:36:29,209 transformers.modeling_utils 235951 INFO All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /home/jason/.xinference/cache/glm4-chat-pytorch-9b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
2024-07-17 10:36:29,211 transformers.generation.configuration_utils 235951 INFO loading configuration file /home/jason/.xinference/cache/glm4-chat-pytorch-9b/generation_config.json
2024-07-17 10:36:29,212 transformers.generation.configuration_utils 235951 INFO Generate config GenerationConfig {
"do_sample": true,
"eos_token_id": [
151329,
151336,
151338
],
"max_length": 128000,
"pad_token_id": 151329,
"temperature": 0.8,
"top_p": 0.8
}

2024-07-17 10:37:14,866 xinference.api.restful_api 233925 ERROR Chat completion stream got an error: [address=0.0.0.0:28743, pid=235951] 'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'
Traceback (most recent call last):
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xinference/api/restful_api.py", line 1584, in stream_results
async for item in iterator:
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xoscar/api.py", line 340, in anext
return await self._actor_ref.xoscar_next(self._uid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xoscar/backends/context.py", line 231, in send
return self._process_result_message(result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xoscar/backends/pool.py", line 656, in send
result = await self._run_coro(message.message_id, coro)
^^^^^^^^^^^^^^^^^
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xoscar/backends/pool.py", line 367, in _run_coro
return await coro
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 558, in on_receive
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
async with self._lock:
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
result = await result
^^^^^^^^^^^^^^^^^
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xoscar/api.py", line 431, in xoscar_next
raise e
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xoscar/api.py", line 417, in xoscar_next
r = await asyncio.to_thread(_wrapper, gen)
^^^^^^^^^^^^^^^^^
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xoscar/api.py", line 402, in _wrapper
return next(_gen)
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xinference/core/model.py", line 318, in _to_json_generator
for v in gen:
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xinference/model/llm/utils.py", line 558, in _to_chat_completion_chunks
for i, chunk in enumerate(chunks):
^^^^^^^^^^^^^^^^^
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/xinference/model/llm/pytorch/chatglm.py", line 259, in _stream_generator
for chunk_text, _ in self._model.stream_chat(
^^^^^^^^^^^^^^^^^
File "/home/jason/.conda/envs/conda-env-for-xinference/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1709, in getattr
raise AttributeError(f"'{type(self).name}' object has no attribute '{name}'")
^^^^^^^^^^^^^^^^^
AttributeError: [address=0.0.0.0:28743, pid=235951] 'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'

Expected behavior

Inference should run normally without raising an error.

axzmvihb 1#

I'm hitting the same problem with my xinference deployment. Has it been resolved?

xu3bshqb 2#

It looks like they removed the stream_chat function:
https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat/files/59a0d59f0befb468b895fcd204f4fd1f99c68fd6#diff_view_modeling_chatglm.py
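
Xinference 0.13.1 still calls self._model.stream_chat in xinference/model/llm/pytorch/chatglm.py (visible in the traceback above), so the updated model files break it. Workarounds: pin an older revision of the model files that still ships stream_chat, or use a build whose GLM-4 integration streams through the standard transformers generation API. For reference, that replacement pattern looks roughly like this (a minimal sketch, not xinference's actual code; the cache path is taken from the logs above):

from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

# Local cache path from the server logs above.
path = "/home/jason/.xinference/cache/glm4-chat-pytorch-9b"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Build the chat prompt with the model's own template instead of stream_chat.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "你好"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# TextIteratorStreamer yields decoded text as generate() produces tokens.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
Thread(
    target=model.generate,
    kwargs={"input_ids": input_ids, "streamer": streamer, "max_new_tokens": 256},
).start()
for text in streamer:
    print(text, end="", flush=True)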
