Problem description
After following the documented setup steps, the web UI comes up, but asking a question produces "An error occurred during streaming".
Steps to reproduce
- Type "你好" (hello) into the input box.
- The error "An error occurred during streaming" appears.
Expected result
A reply to "你好" should be returned.
Actual result
The error "An error occurred during streaming" is returned instead.
Environment
- langchain-ChatGLM version / commit: 3.1.2
- Deployed with Docker (yes/no): no
- Model: local ChatGLM3-6B
- Embedding model: bge-large-zh
- Vector store: faiss
- OS and version: Linux
- Python version: 3.11
Additional information
The log file contains: | ERROR | chatchat.server.api_server.openai_routes:generator:105 - openai request error: An error occurred during streaming
5 answers
whlutmcx1#
When asking a question against the local knowledge base, I get this error: | ERROR | chatchat.server.utils:wrap_done:46 - APIError: Caught exception during streaming: an error occurred
o2gm4chl2#
Set log_verbose to true in basic_settings.yaml and check the more detailed error output.
rpppsulh3#
2024-07-22 01:00:58.319 | INFO | chatchat.startup:run_api_server:55 - Api MODEL_PLATFORMS: [PlatformConfig(platform_name='xinference', platform_type='xinference', api_base_url='http://127.0.0.1:9997/v1', api_key='EMPTY', api_proxy='', api_concurrencies=5, auto_detect_model=True, llm_models=['chatglm3_6b'], embed_models=['bge_large_zh'], text2image_models=[], image2text_models=[], rerank_models=[], speech2text_models=[], text2speech_models=[]), PlatformConfig(platform_name='ollama', platform_type='ollama', api_base_url='http://127.0.0.1:11434/v1', api_key='EMPTY', api_proxy='', api_concurrencies=5, auto_detect_model=False, llm_models=['qwen:7b', 'qwen2:7b'], embed_models=['quentinz/bge-large-zh-v1.5'], text2image_models=[], image2text_models=[], rerank_models=[], speech2text_models=[], text2speech_models=[]), PlatformConfig(platform_name='oneapi', platform_type='oneapi', api_base_url='http://127.0.0.1:3000/v1', api_key='sk-', api_proxy='', api_concurrencies=5, auto_detect_model=False, llm_models=['chatglm_pro', 'chatglm_turbo', 'chatglm_std', 'chatglm_lite', 'qwen-turbo', 'qwen-plus', 'qwen-max', 'qwen-max-longcontext', 'ERNIE-Bot', 'ERNIE-Bot-turbo', 'ERNIE-Bot-4', 'SparkDesk'], embed_models=['text-embedding-v1', 'Embedding-V1'], text2image_models=[], image2text_models=[], rerank_models=[], speech2text_models=[], text2speech_models=[]), PlatformConfig(platform_name='openai', platform_type='openai', api_base_url='https://api.openai.com/v1', api_key='sk-proj-', api_proxy='', api_concurrencies=5, auto_detect_model=False, llm_models=['gpt-4o', 'gpt-3.5-turbo'], embed_models=['text-embedding-3-small', 'text-embedding-3-large'], text2image_models=[], image2text_models=[], rerank_models=[], speech2text_models=[], text2speech_models=[])]
2024-07-22 01:01:54.689 | ERROR | chatchat.server.api_server.openai_routes:generator:105 - openai request error: An error occurred during streaming
2024-07-22 00:57:52,416 transformers.configuration_utils 242110 INFO loading configuration file /home/lrn/ALL/Langchain-Chatchat/models/chatglm3-6b/config.json
2024-07-22 00:57:52,417 transformers.configuration_utils 242110 INFO Model config ChatGLMConfig {
"_name_or_path": "/home/lrn/ALL/Langchain-Chatchat/models/chatglm3-6b",
"add_bias_linear": false,
"add_qkv_bias": true,
"apply_query_key_layer_scaling": true,
"apply_residual_connection_post_layernorm": false,
"architectures": [
"ChatGLMModel"
],
"attention_dropout": 0.0,
"attention_softmax_in_fp32": true,
"auto_map": {
"AutoConfig": "configuration_chatglm.ChatGLMConfig",
"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceClassification"
},
"bias_dropout_fusion": true,
"classifier_dropout": null,
"eos_token_id": 2,
"ffn_hidden_size": 13696,
"fp32_residual_connection": false,
"hidden_dropout": 0.0,
"hidden_size": 4096,
"kv_channels": 128,
"layernorm_epsilon": 1e-05,
"model_type": "chatglm",
"multi_query_attention": true,
"multi_query_group_num": 2,
"num_attention_heads": 32,
"num_layers": 28,
"original_rope": true,
"pad_token_id": 0,
"padded_vocab_size": 65024,
"post_layer_norm": true,
"pre_seq_len": null,
"prefix_projection": false,
"quantization_bit": 0,
"rmsnorm": true,
"seq_length": 8192,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.42.4",
"use_cache": true,
"vocab_size": 65024
}
2024-07-22 00:58:16,916 transformers.modeling_utils 242537 INFO All the weights of BertModel were initialized from the model checkpoint at /home/lrn/ALL/Langchain-Chatchat/models/bge-large-zh. If your task is similar to the task the model of the checkpoint was trained on, you can already use BertModel for predictions without further training.
2024-07-22 00:58:16,916 transformers.dynamic_module_utils 242537 INFO Patched resolved_trust_remote_code: (False, '/home/lrn/ALL/Langchain-Chatchat/models/bge-large-zh', True, False) {}
2024-07-22 00:58:16,916 transformers.tokenization_utils_base 242537 INFO loading file vocab.txt
2024-07-22 00:58:16,916 transformers.tokenization_utils_base 242537 INFO loading file tokenizer.json
2024-07-22 00:58:16,916 transformers.tokenization_utils_base 242537 INFO loading file added_tokens.json
2024-07-22 00:58:16,916 transformers.tokenization_utils_base 242537 INFO loading file special_tokens_map.json
2024-07-22 00:58:16,916 transformers.tokenization_utils_base 242537 INFO loading file tokenizer_config.json
--- Logging error ---
Traceback (most recent call last):
File "/home/lrn/anaconda3/envs/xinference/lib/python3.11/logging/handlers.py", line 73, in emit
if self.shouldRollover(record):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lrn/anaconda3/envs/xinference/lib/python3.11/logging/handlers.py", line 196, in shouldRollover
msg = "%s\n" % self.format(record)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lrn/anaconda3/envs/xinference/lib/python3.11/logging/init.py", line 953, in format
return fmt.format(record)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lrn/anaconda3/envs/xinference/lib/python3.11/logging/init.py", line 687, in format
record.message = record.getMessage()
TypeError: not all arguments converted during string formatting
Call stack:
File "/home/lrn/anaconda3/envs/xinference/lib/python3.11/threading.py", line 995, in _bootstrap
self._bootstrap_inner()
File "/home/lrn/anaconda3/envs/xinference/lib/python3.11/concurrent/futures/thread.py", line 83, in _work_guard
work_item.run()
File "/home/lrn/anaconda3/envs/xinference/lib/python3.11/site-packages/xoscar/api.py", line 402, in _wrapper
return next(_gen)
File "/home/lrn/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/core/model.py", line 318, in _to_json_generator
for v in gen:
File "/home/lrn/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/model/llm/utils.py, in _to_chat_completion_chunks
for i, chunk in enumerate(chunks):
File "/home/lrn/anaconda3/envs/xinference/lib/python3.11/site-packages/xinference/model//llm//pytorch//chatglm.py", line 259, in _stream_generator
for chunk_text, _ in self.model.stream_chat(self.input_ids, past=past_key_values)
File "/home/lrn
This error occurs because GenerationMixin._get_logits_warper() is called without the required positional argument device. To fix it, pass a device argument in that call.
Solution: when calling GenerationMixin._get_logits_warper(), pass the device argument, for example as in the sketch below.
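A hedged illustration (not taken verbatim from the thread): the failing call is assumed to live in the modeling_chatglm.py shipped with the local ChatGLM3-6B checkpoint, and the exact method and line number may differ in your copy. With the transformers 4.42.4 shown in the config above, _get_logits_warper() requires a device argument, so the old one-argument call raises the error.

```python
# Hedged sketch of the suggested patch to the local modeling_chatglm.py
# (the file path and surrounding code are assumptions based on the log above).

# Old call -- valid on older transformers, fails on 4.42.x because
# GenerationMixin._get_logits_warper() now requires `device`:
#   logits_warper = self._get_logits_warper(generation_config)

# Patched call -- pass the device the input tensors live on:
logits_warper = self._get_logits_warper(generation_config, device=input_ids.device)
```

Alternatively, pinning transformers to an older release that still accepts the single-argument call avoids editing the model code; check the transformers changelog for the exact version where the signature changed.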
cetgtptt4#
I switched to another local model, glm-4v-9b, and asking questions still fails.
qnyhuwrf5#
This is a model-loading problem; you could try switching to qwen or ollama instead.