Problem description
I tried to use the LangChain API for chat but ran into an InferenceTimeoutError. I have already taken the following steps:
- Added a very descriptive title to this issue.
- Searched the LangChain documentation with the integrated search.
- Used GitHub search to find a similar question and didn't find an answer.
- Confirmed this is a bug in LangChain rather than a problem in my code.
- Updated to the latest version of LangChain (and the specific integration package), but the problem persists.
Example code
from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint


class Calculator(BaseModel):
    """Multiply two integers together."""

    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")


llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
)
chat_model = ChatHuggingFace(llm=llm, verbose=True).with_structured_output(
    schema=Calculator, include_raw=True
)
print(chat_model.invoke("How much is 3 multiplied by 12?"))
Error message and stack trace (if applicable)
{
"name": "InferenceTimeoutError",
"message": "Model not loaded on the server: https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta/v1/chat/completions . Please retry with a higher timeout (current: 120).",
"stack": "---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
File /usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py:304, in hf_raise_for_status(response, endpoint_name)
    303 try:
--> 304     response.raise_for_status()
    305 except HTTPError as e:

File /usr/local/lib/python3.11/site-packages/requests/models.py:1021, in Response.raise_for_status(self)
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: 503 Server Error: Service Unavailable for url: https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta/v1/chat/completions (Request ID: ORVZdEUt8LFrnogCctYgS)

Model HuggingFaceH4/zephyr-7b-beta is currently loading

The above exception was the direct cause of the following exception:

InferenceTimeoutError                     Traceback (most recent call last)
Cell In[1], line 27
     17 llm = HuggingFaceEndpoint(
     18     repo_id="HuggingFaceH4/zephyr-7b-beta",
   (...)
     23     repetition_penalty=1.03,
     24 )
     26 chat_model = ChatHuggingFace(llm=llm, verbose=True).with_structured_output(schema=Calculator, include_raw=True)
---> 27 print(chat_model.invoke("How much is 3 multiplied by 12?"))

File /usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py:2503, in RunnableSequence.invoke(self, input, config)
   2499 config = patch_config(
   2500     config, callbacks=run_manager.get_child(f"seq:step:{i+1}")
   2501 )
   2502 if i == 0:
-> 2503     input = step.invoke(input, config)
   2504 else:
   2505     input = step.invoke(input, config)

File /usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py:3150, in RunnableParallel.invoke(self, input, config)
   3137 with get_executor_for_config(config) as executor:
   3138     futures = [
   3139         executor.submit(
   3140             step.invoke,
   (...)
   3148         for key, step in steps.items()
   3149     ]
-> 3150     output = {key: future.result() for key, future in zip(steps, futures)}
   3151 except BaseException as e:

File /usr/local/lib/python3.11/concurrent/futures/_base.py:456, in Future.result(self, timeout)
    454     raise CancelledError()
    455 elif self._state == FINISHED:
--> 456     return self.__get_result()
    457 else:
    458     raise TimeoutError()

File /usr/local/lib/python3.11/concurrent/futures/thread.py:58, in _WorkItem.run(self)
     55     return
---> 58 result = self.fn(*self.args, **self.kwargs)

File /usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py:4586, in RunnableBindingBase.invoke(self, input, config)
   4580 def invoke(
   4581     self,
   4582     input: Input,
   4583     config: Optional[RunnableConfig] = None,
   4584     **kwargs: Optional[Any],
Something goes wrong so that with_structured_output does not work when used with HuggingFaceEndpoint: the model times out while loading on the server, so it cannot be used at all. I tried raising the timeout to 300 seconds, but the problem persists.
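Since the 503 shows the model is still loading on the serverless endpoint, one client-side workaround is to retry the call with a backoff instead of (or in addition to) raising the timeout. This is only a sketch; the helper and its defaults are my own, not a LangChain API:

```python
import time


def invoke_with_retry(invoke, prompt, retries=5, base_delay=15.0):
    """Call ``invoke(prompt)``, retrying with linear backoff while the
    endpoint is still warming up (e.g. a 503 / InferenceTimeoutError)."""
    last_exc = None
    for attempt in range(retries):
        try:
            return invoke(prompt)
        except Exception as exc:  # narrow this to InferenceTimeoutError in real use
            last_exc = exc
            # wait longer on each attempt while the model loads
            time.sleep(base_delay * (attempt + 1))
    raise last_exc
```

With the example above it would be called as `invoke_with_retry(chat_model.invoke, "How much is 3 multiplied by 12?")`, which keeps retrying while the model finishes loading.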
System info
OS: Linux
OS Version: #1 SMP Fri Mar 29 23:14:13 UTC 2024
Python Version: 3.11.7 (main, Dec 19 2023, 20:33:49) [GCC 12.2.0]
Package information
langchain_core: 0.2.11
langchain: 0.2.6
langchain_community: 0.2.6
langsmith: 0.1.83
langchain_experimental: 0.0.62
langchain_huggingface: 0.0.3
langchain_milvus: 0.1.1
langchain_text_splitters: 0.2.2
Packages not installed (not necessarily a problem)
The following packages were not found:
langgraph
langserve
2 answers

cygmwpex1#
I am experiencing the same issue.
sz81bmfz2#
After testing directly against the HuggingFace client as described in its official documentation, I found that the documented approach works without problems. But once I added the tool settings and LangChain sent its POST request to the chat_completion method, the 503 error appeared. It turned out the problem lies in how the tool_choice parameter is set. Originally, it was passed as:
{ "tool_choice": { "type": "function", "function": { "name": "any" } } }
Following the official tutorial, it should instead be set to the name of the actual method:
{ "tool_choice": { "type": "function", "function": { "name": "joke" } } }
Or, if you use the default, it becomes:
structured_llm.invoke("Tell me a joke about cats", tool_choice="auto")
With that change, results are returned normally.
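The failing and working payloads above differ only in the function name inside tool_choice. As an illustrative sketch (the helper is hypothetical, not part of LangChain; "joke" stands for whatever tool/schema you actually registered):

```python
def make_tool_choice(tool_name: str) -> dict:
    """Build a chat-completions ``tool_choice`` payload that names a
    concrete registered function, rather than the unsupported "any"."""
    return {"type": "function", "function": {"name": tool_name}}


# Payload that triggered the 503 in the issue:
bad = make_tool_choice("any")
# Payload that works, naming the registered tool:
good = make_tool_choice("joke")
```

The point is that the endpoint only accepts a function name it actually knows about, so the name must match a registered tool (or use the string "auto" for the default behavior).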