Problem description

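I tried to chat through the LangChain API but ran into an InferenceTimeoutError. I have already gone through the following steps:

Added a very descriptive title to this issue.
Searched the LangChain documentation with the integrated search.
Used the GitHub search to look for a similar issue, and found one without an answer.
Confirmed that this is a bug in LangChain rather than in my code.
Updated to the latest version of LangChain (or the specific integration package); the problem still occurs.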

Example code

from typing import List
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

class Calculator(BaseModel):
    """Multiply two integers together."""
    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")

llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
)

chat_model = ChatHuggingFace(llm=llm, verbose=True).with_structured_output(schema=Calculator, include_raw=True)
print(chat_model.invoke("How much is 3 multiplied by 12?"))

Error message and stack trace (if applicable)

{
"name": "InferenceTimeoutError",
"message": "Model not loaded on the server: https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta/v1/chat/completions . Please retry with a higher timeout (current: 120).",
"stack": "---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
File /usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py:304, in hf_raise_for_status(response, endpoint_name)
303 try:
--> 304 response.raise_for_status()
305 except HTTPError as e:
File /usr/local/lib/python3.11/site-packages/requests/models.py:1021, in Response.raise_for_status(self)
1020 if http_error_msg:
-> 1021 raise HTTPError(http_error_msg, response=self)
HTTPError: 503 Server Error: Service Unavailable for url: https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta/v1/chat/completions (Request ID: ORVZdEUt8LFrnogCctYgS)
Model HuggingFaceH4/zephyr-7b-beta is currently loading
The above exception was the direct cause of the following exception:
InferenceTimeoutError Traceback (most recent call last)
Cell In[1], line 27
17 llm = HuggingFaceEndpoint(
18 repo_id="HuggingFaceH4/zephyr-7b-beta",
(...)
23 repetition_penalty=1.03,
24 )
26 chat_model = ChatHuggingFace(llm=llm,verbose=True,).with_structured_output(schema=Calculator, include_raw=True)
---> 27 print(chat_model.invoke("How much is 3 multiplied by 12?"))
File /usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py:2503, in RunnableSequence.invoke(self, input, config)
2499 config = patch_config(
2500 config, callbacks=run_manager.get_child(f"seq:step:{i+1}")
2501 )
if i == 0:
-> 2503 input = step.invoke(input, config)
else:
-> 2504 input = step.invoke(input, config)
File /usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py:3150, in RunnableParallel.invoke(self, input, config)
3137 with get_executor_for_config(config) as executor:
3138 futures = [
3139 executor.submit(
3140 step.invoke,
(...)
3148 for key, step in steps.items()
3149 ]
-> 3150 output = {key: future.result() for key, future in zip(steps, futures)}

# finish the root run
3151 except BaseException as e:
File /usr/local/lib/python3.11/concurrent/futures/_base.py:456, in Future.result(self, timeout)
454 raise CancelledError()
elif self._state == FINISHED:
--> 456 return self.__get_result()
else:
raise TimeoutError()
File /usr/local/lib/python3.11/concurrent/futures/thread.py:58, in _WorkItem.run(self)
55 return
---> 58 result = self.fn(*self.args, **self.kwargs)
File /usr/local/lib/python3.11/site-packages/langchain_core/runnables/base.py:4586, in RunnableBindingBase.invoke(self, input, config)
4580 def invoke(
4581 self,
4582 input: Input,
4583 config: Optional[RunnableConfig] = None,
4584 **kwargs: Optional[Any],
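With HuggingFaceEndpoint, the with_structured_output method does not work. The call fails with a model-loading timeout, so the model cannot be used at all. I tried raising the timeout to 300 seconds, but that did not solve the problem.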
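The error message asks for a retry while the model is still loading. A minimal sketch of such a retry wrapper is shown here, using only the standard library. The helper name invoke_with_retry and its parameters are hypothetical and are not part of LangChain or huggingface_hub. It would only help if the 503 really is a cold start, which, given that a 300-second timeout made no difference, may not be the case here.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def invoke_with_retry(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying with exponential backoff on the given exceptions.

    Hypothetical helper: pass the exception types you want to retry on,
    for example the InferenceTimeoutError class seen in the trace.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            # Give up after the last attempt and surface the original error.
            if attempt == max_attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With the code from the question, usage would look like invoke_with_retry(lambda: chat_model.invoke("How much is 3 multiplied by 12?")), with retry_on set to whichever exception types should trigger another attempt.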

System information

OS: Linux
OS version: #1 SMP Fri Mar 29 23:14:13 UTC 2024
Python version: 3.11.7 (main, Dec 19 2023, 20:33:49) [GCC 12.2.0]

Package information

langchain_core: 0.2.11
langchain: 0.2.6
langchain_community: 0.2.6
langsmith: 0.1.83
langchain_experimental: 0.0.62
langchain_huggingface: 0.0.3
langchain_milvus: 0.1.1
langchain_text_splitters: 0.2.2

Packages not installed (not necessarily a problem)

The following packages were not found:
langgraph
langserve

2 answers

cygmwpex1#

I am experiencing the same problem.

6 months ago

sz81bmfz2#

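After testing directly against the HuggingFace client documentation, I found that the method given in the official docs works without problems. However, when I added the tool settings and sent the POST request that LangChain builds to the chat_completion method, I got the 503 error. I then found that the problem occurs when the tool_choice parameter is set. Originally, it was passed as:
{ "tool_choice": { "type": "function", "function": { "name": "any" } } }
Following the official tutorial, it should be set to the name of the corresponding function:
{ "tool_choice": { "type": "function", "function": { "name": "joke" } } }
Alternatively, if you use the default value, the call is:
structured_llm.invoke("Tell me a joke about cats", tool_choice="auto")
With that change, the result is returned normally.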
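The difference between the two payloads above can be sketched as a small guard that only builds a tool_choice entry for a function that was actually declared. This is an illustration in plain Python, not LangChain or huggingface_hub code: the helper name named_tool_choice and the available_tools argument are hypothetical, and only the payload shape is taken from the answer.

```python
from typing import Any, Dict, Iterable


def named_tool_choice(function_name: str, available_tools: Iterable[str]) -> Dict[str, Any]:
    """Build a tool_choice payload that names one declared function.

    Hypothetical helper: rejects placeholder names such as "any" that do
    not match a declared tool, which is the case the answer describes.
    """
    tools = list(available_tools)
    if function_name not in tools:
        raise ValueError(
            f"tool_choice names {function_name!r}, but declared tools are {tools}"
        )
    return {"type": "function", "function": {"name": function_name}}


# Payload shape from the answer, for a declared tool called "joke".
payload = {"tool_choice": named_tool_choice("joke", ["joke"])}
```

For the schema in the question, the declared name would presumably be "Calculator", though that depends on how the schema is converted to a tool definition.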

langchain: HuggingFaceEndpoint cannot use with_structured_output because model loading times out
