Loading an LLM (Mixtral 8x22B) from an Azure AI endpoint as a LangChain model returns BaseMessage instead of AIMessage

brvekthn · posted 4 months ago · in: Other
Follow (0) | Answers (6) | Views (80)

Checked other resources

  • I added a very descriptive title to this question.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than in my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_community.chat_models.azureml_endpoint import AzureMLChatOnlineEndpoint
from langchain_community.llms.azureml_endpoint import ContentFormatterBase
from langchain_community.chat_models.azureml_endpoint import (
    AzureMLEndpointApiType,
    CustomOpenAIChatContentFormatter,
)
from langchain_core.messages import HumanMessage

chat = AzureMLChatOnlineEndpoint(
    endpoint_url="https://llm-host-westeurope-mx8x22bi.westeurope.inference.ml.azure.com/score",
    endpoint_api_type=AzureMLEndpointApiType.dedicated,
    endpoint_api_key="xY1BWYshxYJhQGZE6P7Uc1of34BW9b5t",
    content_formatter=CustomOpenAIChatContentFormatter(),
)
response = chat.invoke(
    [HumanMessage(content="Hallo")], max_tokens=512
)
response

Error Message and Stack Trace (if applicable)

I believe I have set the correct deployment type. See the full traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/anaconda3/lib/python3.11/site-packages/langchain_community/chat_models/azureml_endpoint.py:140, in CustomOpenAIChatContentFormatter.format_response_payload(self, output, api_type)
    139 try:
--> 140     choice = json.loads(output)["output"]
    141 except (KeyError, IndexError, TypeError) as e:

KeyError: 'output'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
/Users/mweissenba001/Documents/GitHub/fastapi_rag_demo/test.ipynb Cell 4 line 8
      5 prompt = ChatPromptTemplate.from_messages([("system", system), ("human", human)])
      7 chain = prompt | chat
----> 8 chain.invoke({"text": "Explain the importance of low latency for LLMs."})

File ~/anaconda3/lib/python3.11/site-packages/langchain_core/runnables/base.py:2507, in RunnableSequence.invoke(self, input, config, **kwargs)
   2505             input = step.invoke(input, config, **kwargs)
   2506         else:
-> 2507             input = step.invoke(input, config)
   2508 # finish the root run
   2509 except BaseException as e:

File ~/anaconda3/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py:248, in BaseChatModel.invoke(self, input, config, stop, **kwargs)
    237 def invoke(
    238     self,
    239     input: LanguageModelInput,
   (...)
    243     **kwargs: Any,
    244 ) -> BaseMessage:
    245     config = ensure_config(config)
    246     return cast(
    247         ChatGeneration,
--> 248         self.generate_prompt(
    249             [self._convert_input(input)],
    250             stop=stop,
    251             callbacks=config.get("callbacks"),
    252             tags=config.get("tags"),
    253             metadata=config.get("metadata"),
    254             run_name=config.get("run_name"),
    255             run_id=config.pop("run_id", None),
    256             **kwargs,
    257         ).generations[0][0],
    258     ).message

File ~/anaconda3/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py:677, in BaseChatModel.generate_prompt(self, prompts, stop, callbacks, **kwargs)
    669 def generate_prompt(
    670     self,
    671     prompts: List[PromptValue],
   (...)
    674     **kwargs: Any,
    675 ) -> LLMResult:
    676     prompt_messages = [p.to_messages() for p in prompts]
--> 677     return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)

File ~/anaconda3/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py:534, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
    532         if run_managers:
    533             run_managers[i].on_llm_error(e, response=LLMResult(generations=[]))
--> 534         raise e
    535 flattened_outputs = [
    536     LLMResult(generations=[res.generations], llm_output=res.llm_output)  # type: ignore[list-item]
    537     for res in results
    538 ]
    539 llm_output = self._combine_llm_outputs([res.llm_output for res in results])

File ~/anaconda3/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py:524, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
    521 for i, m in enumerate(messages):
    522     try:
    523         results.append(
--> 524             self._generate_with_cache(
    525                 m,
    526                 stop=stop,
    527                 run_manager=run_managers[i] if run_managers else None,
    528                 **kwargs,
    529             )
    530         )
    531     except BaseException as e:
    532         if run_managers:

File ~/anaconda3/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py:749, in BaseChatModel._generate_with_cache(self, messages, stop, run_manager, **kwargs)
    747 else:
    748     if inspect.signature(self._generate).parameters.get("run_manager"):
--> 749         result = self._generate(
    750             messages, stop=stop, run_manager=run_manager, **kwargs
    751         )
    752     else:
    753         result = self._generate(messages, stop=stop, **kwargs)

File ~/anaconda3/lib/python3.11/site-packages/langchain_community/chat_models/azureml_endpoint.py:279, in AzureMLChatOnlineEndpoint._generate(self, messages, stop, run_manager, **kwargs)
    273 request_payload = self.content_formatter.format_messages_request_payload(
    274     messages, _model_kwargs, self.endpoint_api_type
    275 )
    276 response_payload = self.http_client.call(
    277     body=request_payload, run_manager=run_manager
    278 )
--> 279 generations = self.content_formatter.format_response_payload(
    280     response_payload, self.endpoint_api_type
    281 )
    282 return ChatResult(generations=[generations])

File ~/anaconda3/lib/python3.11/site-packages/langchain_community/chat_models/azureml_endpoint.py:142, in CustomOpenAIChatContentFormatter.format_response_payload(self, output, api_type)
    140         choice = json.loads(output)["output"]
    141     except (KeyError, IndexError, TypeError) as e:
--> 142         raise ValueError(self.format_error_msg.format(api_type=api_type)) from e
    143     return ChatGeneration(
    144         message=BaseMessage(
    145             content=choice.strip(),
   (...)
    148         generation_info=None,
    149     )
    150 if api_type == AzureMLEndpointApiType.serverless:

ValueError: Error while formatting response payload for chat model of type `AzureMLEndpointApiType.dedicated`. Are you using the right formatter for the deployed model and endpoint type?
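For reference, the shape of the raw scoring response can be checked by calling the endpoint directly. This is a minimal sketch, not from the original report: it assumes the same endpoint URL and key as above, the input_data/input_string request body that dedicated Azure ML chat deployments expect (as used later in this thread), and a hypothetical max_new_tokens parameter.

import json
import urllib.request

# Placeholders; reuse the endpoint URL and key from the snippet above.
ENDPOINT_URL = "https://<your-endpoint>.westeurope.inference.ml.azure.com/score"
API_KEY = "<your-key>"

# Request body for a dedicated (online) Azure ML chat deployment.
body = json.dumps(
    {
        "input_data": {
            "input_string": [{"role": "user", "content": "Hallo"}],
            # Parameter name is an assumption; adjust to your deployment.
            "parameters": {"max_new_tokens": 64},
        }
    }
).encode("utf-8")

req = urllib.request.Request(
    ENDPOINT_URL,
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)

with urllib.request.urlopen(req) as resp:
    payload = json.loads(resp.read())

# CustomOpenAIChatContentFormatter expects a top-level "output" key;
# printing the payload shows which key the deployment actually returns.
print(payload)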

Description

Hello,
I have deployed Mixtral 8x22B on Azure AI / Machine Learning and now want to use it with LangChain. I am running into formatting problems. For example, a response from ChatOpenAI looks like this:

from langchain_openai import ChatOpenAI
llmm = ChatOpenAI()
llmm.invoke("Hallo")

AIMessage(content='Hallo! Wie kann ich Ihnen helfen?', response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 8, 'total_tokens': 16}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='r')
When I load Mixtral 8x22B with AzureMLChatOnlineEndpoint, it looks like this:

from langchain_community.chat_models.azureml_endpoint import AzureMLChatOnlineEndpoint

from langchain_community.chat_models.azureml_endpoint import (
    AzureMLEndpointApiType,
    CustomOpenAIChatContentFormatter,
)
from langchain_core.messages import HumanMessage

chat = AzureMLChatOnlineEndpoint(
    endpoint_url="...",
    endpoint_api_type=AzureMLEndpointApiType.dedicated,
    endpoint_api_key="...",
    content_formatter=CustomOpenAIChatContentFormatter(),
)

chat.invoke("Hallo")

BaseMessage(content='Hallo, ich bin ein deutscher Sprachassistent. Was kann ich für', type='assistant', id='run-23')
So when using the Mixtral model, the output **format (BaseMessage vs. AIMessage)** is different. How can I change this so that it behaves like the ChatOpenAI model?
I also explored, without success, whether it works in a chain with a ChatPromptTemplate:

from langchain_core.prompts import ChatPromptTemplate

system = "You are a helpful assistant called Bot."
human = "{text}"
prompt = ChatPromptTemplate.from_messages([("system", system), ("human", human)])

chain = prompt | chat
chain.invoke({"text": "Who are you?"})

This results in KeyError: 'output' and ValueError: Error while formatting response payload for chat model of type AzureMLEndpointApiType.dedicated. Are you using the right formatter for the deployed model and endpoint type?
See the full traceback above.
In my application I would like to switch easily between the two models (one way to do this is sketched below).
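A minimal sketch, not from the original post, of one way to switch between the two models. The helper name and the environment variables are hypothetical:

import os

from langchain_community.chat_models.azureml_endpoint import (
    AzureMLChatOnlineEndpoint,
    AzureMLEndpointApiType,
    CustomOpenAIChatContentFormatter,
)
from langchain_openai import ChatOpenAI


def get_chat_model(backend: str | None = None):
    """Return either the OpenAI chat model or the Azure ML Mixtral endpoint."""
    backend = backend or os.getenv("CHAT_BACKEND", "openai")
    if backend == "openai":
        return ChatOpenAI()
    # Hypothetical environment variables for the Azure ML endpoint.
    return AzureMLChatOnlineEndpoint(
        endpoint_url=os.environ["AZUREML_ENDPOINT_URL"],
        endpoint_api_type=AzureMLEndpointApiType.dedicated,
        endpoint_api_key=os.environ["AZUREML_ENDPOINT_KEY"],
        content_formatter=CustomOpenAIChatContentFormatter(),
    )


# Usage: the rest of the chain stays the same regardless of the backend.
chat = get_chat_model("mixtral")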
Thanks in advance!

System Info

langchain 0.2.6 pypi_0 pypi
langchain-chroma 0.1.0 pypi_0 pypi
langchain-community 0.2.6 pypi_0 pypi
langchain-core 0.2.10 pypi_0 pypi
langchain-experimental 0.0.49 pypi_0 pypi
langchain-groq 0.1.5 pypi_0 pypi
langchain-openai 0.1.7 pypi_0 pypi
langchain-postgres 0.0.3 pypi_0 pypi
langchain-text-splitters 0.2.1

iq3niunx1#

Hey @weissenbacherpwc, I opened a PR.
Could you try my branch and let me know whether it fixes the issue?
pip install "git+https://github.com/langchain-ai/langchain.git@jacob/azure#subdirectory=libs/community"

ikfrs5lh2#

Hi @jacoblee93, I tried installing your branch. The response is now returned as an AIMessage instead of a BaseMessage, but when I use it in LCEL or an LLMChain I get the same error as in the description.
I tried testing with both AzureMLOnlineEndpoint and AzureMLChatOnlineEndpoint, without success.

uurv41yg3#

It looks like there is an exported MistralChatContentFormatter - could you try instantiating that and passing it in?
https://api.python.langchain.com/en/latest/chat_models/langchain_community.chat_models.azureml_endpoint.MistralChatContentFormatter.html#
https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/chat_models/azureml_endpoint.py#L187

0s0u357o4#

Tried it, thanks!
However, it still does not work. Here is the code:

from langchain_community.chat_models.azureml_endpoint import AzureMLChatOnlineEndpoint
from langchain_community.llms.azureml_endpoint import ContentFormatterBase
from langchain_community.chat_models.azureml_endpoint import (
    AzureMLEndpointApiType,
    CustomOpenAIChatContentFormatter,
    MistralChatContentFormatter
)
from langchain_core.messages import HumanMessage

chat = AzureMLChatOnlineEndpoint(
    endpoint_url="https://llm-host-westeurope-oqelx.westeurope.inference.ml.azure.com/score",
    endpoint_api_type=AzureMLEndpointApiType.dedicated,
    endpoint_api_key="",
    content_formatter=MistralChatContentFormatter(),
    #content_formatter=CustomOpenAIChatContentFormatter()
)
# prints UserWarning: `LlamaChatContentFormatter` will be deprecated in the future.
#   Please use `CustomOpenAIChatContentFormatter` instead.
response = chat.invoke(
    [HumanMessage(content="Hallo, whats your name?")], max_tokens=3000
)
response

It already fails when invoking the LLM here, which worked before with the CustomOpenAIChatContentFormatter:
ValueError: api_type AzureMLEndpointApiType.dedicated is not supported by this formatter

kt06eoxx5#

@jacoblee93 I may have found a solution for this. I added this code to the class MistralChatContentFormatter(LlamaChatContentFormatter) (from line 187 of azureml_endpoint.py):

elif api_type == AzureMLEndpointApiType.dedicated:
    request_payload = json.dumps(
        {
            "input_data": {
                "input_string": chat_messages,
                "parameters": model_kwargs,
            }
        }
    )

See the full class:

class MistralChatContentFormatter(LlamaChatContentFormatter):
    """Content formatter for `Mistral`."""

    def format_messages_request_payload(
        self,
        messages: List[BaseMessage],
        model_kwargs: Dict,
        api_type: AzureMLEndpointApiType,
    ) -> bytes:
        """Formats the request according to the chosen api"""
        chat_messages = [self._convert_message_to_dict(message) for message in messages]

        if chat_messages and chat_messages[0]["role"] == "system":
            # Mistral OSS models do not explicitly support system prompts, so we have to
            # stash in the first user prompt
            chat_messages[1]["content"] = (
                chat_messages[0]["content"] + "\n\n" + chat_messages[1]["content"]
            )
            del chat_messages[0]

        if api_type == AzureMLEndpointApiType.realtime:
            request_payload = json.dumps(
                {
                    "input_data": {
                        "input_string": chat_messages,
                        "parameters": model_kwargs,
                    }
                }
            )
        elif api_type == AzureMLEndpointApiType.serverless:
            request_payload = json.dumps({"messages": chat_messages, **model_kwargs})
        elif api_type == AzureMLEndpointApiType.dedicated:
            request_payload = json.dumps(
                {
                    "input_data": {
                        "input_string": chat_messages,
                        "parameters": model_kwargs,
                    }
                }
            )
        else:
            raise ValueError(
                f"`api_type` {api_type} is not supported by this formatter"
            )
        return str.encode(request_payload)

With this, I can use the LLM in a chain and give it a system prompt.
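For illustration, a minimal sketch of using the model in a chain with a system prompt. It assumes the patched MistralChatContentFormatter above is defined or importable, and uses placeholder endpoint credentials:

from langchain_community.chat_models.azureml_endpoint import (
    AzureMLChatOnlineEndpoint,
    AzureMLEndpointApiType,
)
from langchain_core.prompts import ChatPromptTemplate

# Assumes the patched MistralChatContentFormatter from above is in scope
# and that the placeholder URL/key are filled in.
chat = AzureMLChatOnlineEndpoint(
    endpoint_url="https://<your-endpoint>.westeurope.inference.ml.azure.com/score",
    endpoint_api_type=AzureMLEndpointApiType.dedicated,
    endpoint_api_key="<your-key>",
    content_formatter=MistralChatContentFormatter(),
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant called Bot."),
        ("human", "{text}"),
    ]
)

chain = prompt | chat
print(chain.invoke({"text": "Who are you?"}))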

mutmk8jj6#

Edit to the workaround above: with this fix, streaming the LLM output in LangChain does not work:

chunks=[]
for chunk in llm.stream("hello. tell me something about yourself"):
    chunks.append(chunk)
    print(chunk.content, end="|", flush=True)

This results in:

APIStatusError                            Traceback (most recent call last)
/Users/mweissenba001/Documents/GitHub/fastapi_rag_demo/test.ipynb Cell 16 line 2
      1 chunks=[]
----> 2 for chunk in llm.stream("hello. tell me something about yourself"):
      3     chunks.append(chunk)
      4     print(chunk.content, end="|", flush=True)

File ~/anaconda3/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py:375, in BaseChatModel.stream(self, input, config, stop, **kwargs)
    368 except BaseException as e:
    369     run_manager.on_llm_error(
    370         e,
    371         response=LLMResult(
    372             generations=[[generation]] if generation else []
    373         ),
    374     )
--> 375     raise e
    376 else:
    377     run_manager.on_llm_end(LLMResult(generations=[[generation]]))

File ~/anaconda3/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py:355, in BaseChatModel.stream(self, input, config, stop, **kwargs)
    353 generation: Optional[ChatGenerationChunk] = None
    354 try:
--> 355     for chunk in self._stream(messages, stop=stop, **kwargs):
    356         if chunk.message.id is None:
...
   (...)
   1027     stream_cls=stream_cls,
   1028 )

APIStatusError: Error code: 424 - {'detail': 'Not Found'}
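The 424 suggests the dedicated scoring endpoint does not expose a streaming route. Until streaming works, a minimal sketch (hypothetical, not from the thread) of a non-streaming fallback that keeps the same chunk-style calling pattern:

from typing import Iterator

from langchain_core.messages import BaseMessage


def pseudo_stream(model, prompt: str) -> Iterator[BaseMessage]:
    """Hypothetical helper: yield the full response as a single 'chunk'
    when the endpoint does not support token streaming."""
    yield model.invoke(prompt)


# Assumes `chat` is the AzureMLChatOnlineEndpoint configured above.
for chunk in pseudo_stream(chat, "hello. tell me something about yourself"):
    print(chunk.content, end="|", flush=True)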
