llama_index [Question]: How can I add multimodality to a custom Claude 3 client on AWS Bedrock?

ijxebb2r · posted 2 months ago in Other

Question validation

  • I have searched the documentation and Discord for an answer.

Question

I have built a complete client for Claude 3, and it works well.
Now I would like to know how to extend it into a multimodal client.
Current code:

from typing import Any

import backoff
import httpx
import requests
from llama_index.core.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class Claude3_Sonnet(CustomLLM):
    context_window: int = 180000
    num_output: int = 4000
    model_name: str = "Claude3_Sonnet"
    dummy_response: str = "My response"
    system_message: str = "You are a helpful assistant"
    api_key: str
    production: bool = True
    temperature: float = 0.5

    @property
    def metadata(self) -> LLMMetadata:
        """Get LLM metadata."""
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
            is_chat_model=False,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, system_message: str = "None", **kwargs: Any) -> CompletionResponse:
        if self.production:
            api_url = "https://xxxx/gpt/v2/claude-3-haiku-20240307-v1"
        else:
            api_url = "https://xxxx/gpt/v2/claude-3-haiku-20240307-v1"
        if not self.api_key:
            raise ValueError("API key is not set.")
        if system_message == "None":
            system_message = self.system_message
        headers = {"APIKey": self.api_key}
        payload = {
            "max_tokens": self.num_output,
            "messages": [{"role": "user", "content": prompt}],
            "system": system_message,
            "anthropic_version": "bedrock-2023-05-31",
            "temperature": self.temperature
        }

        response = requests.post(api_url, headers=headers, json=payload)
        if response.status_code == 200:
            response_text = response.json().get("content", [{}])[0].get("text", "")
            return CompletionResponse(text=response_text)
        else:
            return CompletionResponse(text=f"Error: {response.status_code} - {response.reason}")

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        raise NotImplementedError("This method has not been implemented yet.")

    @llm_completion_callback()
    async def acomplete(self, prompt: str, system_message: str = "None", **kwargs: Any) -> CompletionResponse:
        if self.production:
            api_url = "https://xxxx/gpt/v2/claude-3-haiku-20240307-v1"
        else:
            api_url = "https://xxxx/gpt/v2/claude-3-haiku-20240307-v1"
        if not self.api_key:
            raise ValueError("API key is not set.")
        if system_message == "None":
            system_message = self.system_message
        headers = {"APIKey": self.api_key}
        payload = {
            "max_tokens": self.num_output,
            "messages": [{"role": "user", "content": prompt}],
            "system": system_message,
            "anthropic_version": "bedrock-2023-05-31",
            "temperature": self.temperature
        }

        # Define an inner async function for making the HTTP request with retries and timeout
        @backoff.on_exception(backoff.expo,
                            httpx.ReadTimeout,
                            max_tries=3)
        async def make_request_with_retries():
            # Set a custom timeout (e.g., connect=5.0 seconds, read=10.0 seconds)
            timeout = httpx.Timeout(120)
            async with httpx.AsyncClient(timeout=timeout) as client:
                response = await client.post(api_url, headers=headers, json=payload)
                return response

        # Call the inner function
        response = await make_request_with_retries()
        if response.status_code == 200:
            response_text = response.json().get("content", [{}])[0].get("text", "")
            return CompletionResponse(text=response_text)
        else:
            # Use httpx.codes to get a textual reason for the status code
            reason_phrase = httpx.codes.get_reason_phrase(response.status_code)
            return CompletionResponse(text=f"Error: {response.status_code} - {reason_phrase}")

Example multimodal payload shape for Claude 3 on AWS Bedrock:

{
    "anthropic_version": "bedrock-2023-05-31",    
    "max_tokens": int,
    "system": string,    
    "messages": [
        {
            "role": string,
            "content": [
                { "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": "content image bytes" } },
                { "type": "text", "text": "content text" }
            ]
        }
    ],
    "temperature": float,
    "top_p": float,
    "top_k": int,
    "stop_sequences": [string]
}
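In the shape above, the `source.data` field carries the raw image bytes base64-encoded as a string. As a minimal sketch of building one such image content block from a local file (the helper name `make_image_block` is illustrative, not part of any API):

```python
import base64


def make_image_block(path: str, media_type: str = "image/jpeg") -> dict:
    # Read the image file and base64-encode its bytes, which is the
    # format the Bedrock Claude 3 Messages API expects for images.
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```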

xjreopfe1#

To extend your Claude 3 client to support multimodality, so that it can handle both text and images, you need to adjust the payload structure to accommodate multimodal data. Here is a simplified approach:

  1. Modify the payload for multimodal requests: adjust the payload to include both text and image data. For images, encode them as base64 and specify the media type (e.g. "image/jpeg"). Your payload should look like this:
payload = {
    "anthropic_version": "bedrock-2023-05-31",    
    "max_tokens": 4000,
    "system": "Your system message here",    
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": "base64 encoded image data" } },
                { "type": "text", "text": "Your prompt here" }
            ]
        }
    ],
    "temperature": 0.5,
    "top_p": 1.0,
    "top_k": 40,
    "stop_sequences": ["Your stop sequences here"]
}
  2. Handle the API request: make sure your `complete` and `acomplete` methods can handle this new payload structure. You will need to adjust the request handling to accommodate sending text and images together.
  3. Preprocess images: convert images to base64 before sending them. This step is essential for including images in the API request.

By following these steps, you can enhance your Claude 3 client to support multimodal interaction, significantly broadening the range of inputs it can understand and respond to. This adjustment enables richer interactions that leverage both textual and visual data.
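The steps above can be sketched as a small helper that assembles the multimodal `messages` list for the payload; `build_multimodal_messages` and its `image_paths` parameter are illustrative names, not part of llama_index or the Bedrock API:

```python
import base64
from typing import List, Optional


def build_multimodal_messages(prompt: str, image_paths: Optional[List[str]] = None) -> list:
    # Build the "messages" list for the Bedrock Claude 3 payload:
    # image content blocks first, then the text block, matching the
    # multimodal payload shape shown above.
    content: list = []
    for path in image_paths or []:
        with open(path, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("utf-8")
        content.append({
            "type": "image",
            "source": {"type": "base64", "media_type": "image/jpeg", "data": encoded},
        })
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]
```

A `complete(self, prompt, image_paths=None, ...)` override could then substitute this for the existing `"messages": [{"role": "user", "content": prompt}]` line; with no images the payload degrades gracefully to a text-only content list.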

References

About Dosu

This response is meant to save you time and be useful. It is not a precise solution, but a starting point for your own research.
Help me be more useful! Please leave a 👍 if this was helpful and a 👎 if it was irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.


0pizxfdo2#

This is a key feature: it would let people using the Bedrock API make use of the vision models.
