llama_index [Bug]: text_embedding_inference. JSONDecodeError: Expecting value: line 1 column 1 (char 0)

ui7jx7zq · posted 2 months ago in Other

Bug Description

I tried to run the code snippet from this example
https://docs.llamaindex.ai/en/stable/examples/embeddings/text_embedding_inference/
and got the following error:

JSONDecodeError                           Traceback (most recent call last)
<ipython-input-26-a7c0f9477fcc> in <cell line: 13>()
     11 
     12 
---> 13 embeddings = embed_model.get_text_embedding("Hello World!")
     14 print(len(embeddings))
     15 print(embeddings[:5])

8 frames
/usr/lib/python3.10/json/decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Version

Using the provided Colab notebook

Steps to Reproduce

Run the Colab notebook here:
https://docs.llamaindex.ai/en/stable/examples/embeddings/text_embedding_inference/

Relevant Logs/Tracebacks

No response
vof42yt1 #1

Hey @nitskoh! 👋 I'm here to help you with bugs, questions, and more while you wait for a human maintainer. Let's squash those bugs together!
To resolve the JSONDecodeError you hit when running the text embedding inference snippet from the LlamaIndex example, make sure the model name you are using is correct and supported by Predibase. The error suggests that the model deployment ID may not have been found in Predibase.
Here are some steps to troubleshoot and resolve the issue:

  1. Verify the model name: make sure the model_name you are using is correct and listed among the models supported by Predibase. You can refer to the Predibase documentation for the list of supported models: Predibase Models
  2. Check the API token: make sure PREDIBASE_API_TOKEN is set correctly and is valid.
  3. Update the base URL: make sure base_url points to the endpoint of the text embeddings service. The default is http://127.0.0.1:8080, but it may need to be updated to match your deployment (see the configuration sketch after this list).
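For reference, here is a minimal configuration sketch, assuming the TextEmbeddingsInference class and import path from the linked example notebook and a TEI server that is already reachable at base_url; the exact import path and constructor arguments may differ between LlamaIndex versions:

# Sketch only: assumes the TEI integration from the linked example and a server
# already running at base_url; adjust model_name and base_url for your deployment.
from llama_index.embeddings.text_embeddings_inference import (
    TextEmbeddingsInference,
)

embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-large-en-v1.5",  # must match the model the server was launched with
    base_url="http://127.0.0.1:8080",     # point this at your actual endpoint
    timeout=60,
    embed_batch_size=10,
)

embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))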
Here is the relevant part of the code that raises the error:
from typing import List

import httpx

# Method of the TEI embedding class (shown standalone here); it POSTs the input
# texts to the /embed endpoint and parses the reply as JSON.
def _call_api(self, texts: List[str]) -> List[List[float]]:
    headers = {"Content-Type": "application/json"}
    if self.auth_token is not None:
        if callable(self.auth_token):
            headers["Authorization"] = self.auth_token(self.base_url)
        else:
            headers["Authorization"] = self.auth_token
    json_data = {"inputs": texts, "truncate": self.truncate_text}

    with httpx.Client() as client:
        response = client.post(
            f"{self.base_url}/embed",
            headers=headers,
            json=json_data,
            timeout=self.timeout,
        )

    # If the server is unreachable, misconfigured, or returns an error page,
    # the body is not valid JSON and this call raises JSONDecodeError.
    return response.json()
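Since response.json() is the line that fails, inspecting the raw response is usually the quickest way to see what the server actually returned (an HTML error page, or a connection error when nothing is listening on port 8080). Here is a diagnostic sketch, assuming the same /embed endpoint and default base URL mentioned above:

# Diagnostic sketch: POST a single input to the /embed endpoint and inspect the
# raw reply before parsing it as JSON. base_url below is an assumption; use the
# URL of your own deployment.
import httpx

base_url = "http://127.0.0.1:8080"

try:
    response = httpx.post(
        f"{base_url}/embed",
        json={"inputs": ["Hello World!"], "truncate": True},
        timeout=10.0,
    )
    print("status:", response.status_code)
    print("body:", response.text[:200])  # a non-JSON body here explains the JSONDecodeError
except httpx.ConnectError:
    print(f"Nothing is listening at {base_url} - the embedding server is not running.")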

Additionally, if you are using the HuggingFaceEmbedding class in LlamaIndex, make sure the model and tokenizer are initialized correctly. Here is a code snippet of the HuggingFaceEmbedding class:

class HuggingFaceEmbedding(BaseEmbedding):
    tokenizer_name: str = Field(description="Tokenizer name from HuggingFace.")
    max_length: int = Field(
        default=DEFAULT_HUGGINGFACE_LENGTH, description="Maximum length of input.", gt=0
    )
    pooling: Pooling = Field(default=None, description="Pooling strategy.")
    normalize: bool = Field(default=True, description="Normalize embeddings or not.")
    query_instruction: Optional[str] = Field(
        description="Instruction to prepend to query text."
    )
    text_instruction: Optional[str] = Field(
        description="Instruction to prepend to text."
    )
    cache_folder: Optional[str] = Field(
        description="Cache folder for huggingface files."
    )

    _model: Any = PrivateAttr()
    _tokenizer: Any = PrivateAttr()
    _device: str = PrivateAttr()

    def __init__(
        self,
        model_name: Optional[str] = None,
        tokenizer_name: Optional[str] = None,
        pooling: Optional[str] = None,
        max_length: Optional[int] = None,
        query_instruction: Optional[str] = None,
        text_instruction: Optional[str] = None,
        normalize: bool = True,
        model: Optional[Any] = None,
        tokenizer: Optional[Any] = None,
        embed_batch_size: int = DEFAULT_EMBED_BATCH_SIZE,
        cache_folder: Optional[str] = None,
        trust_remote_code: bool = False,
        device: Optional[str] = None,
        callback_manager: Optional[CallbackManager] = None,
    ):
        try:
            from transformers import AutoModel, AutoTokenizer
        except ImportError:
            raise ImportError(
                "HuggingFaceEmbedding requires transformers to be installed.\n"
                "Please install transformers with `pip install transformers`."
            )

        self._device = device or infer_torch_device()

        cache_folder = cache_folder or get_cache_dir()
        if model is None:  # Use model_name with AutoModel
            model_name = (
                model_name
                if model_name is not None
                else DEFAULT_HUGGINGFACE_EMBEDDING_MODEL
            )
            model = AutoModel.from_pretrained(
                model_name, cache_dir=cache_folder, trust_remote_code=trust_remote_code
            )
        elif model_name is None:  # Extract model_name from model
            model_name = model.name_or_path
        self._model = model.to(self._device)

        if tokenizer is None:  # Use tokenizer_name with AutoTokenizer
            tokenizer_name = (
                model_name or tokenizer_name or DEFAULT_HUGGINGFACE_EMBEDDING_MODEL
            )
            tokenizer = AutoTokenizer.from_pretrained(
                tokenizer_name, cache_dir=cache_folder
            )
        elif tokenizer_name is None:  # Extract tokenizer_name from model
            tokenizer_name = tokenizer.name_or_path
        self._tokenizer = tokenizer

        if max_length is None:
            try:
                max_length = int(self._model.config.max_position_embeddings)
            except AttributeError as exc:
                raise ValueError(
                    "Unable to find max_length from model config. Please specify max_length."
                ) from exc

        if not pooling:
            pooling = get_pooling_mode(model_name)
        try:
            pooling = Pooling(pooling)
        except ValueError as exc:
            raise NotImplementedError(
                f"Pooling {pooling} unsupported, please pick one in"
                f" {[p.value for p in Pooling]}."
            ) from exc
        super().__init__(
            embed_batch_size=embed_batch_size,
            callback_manager=callback_manager,
            model_name=model_name,
            tokenizer_name=tokenizer_name,
            max_length=max_length,
            pooling=pooling,
            normalize=normalize,
            query_instruction=query_instruction,
            text_instruction=text_instruction,
        )
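If running a separate TEI server is not required for your use case, a local HuggingFace embedding avoids the HTTP call entirely. A minimal usage sketch; the import path and the bge model name are assumptions and differ between the legacy and integration packages of LlamaIndex:

# Sketch only: runs the embedding model in-process instead of calling a TEI server.
# The import path assumes the llama-index-embeddings-huggingface integration package.
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])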

Make sure the model and tokenizer are set up correctly and that the max_length and pooling parameters are configured properly. This should help resolve the JSONDecodeError.
Sources:
  1. https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/llms/llama-index-llms-predibase/llama_index/llms/predibase/base.py
  2. https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/embeddings/text_embeddings_inference.py
  3. https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/embeddings/llama-index-embeddings-text-embeddings-inference/llama_index/embeddings/text_embeddings_inference/base.py
  4. https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/embeddings/huggingface.py

To continue talking to Dosu, mention @dosu.
92dk7w1h #2

That doesn't help, since I'm running your own example.

dy2hfwbg #3

It looks like the first step is to deploy the embedding server, which sounds like a paid option, as described here.
cc: @namin @tobi

unhi4e5o #4

It's not paid, but you do need to deploy it somewhere (local Docker, a web service, etc.) before you can use it.

kadbb459 #5

Thanks for the reply, @logan-markewich. Which deployment methods have you used?
