llama_index [Bug]: LLMTextCompletionProgram returns the correct answer but raises a ValueError

vxqlmq5t · posted 2 months ago in: Other

Bug description

When I use LLMTextCompletionProgram to integrate a Pydantic model into my output, I get a very strange result. The error shows that the output is correct (134436839ABCDFG), yet it still throws an error saying the JSON string could not be extracted from the output. There is no way to get the value, because I have to call llm_completion(), which throws the error. The LLM I am using is gpt-3.5-turbo.

import re

from pydantic import BaseModel, Field, validator


class ABC(BaseModel):
    abc: str = Field(..., description="The 'ABC' or 'No info' if it doesn't exist.")

    @validator("abc")
    def validate_abc_code(cls, v):
        # Accept "no info" or a 20-character code: 4 digits, "00", 12 alphanumerics, 2 digits
        if v.lower() != "no info" and not re.match(r"^[0-9]{4}[0]{2}[A-Z0-9]{12}[0-9]{2}$", v):
            raise ValueError("Invalid ABC code format")
        return v


prompt_template = """You are an AI expert in extracting data from documents. Below is a document extract from a Know Your Customer document. Consider this extract as context to the prompt and answer the below question. Question: {question} Context: {context}."""

question = "Extract the 'ABC' for the company from the text. The 'ABC' should be in the format XXXXXXXXXXXXXXXXXXXX, where X can be either a number or an uppercase letter (e.g., 529900QF6F5IB9WQCL03). Please respond with only the 'ABC' and no additional information. Return 'No info' if no ABC is found."

prompt = json_val['prompts']['prompt_template'].format(question=question, context=processed_document)

llm_completion = LLMTextCompletionProgram.from_defaults(
    llm=llm,
    output_cls=ABC,
    prompt_template_str=prompt,
    verbose=True,
)

x = llm_completion()  # raises the ValueError below

Version

0.10.54

Steps to reproduce

I tested against the gpt-3.5 LLM. The processed document is a string from a random document that I am trying to extract information from.

Relevant logs/traceback

ValueError: Could not extract json string from output: "abc": "134436839ABCDFG"
File <command-1015110231477814>, line 1
----> 1 llm_completion()
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/llama_index/core/output_parsers/utils.py:112, in extract_json_str(text)
    110 match = re.search(r"\{.*\}", text.strip(), re.MULTILINE | re.IGNORECASE | re.DOTALL)
    111 if not match:
--> 112     raise ValueError(f"Could not extract json string from output: {text}")
    114 return match.group()
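The failure can be reproduced without llama_index: as the traceback shows, extract_json_str searches for a brace-delimited span, and the model's reply contains no braces. A minimal sketch using the same regex from the traceback above:

```python
import re

# The raw model reply from the traceback above: correct value, but no braces
text = '"abc": "134436839ABCDFG"'

# The same pattern extract_json_str uses to find a JSON object
pattern = r"\{.*\}"
match = re.search(pattern, text.strip(), re.MULTILINE | re.IGNORECASE | re.DOTALL)
print(match)  # None, which is why extract_json_str raises ValueError

# Once the fragment is wrapped in braces, the same pattern matches
wrapped = "{" + text + "}"
print(re.search(pattern, wrapped).group())
```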

n1bvdmb61#

Hey @jakubstrawa1! 👋 I'm here to help you with any bugs, questions, or contributions. Let's squash those bugs together!
To resolve the error indicating that a JSON string could not be extracted from the output when using LLMTextCompletionProgram with a Pydantic model, make sure the program's output is formatted as a proper JSON block wrapped in markdown. Here is how to address this:

  1. Ensure a JSON block format: make sure the LLM's response is a JSON block wrapped in markdown formatting (i.e., enclosed in ```json and ```).
  2. Parsing function: use the parse_pydantic_from_guidance_program function to parse the output. This function attempts to parse the last markdown-formatted JSON block from the response.
    Here is the relevant part of the code that handles this:
from typing import Type, TypeVar
from llama_index.core.bridge.pydantic import BaseModel
from llama_index.core.output_parsers.base import OutputParserException
from llama_index.core.output_parsers.utils import parse_json_markdown

Model = TypeVar("Model", bound=BaseModel)

def parse_pydantic_from_guidance_program(
    response: str, cls: Type[Model], verbose: bool = False
) -> Model:
    try:
        output = response.split("```json")[-1]
        output = "```json" + output
        if verbose:
            print("Raw output:")
            print(output)
        json_dict = parse_json_markdown(output)
        sub_questions = cls.parse_obj(json_dict)
    except Exception as e:
        raise OutputParserException(
            "Failed to parse pydantic object from guidance program"
            ". Probably the LLM failed to produce data with right json schema"
        ) from e
    return sub_questions
  3. Handle JSON formatting issues: the parse_json_markdown function includes extra logic to handle potential problems in the JSON formatting, such as trailing commas.
    Here is the relevant part of the parse_json_markdown function:
import contextlib
import json
import re
from typing import Any, List

with contextlib.suppress(ImportError):
    import yaml

from llama_index.core.output_parsers.base import OutputParserException

def _marshal_llm_to_json(output: str) -> str:
    output = output.strip().replace("{{", "{").replace("}}", "}")

    left_square = output.find("[")
    left_brace = output.find("{")

    if left_square < left_brace and left_square != -1:
        left = left_square
        right = output.rfind("]")
    else:
        left = left_brace
        right = output.rfind("}")

    return output[left : right + 1]

def parse_json_markdown(text: str) -> Any:
    if "```json" in text:
        text = text.split("```json")[1].strip().strip("```").strip()

    json_string = _marshal_llm_to_json(text)

    try:
        json_obj = json.loads(json_string)
    except json.JSONDecodeError as e_json:
        try:
            json_obj = yaml.safe_load(json_string)
        except yaml.YAMLError as e_yaml:
            raise OutputParserException(
                f"Got invalid JSON object. Error: {e_json} {e_yaml}. "
                f"Got JSON string: {json_string}"
            )
        except NameError as exc:
            raise ImportError("Please pip install PyYAML.") from exc

    return json_obj

Make sure your LLM's output is correctly formatted as a JSON block inside markdown; the parse_pydantic_from_guidance_program function will then handle extracting the JSON string and parsing it into a Pydantic model 1(https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/prompts/guidance_utils.py)2(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/prompts/guidance_utils.py)3(https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/program/llama-index-program-guidance/llama_index/program/guidance/utils.py)4(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/output_parsers/utils.py)5(https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/output_parsers/utils.py).
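To illustrate the happy path the functions above expect, here is a dependency-free sketch (the helper mirrors _marshal_llm_to_json above; it is not the llama_index import itself). A markdown-wrapped JSON block parses cleanly, while the bare fragment from the bug report yields nothing to parse:

```python
import json

FENCE = "`" * 3 + "json"  # the markdown fence marker, built here to avoid nesting fences

def marshal_llm_to_json(output: str) -> str:
    # Mirror of _marshal_llm_to_json above: trim to the outermost {...} span
    output = output.strip().replace("{{", "{").replace("}}", "}")
    left, right = output.find("{"), output.rfind("}")
    return output[left : right + 1]

# A well-behaved reply: a JSON object wrapped in a markdown fence
good = "Here you go:\n" + FENCE + '\n{"abc": "529900QF6F5IB9WQCL03"}\n' + "`" * 3
if FENCE in good:
    good = good.split(FENCE)[1].strip().strip("`").strip()
print(json.loads(marshal_llm_to_json(good)))

# The reply from the bug report: no braces, so nothing to trim to
bad = '"abc": "134436839ABCDFG"'
print(repr(marshal_llm_to_json(bad)))  # '' -> parsing fails downstream
```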

  • To continue the conversation with Dosu, mention @dosu. *

mi7gmzs62#

@dosu This doesn't work. I don't understand how to get the response from LLMTextCompletionProgram that is needed to parse the Pydantic model.


ecbunoof3#

@jakubstrawa1 The result has no braces, so it cannot be parsed as JSON.
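One pragmatic workaround (my own sketch, not a llama_index API) is to wrap a bare `"key": "value"` fragment in braces before handing it to the JSON parser:

```python
import json

def coerce_to_json_object(text: str) -> str:
    """Wrap a bare '"key": "value"' fragment in braces so json.loads accepts it."""
    text = text.strip()
    if not text.startswith("{"):
        text = "{" + text + "}"
    return text

raw = '"abc": "134436839ABCDFG"'  # the brace-less reply from the bug report
print(json.loads(coerce_to_json_object(raw)))  # {'abc': '134436839ABCDFG'}
```

Such a coercion could live in a custom output parser handed to the program, though whether your llama_index version exposes a suitable hook for it is something to verify against the release you run.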


vc6uscn94#

@logan-markewich That is really interesting, because I have 10 Pydantic classes, and when using llm.complete (without Pydantic), just the prompt and the llm produce the correct output. But when I use LLMTextCompletionProgram, I get the unparsable error. All inputs are identical and in the same format (strings).


ddarikpa5#

Also, we cannot see the response returned from the llm when using LLMTextCompletionProgram.from_defaults(). I can only speculate that llm.complete(prompt) would give me roughly the same answer. So how can I tell whether it was sent with or without braces?
