llama_index [Bug]: SubQuestionQueryEngine generates a Python script instead of markdown-formatted text, which causes an OutputParserException

am46iovg · posted 5 months ago in Python

Problem description: the program crashes when SubQuestionQueryEngine is used with multiple tools. The full error is:

Traceback (most recent call last):
  File "C:\Users\sergii.vashchyshchuk\AppData\Roaming\Python\Python312\site-packages\llama_index\core\output_parsers\utils.py", line 45, in parse_json_markdown
    json_obj = json.loads(json_string)
  File "C:\Program Files\Python312\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Program Files\Python312\Lib\json\decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 3 column 5 (char 8)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\sergii.vashchyshchuk\AppData\Roaming\Python\Python312\site-packages\llama_index\core\output_parsers\utils.py", line 52, in parse_json_markdown
    json_obj = yaml.safe_load(json_string)
  File "C:\Program Files\Python312\Lib\yaml\__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
  File "C:\Program Files\Python312\Lib\yaml\__init__.py", line 81, in load
    return loader.get_single_data()
  File "C:\Program Files\Python312\Lib\yaml\__init__.py", line 59, in get_single_node
    if not self.check_event(StreamEndEvent):
  File "C:\Program Files\Python312\Lib\yaml\__init__.py", line 70, in check_event
    raise ParserError(None, None,
  File "C:\Program Files\Python312\Lib\yaml\__init__.py", line 98, in check_event
    self.current_event = self.state()
  File "C:\Program Files\Python312\Lib\yaml\__init__.py", line 171, in parse_document_start
    raise ParserError(None, None,
  File "C:\AI\llama\rag2.py", line 119
    response = top_query_engine.query(query)
  File "C:\Users\sergii.vashchyshchuk\AppData\Roaming\Python\Python312\site-packages\llama_index\core\base\base_query_engine.py", line 53, in query
    query_result = self._query(str_or_query_bundle)
  File "C:\Users\sergii.vashchyshchuk\AppData\Roaming\Python\Python312\site-packages\llama_index\core\query_engine\sub_question_query_engine.py", line 145, in _query
    sub_questions = self._question_gen.generate(self._metadatas, query_bundle)
  File "C:\Users\sergii.vashchyshchuk\AppData\Roaming\Python\Python312\site-packages\llama_index\core\question_gen\llm_generators.py", line 81, in generate
    parse = self._prompt.output_parser.parse(prediction)
  File "C:\Users\sergii.vashchyshchuk\AppData\Roaming\Python\Python312\site-packages\llama_index\core\output_parsers

btxsgosb #1

Hi @SNIKO! I'm Dosu, a bot here to help you solve bugs, answer questions, and guide you toward becoming a contributor. I'm looking into your issue and will get back to you with a full answer shortly. Let's solve this together!


tpxzln5u #2

I think the prompt is fairly clear? But you can also customize the prompt.
Open-source LLMs aren't very reliable at structured outputs. In this case, I would probably use Ollama and turn on JSON mode.
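A minimal sketch of that suggestion, assuming the `llama-index-llms-ollama` integration is installed and a local `llama3` model is pulled; the `json_mode` flag is an assumption about recent versions of the `Ollama` class, so check the version you have installed:

```python
from llama_index.llms.ollama import Ollama
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.question_gen import LLMQuestionGenerator

# JSON mode constrains Ollama's generations to valid JSON, which is exactly
# what the sub-question generator needs to parse.
json_llm = Ollama(
    model="llama3",
    request_timeout=120.0,
    json_mode=True,  # assumption: exposed by recent llama-index-llms-ollama releases
)

# Use the JSON-constrained LLM only for question generation, so normal answer
# synthesis by the rest of the pipeline is unaffected.
question_gen = LLMQuestionGenerator.from_defaults(llm=json_llm)
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,  # your existing tools
    question_gen=question_gen,
)
```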


5sxhfpxr #3

For open-source models, the prompt (question_gen_prompt) isn't clear enough. Is there a way to override this prompt? I'd like to try making it clearer.


6mw9ycah #4

You have to override the question generator.

The prompt is a bit involved. I would only modify the prefix, tbh. But here's the whole thing:

import json

from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.question_gen import LLMQuestionGenerator
from llama_index.core.question_gen.prompts import build_tools_text
from llama_index.core.question_gen.types import SubQuestion
from llama_index.core.tools import ToolMetadata

PREFIX = """\
Given a user question, and a list of tools, output a list of relevant sub-questions \
in json markdown that when composed can help answer the full user question:

"""

example_query_str = (
    "Compare and contrast the revenue growth and EBITDA of Uber and Lyft for year 2021"
)
example_tools = [
    ToolMetadata(
        name="uber_10k",
        description="Provides information about Uber financials for year 2021",
    ),
    ToolMetadata(
        name="lyft_10k",
        description="Provides information about Lyft financials for year 2021",
    ),
]
example_tools_str = build_tools_text(example_tools)
example_output = [
    SubQuestion(
        sub_question="What is the revenue growth of Uber", tool_name="uber_10k"
    ),
    SubQuestion(sub_question="What is the EBITDA of Uber", tool_name="uber_10k"),
    SubQuestion(
        sub_question="What is the revenue growth of Lyft", tool_name="lyft_10k"
    ),
    SubQuestion(sub_question="What is the EBITDA of Lyft", tool_name="lyft_10k"),
]
example_output_str = json.dumps({"items": [x.dict() for x in example_output]}, indent=4)

EXAMPLES = f"""\
# Example 1
<Tools>
\`\`\`json
{example_tools_str}
\`\`\`

<User Question>
{example_query_str}

<Output>
\`\`\`json
{example_output_str}
\`\`\`

""".replace(
    "{", "{{"
).replace(
    "}", "}}"
)

SUFFIX = """\
# Example 2
<Tools>
\`\`\`json
{tools_str}
\`\`\`

<User Question>
{query_str}

<Output>
"""

DEFAULT_SUB_QUESTION_PROMPT_TMPL = PREFIX + EXAMPLES + SUFFIX

question_gen = LLMQuestionGenerator.from_defaults(
    llm=llm,  # your LLM instance
    prompt_template_str=DEFAULT_SUB_QUESTION_PROMPT_TMPL,
)

sub_question_engine = SubQuestionQueryEngine.from_defaults(..., question_gen=question_gen)

tp5buhyn #5

(Ignore the \ characters I inserted; they're only there so the markdown renders correctly.)


62lalag4 #6

Yes, adding this to the end of the prompt fixes it:

  • Make sure you generate the actual JSON as the output, not a script that generates the JSON.
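A minimal sketch of one way to append that instruction without rewriting the whole template, assuming the `"question_gen:question_gen_prompt"` prompt key used here matches your llama_index version (verify with `sub_question_engine.get_prompts().keys()`):

```python
from llama_index.core import PromptTemplate
from llama_index.core.question_gen.prompts import DEFAULT_SUB_QUESTION_PROMPT_TMPL

# Append the stricter instruction to the stock sub-question template.
stricter_tmpl = (
    DEFAULT_SUB_QUESTION_PROMPT_TMPL
    + "\nMake sure you generate the actual JSON as the output, not a script that generates the JSON.\n"
)

# Push the updated template back into the engine's question generator.
sub_question_engine.update_prompts(
    {"question_gen:question_gen_prompt": PromptTemplate(stricter_tmpl)}
)
```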

n8ghc7c1 #7

That's pretty specific, just to handle one particular edge case 😅. At the top it already says json markdown. Maybe some LLMs pay more attention to the end than to the beginning.


lkaoscv7 #8

There aren't that many models out there; you could write a prompt that satisfies all of them, especially popular ones like llama3.
Adding stricter instructions for the "dumber" models wouldn't hurt. I realize small models often generate invalid JSON (there are already some bugs reported about that here); those are hopeless and there isn't much you can do. But for models that merely misinterpret the prompt, like this one, we can add more instructions to the prompt without affecting other models.


r7s23pms #9

To fix SubQuestionQueryEngine generating a Python script instead of the expected JSON markdown, you can customize the prompt so that it explicitly requests the desired output format. This means spelling out the output requirements clearly and specifically, which helps steer the model toward the correct format, especially for models such as llama3 that may misinterpret less specific instructions.

Here is a simplified template based on your needs:

{{ "items": [
  {{"sub_question": "Uber在2021年的营收增长是多少?", "tool_name": "uber_10k"}},
  {{"sub_question": "Uber在2021年的EBITDA是多少?", "tool_name": "uber_10k"}},
  {{"sub_question": "Lyft在2021年的营收增长是多少?", "tool_name": "lyft_10k"}},
  {{"sub_question": "Lyft在2021年的EBITDA是多少?", "tool_name": "lyft_10k"}}
]}}

"""
SUFFIX = """现在,给定以下工具和用户问题,生成相应的子问题(JSON markdown格式):"""

将它们组合成完整的提示

complete_prompt = PREFIX + EXAMPLES + SUFFIX
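Whichever wording you settle on, `LLMQuestionGenerator` formats the template with `tools_str` and `query_str` at query time, so `complete_prompt` still needs the `{tools_str}` and `{query_str}` placeholders (the SUFFIX in the earlier comment already includes them). A minimal wiring sketch, assuming `llm` and `query_engine_tools` are your existing objects:

```python
# Hand the customized template to the question generator and plug it into the
# sub-question engine; {tools_str} and {query_str} must appear in the template.
question_gen = LLMQuestionGenerator.from_defaults(
    llm=llm,
    prompt_template_str=complete_prompt,
)
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    question_gen=question_gen,
)
```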

This template clearly states the task, provides detailed examples, and reinforces the expected output format, which should help in generating valid JSON markdown for questions. It's designed to be adaptable for various AI models and tasks, ensuring clarity and consistency in the expected output format.

Regarding the `OutputParserException`, ensuring that the examples and the expected output are in valid JSON format is crucial for avoiding these errors. This approach does not rely on overly specific instructions that might not apply to other contexts but focuses on the format and the type of reasoning required to generate sub-questions.

If you encounter further issues or have any questions, feel free to ask for more clarification.

**Sources**
- [llama-index-core/llama_index/core/question_gen/prompts.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/question_gen/prompts.py)
- [llama-index-legacy/llama_index/legacy/question_gen/prompts.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/question_gen/prompts.py)
- [llama-index-core/llama_index/core/question_gen/llm_generators.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/question_gen/llm_generators.py)
- [llama-index-legacy/llama_index/legacy/program/llm_program.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/program/llm_program.py)
- [llama-index-core/llama_index/core/question_gen/output_parser.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/question_gen/output_parser.py)
- [llama-index-legacy/llama_index/legacy/question_gen/output_parser.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/question_gen/output_parser.py)

