llama_index [Bug]: QueryPipeline only using the query_str of the query transform output QueryBundle

Posted by dl5txlt9 2 months ago in Other

Bug description

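When a query transform is used as part of a QueryPipeline, it effectively does nothing, because the output of such a component is just the original query_str.
Specifically, I am using HyDEQueryTransform as part of a QueryPipeline. According to lines 152 to 163 of
llama_index/llama-index-core/llama_index/core/indices/query/query_transform/base.py
at commit e4ff32c: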
def _run(self, query_bundle: QueryBundle, metadata: Dict) -> QueryBundle:
    """Run query transform."""
    # TODO: support generating multiple hypothetical docs
    query_str = query_bundle.query_str
    hypothetical_doc = self._llm.predict(self._hyde_prompt, context_str=query_str)
    embedding_strs = [hypothetical_doc]
    if self._include_original:
        embedding_strs.extend(query_bundle.embedding_strs)
    return QueryBundle(
        query_str=query_str,
        custom_embedding_strs=embedding_strs,
    )
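the output is a QueryBundle carrying custom embedding strings. Furthermore, according to https://docs.llamaindex.ai/en/latest/module_guides/querying/pipeline/module_usage/#query-transforms, the output of a query transform in a query pipeline is indeed query_str, but this design effectively discards the custom embedding strings attached to the output QueryBundle.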
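
To make the loss concrete, here is a minimal sketch that does not import llama_index at all. `QueryBundleSketch`, `fake_hyde`, and `component_output` are hypothetical stand-ins written for this illustration: the first mimics only the two QueryBundle fields discussed above, the second replaces the LLM call with a fixed string, and the third mirrors the "return only query_str" behaviour described in this issue. It is an approximation under those assumptions, not the library's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QueryBundleSketch:
    """Hypothetical stand-in for QueryBundle (only the fields discussed here)."""

    query_str: str
    custom_embedding_strs: Optional[List[str]] = None

    @property
    def embedding_strs(self) -> List[str]:
        # Fall back to the query string when no custom strings are set.
        if self.custom_embedding_strs is None:
            return [self.query_str]
        return self.custom_embedding_strs


def fake_hyde(bundle: QueryBundleSketch, include_original: bool = False) -> QueryBundleSketch:
    """Stand-in for the HyDE transform: a fixed string replaces the LLM call."""
    hypothetical_doc = f"[hypothetical answer to: {bundle.query_str}]"
    embedding_strs = [hypothetical_doc]
    if include_original:
        embedding_strs.extend(bundle.embedding_strs)
    return QueryBundleSketch(
        query_str=bundle.query_str,
        custom_embedding_strs=embedding_strs,
    )


def component_output(bundle: QueryBundleSketch) -> Dict[str, str]:
    """Mirrors the reported behaviour: only query_str leaves the component."""
    return {"query_str": bundle.query_str}


original = QueryBundleSketch("the input query string")
transformed = fake_hyde(original)

# A downstream step can only rebuild a bundle from the bare string.
downstream = QueryBundleSketch(component_output(transformed)["query_str"])

print(transformed.embedding_strs)  # contains the hypothetical document
print(downstream.embedding_strs)   # back to the original query string only
```

In this sketch the downstream bundle embeds exactly what it would have embedded without the transform, which is the behaviour the report describes.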

Version

0.10.38

Steps to reproduce

Specifically, using HyDEQueryTransform, just run the following script with an LLM configured:

from llama_index.core.query_pipeline import QueryPipeline, InputComponent
from llama_index.core.indices.query.query_transform import HyDEQueryTransform

pipeline = QueryPipeline(verbose=True)
pipeline.add_modules(
    {
        "input": InputComponent(),
        # you need to pass in an LLM or configure one in the global settings
        "hyde": HyDEQueryTransform(include_original=False),
    }
)
pipeline.add_link("input", "hyde")

output = pipeline.run(input="the input query string")
print(output)

Source: https://github.com/run-llama/llama_index/issues/14044

3 answers

velaa5lx #1

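QueryPipeline only uses the query_str of the query transform's output QueryBundle because QueryTransformComponent is designed to return only the query_str from the transformed query bundle. This is evident in the _run_component method of the QueryTransformComponent class, which explicitly returns a dictionary containing only query_str: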

def _run_component(self, **kwargs: Any) -> Any:
    """Run component."""
    output = self._query_transform.run(
        kwargs["query_str"],
        metadata=kwargs["metadata"],
    )
    return {"query_str": output.query_str}

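This method does not include custom_embedding_strs or any other QueryBundle field, so they are effectively ignored. This design choice ensures that only the transformed query string is passed along the pipeline, which simplifies query transform handling 1(https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/query/query_transform/base.py) 2(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/indices/query/query_transform/base.py).
In addition, when HyDEQueryTransform is used, the transformed query string is used for embedding lookup and summarization. HyDEQueryTransform generates a hypothetical document to improve embedding quality, but the pipeline still relies primarily on query_str for the final output. This approach helps keep the query processing pipeline consistent and simple 3(https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/query_transformations/HyDEQueryTransformDemo.ipynb).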

o3imoua4 #2

@dosu do you suggest a solution to use HyDE as a query pipeline step?

qij5mzcb #3

For now, as a workaround, I wrap my retriever in a TransformRetriever. You can then use the transform retriever directly in the pipeline.

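from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.retrievers import TransformRetriever

# `retriever` is your existing retriever, e.g. index.as_retriever()
retriever = TransformRetriever(
    retriever=retriever,
    query_transform=HyDEQueryTransform(),
)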

If you want to apply HyDE before passing the query to a query engine rather than a retriever, you can also try TransformQueryEngine.
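
Why wrapping helps can be sketched without the library: if the transform runs inside the retriever wrapper, the full bundle (including the custom embedding strings) reaches the inner retriever directly and never passes through a step that keeps only query_str. All names below (`BundleSketch`, `RecordingRetriever`, `TransformingRetrieverSketch`, `fake_hyde`) are hypothetical stand-ins invented for this illustration; this is a rough model of the wrapper pattern, not the actual TransformRetriever implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class BundleSketch:
    """Hypothetical stand-in for QueryBundle."""

    query_str: str
    custom_embedding_strs: Optional[List[str]] = None

    @property
    def embedding_strs(self) -> List[str]:
        if self.custom_embedding_strs is None:
            return [self.query_str]
        return self.custom_embedding_strs


class RecordingRetriever:
    """Stand-in retriever that records which strings it would embed."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def retrieve(self, bundle: BundleSketch) -> List[str]:
        self.seen = list(bundle.embedding_strs)
        return [f"node matched against: {s}" for s in self.seen]


class TransformingRetrieverSketch:
    """Applies the transform internally, then hands the whole bundle on."""

    def __init__(
        self,
        retriever: RecordingRetriever,
        transform: Callable[[BundleSketch], BundleSketch],
    ) -> None:
        self._retriever = retriever
        self._transform = transform

    def retrieve(self, query_str: str) -> List[str]:
        bundle = self._transform(BundleSketch(query_str))
        # The full bundle is passed on, not just bundle.query_str.
        return self._retriever.retrieve(bundle)


def fake_hyde(bundle: BundleSketch) -> BundleSketch:
    """Stand-in for HyDE: a fixed string replaces the LLM-generated document."""
    doc = f"[hypothetical answer to: {bundle.query_str}]"
    return BundleSketch(query_str=bundle.query_str, custom_embedding_strs=[doc])


inner = RecordingRetriever()
wrapped = TransformingRetrieverSketch(inner, fake_hyde)
results = wrapped.retrieve("the input query string")
print(inner.seen)
```

From the pipeline's point of view the wrapper still takes a plain query string as input, so it fits where a retriever would go, while the embedding lookup inside it uses the hypothetical document.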
