llama_index [Bug]: How do I retrieve when using DynamicLLMPathExtractor?

c8ib6hqw posted 5 months ago in Other


My current data ingestion process is as follows:

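from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import DynamicLLMPathExtractor

kg_extractor = DynamicLLMPathExtractor(
    llm=llm,
    max_triplets_per_chunk=20,
    num_workers=4,
    allowed_entity_types=None,  # Let the LLM infer entities and their labels (types) on the fly
    allowed_relation_types=None,  # Let the LLM infer relationships on the fly
)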
dynamic_index = PropertyGraphIndex.from_documents(
    documents,
    llm=llm,
    embed_kg_nodes=False,
    kg_extractors=[kg_extractor],
    show_progress=True,
    property_graph_store=graph_store,
)

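However, I cannot retrieve anything with the LLMSynonymRetriever approach below. Note that retrieval does work when I ingest with SimpleLLMPathExtractor. My concern is that with DynamicLLMPathExtractor there is a label coming from Neo4j Bolt, whereas with the other extractor there is none.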

import nest_asyncio
nest_asyncio.apply()
from llama_index.core.indices.property_graph import (
    PGRetriever,
    VectorContextRetriever,
    LLMSynonymRetriever,
)
sub_retrievers = [
    LLMSynonymRetriever(graph_store),
]
retriever = PGRetriever(sub_retrievers=sub_retrievers)
nodes = retriever.retrieve("Insurbridge Connects to")
for node in nodes:
    print(node.text)
from llama_index.core import PropertyGraphIndex
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store
)
query_engine = index.as_query_engine(sub_retrievers=[retriever], llm=llm)
response = query_engine.query("Insurbridge Connects to")
print(str(response))

llama_index

Source: https://github.com/run-llama/llama_index/issues/14827

2 answers


mznpcxlj1#

To fix retrieval when combining DynamicLLMPathExtractor with LLMSynonymRetriever, check that both components are configured correctly. The steps and caveats are as follows:

Configuring DynamicLLMPathExtractor:
Make sure DynamicLLMPathExtractor is configured to extract the triplets and labels you need.

kg_extractor = DynamicLLMPathExtractor(
    llm=llm,
    max_triplets_per_chunk=20,
    num_workers=4,
    allowed_entity_types=None,  # Let the LLM infer entities and their labels (types) on the fly
    allowed_relation_types=None,  # Let the LLM infer relationships on the fly
)

dynamic_index = PropertyGraphIndex.from_documents(
    documents,
    llm=llm,
    embed_kg_nodes=False,
    kg_extractors=[kg_extractor],
    show_progress=True,
    property_graph_store=graph_store,    
)

Configuring LLMSynonymRetriever:
Make sure LLMSynonymRetriever is set up with parameters that suit the extracted data.

from llama_index.core.indices.property_graph import LLMSynonymRetriever

synonym_retriever = LLMSynonymRetriever(
    graph_store=graph_store,
    llm=llm,
    include_text=False,
    synonym_prompt="Given some initial query, generate synonyms or related keywords up to {max_keywords} in total, considering possible cases of capitalization, pluralization, common expressions, etc.\nProvide all synonyms/keywords separated by '^' symbols: 'keyword1^keyword2^...'\nNote, result should be in one-line, separated by '^' symbols.\n----\nQUERY: {query_str}\n----\nKEYWORDS: ",
    output_parsing_fn=lambda output: [x.strip().capitalize() for x in output.strip().split("^") if x.strip()],
    max_keywords=10,
    path_depth=1,
)
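
To see what the output_parsing_fn above does to the LLM's reply, the same logic can be run by itself on a sample string. This is only a minimal sketch: the helper name parse_keywords and the sample reply are made up for illustration, and whether your graph store then matches these keywords against node names case-sensitively is an assumption you should check.

```python
from typing import List


def parse_keywords(output: str) -> List[str]:
    """Same logic as the output_parsing_fn lambda shown above."""
    return [x.strip().capitalize() for x in output.strip().split("^") if x.strip()]


# Hypothetical LLM reply, invented for illustration.
sample_reply = "InsurBridge^insurbridge ^ connects to^^"
keywords = parse_keywords(sample_reply)
print(keywords)  # ['Insurbridge', 'Insurbridge', 'Connects to']
```

Note that str.capitalize() lowercases everything after the first character, so a node stored as "InsurBridge" would not equal the generated keyword "Insurbridge" under an exact, case-sensitive comparison.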

Combining retrievers:
Wrap the LLMSynonymRetriever in a PGRetriever and query the PropertyGraphIndex that was built with DynamicLLMPathExtractor.

from llama_index.core.indices.property_graph import PGRetriever

sub_retrievers = [
    synonym_retriever,
]

retriever = PGRetriever(sub_retrievers=sub_retrievers)

nodes = retriever.retrieve("Insurbridge Connects to")

for node in nodes:
    print(node.text)

from llama_index.core import PropertyGraphIndex
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store
)

query_engine = index.as_query_engine(sub_retrievers=[retriever], llm=llm)

response = query_engine.query("Insurbridge Connects to")
print(str(response))

Debugging and logging:
Enable logging to trace the data flow and find where retrieval may be failing.

import logging
import sys

# basicConfig already attaches a stdout handler to the root logger;
# adding a second StreamHandler would print every message twice.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

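Checking the labels:
Verify whether the labels generated by DynamicLLMPathExtractor are causing the problem. They may differ from those generated by SimpleLLMPathExtractor, which could affect retrieval.
If the problem persists, inspect the specific labels and data structures produced by DynamicLLMPathExtractor and compare them with those from SimpleLLMPathExtractor to pin down the differences.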
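
One way to run that comparison is to pull node names and labels out of each graph and diff them. The sketch below works on plain lists of records so that it runs without a database. The record shape (name and labels keys), the helper names, and the sample data are all hypothetical. With Neo4j, such records could presumably be obtained from a Cypher query such as MATCH (n) RETURN n.name AS name, labels(n) AS labels, for example through the store's structured_query method, but treat that as an assumption to verify against your store.

```python
from typing import Any, Dict, Iterable, List, Set

Record = Dict[str, Any]


def names_by_casefold(records: Iterable[Record]) -> Dict[str, Set[str]]:
    """Group the exact stored names under a case-insensitive key."""
    grouped: Dict[str, Set[str]] = {}
    for rec in records:
        name = str(rec["name"])
        grouped.setdefault(name.casefold(), set()).add(name)
    return grouped


def casing_mismatches(
    first: Iterable[Record], second: Iterable[Record]
) -> Dict[str, Dict[str, List[str]]]:
    """Entities present in both graphs whose stored spelling differs only by case."""
    g1, g2 = names_by_casefold(first), names_by_casefold(second)
    return {
        key: {"first": sorted(g1[key]), "second": sorted(g2[key])}
        for key in sorted(g1.keys() & g2.keys())
        if g1[key] != g2[key]
    }


def label_set(records: Iterable[Record]) -> Set[str]:
    """Collect every label that appears on any node."""
    return {label for rec in records for label in rec.get("labels", [])}


# Hypothetical records, invented for illustration (not real extractor output).
simple_records = [
    {"name": "Insurbridge", "labels": ["Entity"]},
    {"name": "Rating engine", "labels": ["Entity"]},
]
dynamic_records = [
    {"name": "InsurBridge", "labels": ["Entity", "SOFTWARE"]},
    {"name": "Rating engine", "labels": ["Entity", "SOFTWARE"]},
]

mismatches = casing_mismatches(simple_records, dynamic_records)
extra_labels = label_set(dynamic_records) - label_set(simple_records)
print(mismatches)
print(extra_labels)
```

If mismatches is non-empty, the two ingestions stored the same entity under different casing, which would matter for any retriever that looks nodes up by exact name. If extra_labels is non-empty, the dynamic graph carries labels the simple one lacks.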