llama_index [Question]: Get the corresponding retrieved document information (metadata, text, etc.) from a KnowledgeGraphRAGRetriever query

dvtswwa3 posted 5 months ago in Other

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

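Hi, is there a way to get the corresponding retrieved documents (with metadata and text) when querying with KnowledgeGraphRAGRetriever? When I check the output response of the query, it only contains a list of relevant triplets.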


izkcnapc1#

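To get full document details, including metadata and text, from KnowledgeGraphRAGRetriever, you need to make sure the retriever is configured correctly and that the retrieval process is handled properly. Here are the detailed steps: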

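  1. Make sure include_text=True: this parameter must be set to True for the document text to be included during retrieval.
  2. Retrieval goes through _retrieve: the _retrieve method of the KnowledgeGraphRAGRetriever class is responsible for retrieving the nodes with their details. It is an internal method, so call the public retrieve() method (or run the retriever through a query engine) rather than calling _retrieve directly.
  3. Check the test cases: refer to the test cases that demonstrate include_text=True to confirm that your setup is correct.
    Here is a detailed example: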

Step-by-step example

  1. Initialize the retriever:
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever
from llama_index.core.storage import StorageContext

# Initialize the storage context with a graph store (your_graph_store is a placeholder for your own instance)
storage_context = StorageContext.from_defaults(graph_store=your_graph_store)

# Initialize the retriever with include_text=True
graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    include_text=True,
    verbose=True,
)
  2. Use the retriever with a query engine:
from llama_index.core.query_engine import RetrieverQueryEngine

# Initialize the query engine with the retriever
query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

# Perform a query
response = query_engine.query(
    "Tell me about Peter Quill?",
)

# Display the response
from IPython.display import display, Markdown
display(Markdown(f"<b>{response}</b>"))
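  3. Verify retrieval against the test cases:
    Check the test cases to make sure your setup matches the expected behavior. For example, the test_as_retriever and test_retrievers test cases show how retrieval with include_text=True behaves; note that they retrieve through index.as_retriever() on a KnowledgeGraphIndex rather than through KnowledgeGraphRAGRetriever: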
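from typing import Any, List
from unittest.mock import patch

from llama_index.core import Document, KnowledgeGraphIndex, ServiceContext, StorageContext
from llama_index.core.graph_stores import SimpleGraphStore
from llama_index.core.schema import QueryBundle

# documents and mock_service_context are pytest fixtures, and mock_extract_triplets
# is a helper from the llama_index test suite; they are not defined here.
@patch.object(
    KnowledgeGraphIndex, "_extract_triplets", side_effect=mock_extract_triplets
)
def test_as_retriever(
    _patch_extract_triplets: Any,
    documents: List[Document],
    mock_service_context: ServiceContext,
) -> None:
    graph_store = SimpleGraphStore()
    storage_context = StorageContext.from_defaults(graph_store=graph_store)
    index = KnowledgeGraphIndex.from_documents(
        documents, service_context=mock_service_context, storage_context=storage_context
    )
    # as_retriever() returns the index's own KG retriever, not KnowledgeGraphRAGRetriever
    retriever = index.as_retriever()
    nodes = retriever.retrieve(QueryBundle("foo"))
    # With include_text on, nodes[0] holds the raw source text and nodes[1]
    # holds the generated knowledge-sequence text
    assert len(nodes) == 2
    assert nodes[1].node.get_content().startswith("The following are knowledge sequence")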

By following these steps and making sure your setup matches the examples, you should be able to retrieve full document details, including metadata and text, with KnowledgeGraphRAGRetriever 1(https://github.com/run-llama/llama_index/blob/main/llama-index-core/tests/indices/knowledge_graph/test_retrievers.py) 2(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/tests/indices/knowledge_graph/test_retrievers.py).
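In the outputs quoted in this thread, the graph context comes back as a node whose text starts with "The following are knowledge sequence", while document chunks carry the source text itself. When you only want the document chunks (for their text and metadata), you can separate the two kinds of node by content. A minimal sketch on plain strings, assuming the graph-context node always begins with that prefix; split_retrieved_contents is a hypothetical helper, not a LlamaIndex API.

```python
from typing import List, Tuple

# Assumption: the graph-context node starts with this prefix (as in the outputs
# quoted in this thread); ordinary document chunks do not.
KG_CONTEXT_PREFIX = "The following are knowledge sequence"


def split_retrieved_contents(contents: List[str]) -> Tuple[List[str], List[str]]:
    """Split node contents into (document_chunks, kg_context_nodes)."""
    document_chunks: List[str] = []
    kg_context_nodes: List[str] = []
    for content in contents:
        if content.startswith(KG_CONTEXT_PREFIX):
            kg_context_nodes.append(content)
        else:
            document_chunks.append(content)
    return document_chunks, kg_context_nodes


# With real retrieval results, contents would be [n.node.get_content() for n in nodes]
contents = [
    "foo is bar",
    "The following are knowledge sequence in max depth 2 ...\n['foo', 'is', 'bar']",
]
chunks, kg_nodes = split_retrieved_contents(contents)
print(chunks)         # ['foo is bar']
print(len(kg_nodes))  # 1
```

If every retrieved node lands in kg_context_nodes, the retriever returned only graph context and no source chunks, which matches the symptom described in the question.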

  • To continue talking to Dosu, mention @dosu.

7tofc5zh2#

@dosu The nodes I get from nodes = retriever.retrieve(QueryBundle("foo")) contain only one item. Here is the output of nodes[0].node.get_content():

The following are knowledge sequence in max depth 2 in the form of directed graph like:
`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...` extracted based on key entities as subject:
Login{name: Login} <-[relationship:{relationship: Requires}]- Application{name: Application} -[relationship:{relationship: Needs increase in}]-> Font size{name: Font size}
Login{name: Login} <-[relationship:{relationship: Requires}]- Application{name: Application} -[relationship:{relationship: Displays}]

As you can see, the retrieved output contains only triplets, with no document metadata or text. Is something wrong here?
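The knowledge-sequence node shown above is plain text, so one workaround is to recover the entity names from it and use them as keys for looking up the source chunks elsewhere. A minimal sketch with Python's re module, assuming every entity is rendered as Name{name: Name} exactly as in the output above; extract_entities is a hypothetical helper, not part of LlamaIndex.

```python
import re
from typing import List

# Assumption: every entity in the knowledge sequence is rendered as "Name{name: Name}",
# as in the output above; relationship labels use "{relationship: ...}" and are skipped.
ENTITY_PATTERN = re.compile(r"\{name: ([^}]+)\}")


def extract_entities(knowledge_sequence: str) -> List[str]:
    """Return the unique entity names in order of first appearance."""
    unique: List[str] = []
    for name in ENTITY_PATTERN.findall(knowledge_sequence):
        name = name.strip()
        if name not in unique:
            unique.append(name)
    return unique


sequence = (
    "Login{name: Login} <-[relationship:{relationship: Requires}]- "
    "Application{name: Application} "
    "-[relationship:{relationship: Needs increase in}]-> Font size{name: Font size}"
)
print(extract_entities(sequence))  # ['Login', 'Application', 'Font size']
```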


g52tjvyc3#

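To include full document details (metadata and text) in the retrieval results, you can customize the retrieved information with the retrieval_query parameter. Note that retrieval_query is a parameter of Neo4jVectorStore, not of KnowledgeGraphRAGRetriever, so it only applies when the documents are retrieved from a Neo4j vector store. The retrieval query must return the following four columns: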

  • text: the text of the returned document
  • score: the similarity score
  • id: the node ID
  • metadata: a dictionary with additional metadata (must contain the _node_type and _node_content keys)
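To make the four-column contract concrete, here is a small check one could run over a row returned by a custom retrieval query before handing it to LlamaIndex. It is a sketch only: the row is modeled as a plain dict, and validate_row is a hypothetical helper, not part of LlamaIndex or the Neo4j driver.

```python
from typing import Any, Dict, List

REQUIRED_COLUMNS = ("text", "score", "id", "metadata")
REQUIRED_METADATA_KEYS = ("_node_type", "_node_content")


def validate_row(row: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the row satisfies the contract."""
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in row]
    metadata = row.get("metadata")
    if not isinstance(metadata, dict):
        problems.append("metadata must be a dict")
    else:
        problems += [
            f"missing metadata key: {k}" for k in REQUIRED_METADATA_KEYS if k not in metadata
        ]
    return problems


row = {
    "text": "Interleaf hired Tomaz",
    "score": 0.92,
    "id": "node-1",
    "metadata": {"author": "Tomaz", "_node_type": "TextNode", "_node_content": "{}"},
}
print(validate_row(row))  # []
```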

Here is an example of setting retrieval_query:

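from IPython.display import Markdown, display
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.neo4jvector import Neo4jVectorStore

# username, password, url and embed_dim are your own Neo4j connection settings
retrieval_query = (
    "RETURN 'Interleaf hired Tomaz' AS text, score, node.id AS id, "
    "{author: 'Tomaz', _node_type:node._node_type, _node_content:node._node_content} AS metadata"
)
# retrieval_query is a Neo4jVectorStore argument, so it only affects this vector store
neo4j_vector_retrieval = Neo4jVectorStore(
    username, password, url, embed_dim, retrieval_query=retrieval_query
)

loaded_index = VectorStoreIndex.from_vector_store(
    neo4j_vector_retrieval
).as_query_engine()
response = loaded_index.query("What happened at interleaf?")
display(Markdown(f"<b>{response}</b>"))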

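In addition, make sure include_text is set to True when initializing KnowledgeGraphRAGRetriever. This parameter is meant to include the source document text of each relevant triplet during the query.
Here is an example of initializing KnowledgeGraphRAGRetriever: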

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=my_storage_context,
    llm=my_llm,
    entity_extract_fn=my_entity_extract_fn,
    entity_extract_template=my_entity_extract_template,
    entity_extract_policy="union",
    synonym_expand_fn=my_synonym_expand_fn,
    synonym_expand_template=my_synonym_expand_template,
    synonym_expand_policy="union",
    max_entities=5,
    max_synonyms=5,
    retriever_mode="keyword",  # or "embedding" or "hybrid"
    with_nl2graphquery=False,
    graph_traversal_depth=2,
    max_knowledge_sequence=30,
    verbose=True,
    include_text=True
)

query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

This setup is intended to include full document details (metadata and text) in the retrieval results.
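The entity_extract_policy and synonym_expand_policy arguments in the example above take "union" or "intersection". The sketch below illustrates what the two policies mean when one set of entities comes from a custom function such as my_entity_extract_fn and another from the LLM prompt. combine_entities is hypothetical; it only shows the set semantics and is not the retriever's internal code.

```python
from typing import Iterable, List


def combine_entities(
    fn_entities: Iterable[str], llm_entities: Iterable[str], policy: str = "union"
) -> List[str]:
    """Merge two entity lists with set semantics; the result is sorted for stable output."""
    a, b = set(fn_entities), set(llm_entities)
    if policy == "union":
        merged = a | b  # keep an entity found by either source
    elif policy == "intersection":
        merged = a & b  # keep only entities both sources agree on
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return sorted(merged)


print(combine_entities(["Login", "Application"], ["Application", "Font size"]))
# ['Application', 'Font size', 'Login']
print(combine_entities(["Login", "Application"], ["Application", "Font size"], "intersection"))
# ['Application']
```

"union" casts a wider net for graph lookups, while "intersection" trades recall for fewer spurious entities.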

  • Sources: 1(https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/Neo4jVectorDemo.ipynb) 2(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/indices/knowledge_graph/retrievers.py) 3(https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/knowledge_graph/retrievers.py) 4(https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/query_engine/knowledge_graph_rag_query_engine.ipynb)
  • To continue talking to Dosu, mention @dosu.

qco9c6ql4#

@dosu I do not load the index from Neo4jVectorStore. I use vector_index = VectorStoreIndex.from_documents(document) instead. Also, I use NebulaGraph instead of Neo4j. Please give a suitable example.


gblwokeq5#

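To include full document details, including metadata and text, when querying KnowledgeGraphRAGRetriever with NebulaGraph, your setup needs to be configured correctly. Here is an example suited to your use case: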

  1. Initialize NebulaGraphStore:
from llama_index.graph_stores.nebula import NebulaGraphStore
from llama_index.core import StorageContext

# Define your storage context with NebulaGraphStore
graph_store = NebulaGraphStore(
    space_name="llamaindex",
    edge_types=["relationship"],
    rel_prop_names=["relationship"],
    tags=["entity"],
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
  2. Initialize KnowledgeGraphRAGRetriever:
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    include_text=True,  # Ensure text is included in the response
    verbose=True,
)
  3. Use the retriever with a query engine:
from llama_index.core.query_engine import RetrieverQueryEngine
from IPython.display import display, Markdown

# Create the query engine
query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

# Perform the query
response = query_engine.query(
    "Tell me about Peter Quill?",
)
display(Markdown(f"<b>{response}</b>"))

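This setup does not use a retrieval query: the four-column contract (text, score, node ID, and metadata with the _node_type and _node_content keys) belongs to Neo4jVectorStore. Here, the include_text=True parameter of KnowledgeGraphRAGRetriever is what is meant to include the document text in the response.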

You can also customize NebulaGraphStore with various parameters that affect retrieval, such as edge_types, rel_prop_names, tags, tag_prop_names, include_vid, session_pool_kwargs, sanitize_query_output and enhanced_schema 5(https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/graph_stores/llama-index-graph-stores-nebula/llama_index/graph_stores/nebula/nebula_graph_store.py) 6(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/graph_stores/nebulagraph.py) 7(https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/graph_stores/llama-index-graph-stores-nebula/llama_index/graph_stores/nebula/nebula_property_graph.py).
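For a setup like the one in the question, where the graph lives in NebulaGraph and the text chunks come from a separate VectorStoreIndex.from_documents(...), nothing in the graph context links a triplet to the chunk it came from. A crude, approximate workaround is to search the chunks for the entity names that appear in the graph context. This sketch rests on several assumptions: chunks are modeled as plain dicts with "text" and "metadata" (for example, built from each node's get_content() and metadata), entity names appear verbatim in the chunk text, and find_chunks_mentioning is a hypothetical helper.

```python
from typing import Any, Dict, List


def find_chunks_mentioning(
    entities: List[str], chunks: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Return chunks whose text mentions at least one entity (case-insensitive substring)."""
    lowered = [e.lower() for e in entities]
    matches = []
    for chunk in chunks:
        text = chunk["text"].lower()
        if any(entity in text for entity in lowered):
            matches.append(chunk)
    return matches


chunks = [
    {"text": "Login requires the Application to be running.", "metadata": {"file_name": "a.txt"}},
    {"text": "Unrelated release notes.", "metadata": {"file_name": "b.txt"}},
]
hits = find_chunks_mentioning(["Login", "Font size"], chunks)
print([h["metadata"]["file_name"] for h in hits])  # ['a.txt']
```

Substring matching produces false positives for short or common names, so treat it only as a starting point.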

  • To continue talking to Dosu, mention @dosu.

z9ju0rcb6#

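To make sure the query response includes the specific metadata and text content of the retrieved documents when using KnowledgeGraphRAGRetriever, set the include_text parameter to True when initializing the retriever. You can also customize the retrieved information with the retrieval_query parameter (a Neo4jVectorStore argument). Here is a detailed example: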

  1. Initialize a KnowledgeGraphIndex with your documents:
from llama_index.core.indices.knowledge_graph import KnowledgeGraphIndex
from llama_index.core import StorageContext

# Initialize storage context with your graph store
storage_context = StorageContext.from_defaults(graph_store=your_graph_store)

# Initialize the KnowledgeGraphIndex with documents
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=2,
    space_name="your_space_name",
    edge_types=["relationship"],
    rel_prop_names=["relationship"],
    tags=["entity"],
)
  2. Customize the retrieval query:
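# This query only takes effect if passed to Neo4jVectorStore(retrieval_query=...);
# none of the following steps use it
retrieval_query = (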
    "RETURN 'Interleaf hired Tomaz' AS text, score, node.id AS id, "
    "{author: 'Tomaz', _node_type:node._node_type, _node_content:node._node_content} AS metadata"
)
  3. Initialize KnowledgeGraphRAGRetriever:
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    include_text=True,  # Ensure text content is included
    verbose=True,
)
  4. Use the retriever with a query engine:
from llama_index.core.query_engine import RetrieverQueryEngine
from IPython.display import display, Markdown

# Create the query engine
query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

# Perform the query
response = query_engine.query(
    "Tell me about Peter Quill?",
)
display(Markdown(f"<b>{response}</b>"))

By setting include_text to True and, when using a Neo4jVectorStore, defining a retrieval_query that returns the required columns (text, score, id, metadata), the text content and specific metadata of the relevant documents should be included in the query response 1(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/indices/knowledge_graph/retrievers.py) 2(https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/Neo4jVectorDemo.ipynb) 3(https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/query_engine/knowledge_graph_rag_query_engine.ipynb) 4(https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/knowledge_graph/retrievers.py).
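Assuming the graph was built with KnowledgeGraphIndex as in step 1, the index records a keyword table from each extracted entity to the ids of the chunks it came from, with the chunks themselves kept in the docstore; resolving entities through that mapping is one way to get from triplets back to source text and metadata. The sketch below imitates that lookup with plain dicts. keyword_table, docstore and lookup_source_chunks are hypothetical stand-ins, not the index's actual attributes or methods.

```python
from typing import Dict, List, Set, Tuple

Chunk = Dict[str, object]  # {"text": ..., "metadata": {...}}


def lookup_source_chunks(
    entities: List[str],
    keyword_table: Dict[str, Set[str]],
    docstore: Dict[str, Chunk],
) -> List[Tuple[str, Chunk]]:
    """Resolve entities to (node_id, chunk) pairs, deduplicated, in entity order."""
    seen: Set[str] = set()
    results: List[Tuple[str, Chunk]] = []
    for entity in entities:
        # sorted() makes the order deterministic because sets are unordered
        for node_id in sorted(keyword_table.get(entity, set())):
            if node_id not in seen and node_id in docstore:
                seen.add(node_id)
                results.append((node_id, docstore[node_id]))
    return results


keyword_table = {"Login": {"n1"}, "Application": {"n1", "n2"}}
docstore = {
    "n1": {"text": "Login requires the Application.", "metadata": {"page": 1}},
    "n2": {"text": "The Application displays the font size.", "metadata": {"page": 2}},
}
for node_id, chunk in lookup_source_chunks(["Login", "Application"], keyword_table, docstore):
    print(node_id, chunk["metadata"])
# n1 {'page': 1}
# n2 {'page': 2}
```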

  • To continue talking to Dosu, mention @dosu.

b1zrtrql7#

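To get full document details, including metadata and text, together with the relevant triplets when querying with KnowledgeGraphRAGRetriever, make sure the include_text parameter is set to True. This is meant to include the source document text of each relevant triplet during the query.
Here is the relevant code snippet showing how to include document text with the KnowledgeGraphRAGRetriever class: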

from llama_index.core.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    verbose=True,
    include_text=True,  # Ensure this is set to True
)

You can also use RetrieverQueryEngine to run the query and get full document details:

from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever,
)

response = query_engine.query(
    "Tell me about Peter Quill?",
)

With include_text set to True, the retriever should include full document details, including metadata and text, along with the relevant triplets in the response 1(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/indices/knowledge_graph/retrievers.py) 2(https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/knowledge_graph/retrievers.py) 3(https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/query_engine/knowledge_graph_rag_query_engine.ipynb).
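Whichever retriever ends up being used, the nodes behind an answer are exposed on response.source_nodes as NodeWithScore objects, each with a score and a node that has get_content() and metadata. Printing them is a quick way to check whether document text and metadata were actually retrieved, or only the knowledge-sequence node. To stay self-contained, the sketch below uses two tiny hypothetical stand-in classes (FakeNode, FakeNodeWithScore) that mimic those attributes; with LlamaIndex you would pass response.source_nodes to describe_source_nodes instead.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FakeNode:
    """Stand-in mimicking a LlamaIndex TextNode's get_content() and metadata."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def get_content(self) -> str:
        return self.text


@dataclass
class FakeNodeWithScore:
    """Stand-in mimicking NodeWithScore (.node and .score)."""
    node: FakeNode
    score: Optional[float] = None


def describe_source_nodes(source_nodes: List[Any], max_chars: int = 60) -> List[str]:
    """One summary line per node: score, metadata keys and a text preview."""
    lines = []
    for i, nws in enumerate(source_nodes):
        preview = nws.node.get_content()[:max_chars].replace("\n", " ")
        keys = sorted(nws.node.metadata)
        lines.append(f"[{i}] score={nws.score} metadata_keys={keys} text={preview!r}")
    return lines


nodes = [
    FakeNodeWithScore(FakeNode("Login requires the Application.", {"file_name": "a.txt"}), 0.8),
    FakeNodeWithScore(FakeNode("The following are knowledge sequence ...")),
]
lines = describe_source_nodes(nodes)
for line in lines:
    print(line)
```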

  • To continue talking to Dosu, mention @dosu.

idv4meu88#

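I already used include_text=True in the first place, but the retriever does not include the full document details. Please provide more details on how to get them from the retriever.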
