Question Validation
- I have searched both the documentation and Discord for an answer.
Question
Hi, I'm running into some issues loading my indices from persistent storage.
The following script saves my vector and graph indices:
# assumed imports for llama-index >= 0.10 (the original post elided them)
import base64
import json
import logging
import os

from llama_index.core import (
    Document,
    KnowledgeGraphIndex,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.core.schema import TextNode
from llama_index.graph_stores.nebula import NebulaGraphStore
...
# note: only the first basicConfig call takes effect; the second is a no-op
logging.basicConfig(level=logging.DEBUG)
logging.basicConfig(level=logging.INFO)
...
os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"
os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"

space_name = "test10"
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]


def encode_string(s):
    return base64.urlsafe_b64encode(s.encode()).decode()


def decode_string(s):
    return base64.urlsafe_b64decode(s.encode()).decode()


def sanitize_and_encode(data):
    # base64-encode every string value so it round-trips safely
    sanitized_data = {}
    for key, value in data.items():
        if isinstance(value, str):
            sanitized_data[key] = encode_string(value)
        else:
            sanitized_data[key] = value
    return sanitized_data


def decode_metadata(metadata):
    decoded_metadata = {}
    for key, value in metadata.items():
        if isinstance(value, str):
            decoded_metadata[key] = decode_string(value)
        else:
            decoded_metadata[key] = value
    return decoded_metadata


def load_json_nodes(json_directory):
    nodes = []
    for filename in os.listdir(json_directory):
        if filename.endswith('.json'):
            with open(os.path.join(json_directory, filename), 'r') as file:
                data = json.load(file)
                for node_data in data:
                    sanitized_metadata = sanitize_and_encode(node_data['metadata'])
                    node = TextNode(
                        text=encode_string(node_data['text']),
                        id_=node_data['id_'],
                        embedding=node_data['embedding'],
                        metadata=sanitized_metadata
                    )
                    nodes.append(node)
                    logging.debug(f"Loaded node ID: {node.id_}, text: {node_data['text']}, metadata: {node_data['metadata']}")
    return nodes


def create_index():
    graph_store = NebulaGraphStore(
        space_name=space_name,
        edge_types=[etype.lower() for etype in edge_types],
        rel_prop_names=[rprop.lower() for rprop in rel_prop_names],
        tags=[tag.lower() for tag in tags]
    )
    storage_context = StorageContext.from_defaults(graph_store=graph_store)
    json_nodes = load_json_nodes("JSON_nodes_999_large_syll_small")
    documents = [
        Document(
            text=decode_string(node.text),
            id_=node.id_,
            metadata=decode_metadata(node.metadata),
            embedding=node.embedding
        ) for node in json_nodes
    ]

    kg_index = KnowledgeGraphIndex.from_documents(
        documents,
        storage_context=storage_context,
        max_triplets_per_chunk=10,
        space_name=space_name,
        edge_types=edge_types,
        rel_prop_names=rel_prop_names,
        tags=tags,
        max_knowledge_sequence=15,
        include_embeddings=True
    )
    # Set the index_id for KnowledgeGraphIndex
    kg_index.set_index_id("kg_index")
    kg_index.storage_context.persist(persist_dir='./storage_graph_syllabus_test_small')
    logging.debug(f"KG Index created with {len(documents)} documents")

    # Create VectorStoreIndex
    vector_index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
    # Set the index_id for VectorStoreIndex
    vector_index.set_index_id("vector_index")
    # Persist the storage context
    storage_context.persist(persist_dir='./storage_graph_syllabus_test_small')
    logging.debug(f"Vector Index created with {len(documents)} documents")
    return kg_index, vector_index, storage_context


print("Creating Index...")
kg_index, vector_index, storage_context = create_index()
print("Index Created...")
Then the following function in my query script tries to load those indices; however, for some reason the KG index always returns empty responses:
# assumed imports for this snippet (the original post elided them)
import os
import time

from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.retrievers import KnowledgeGraphRAGRetriever, VectorIndexRetriever

persist_dir = './storage_graph_syllabus_test_small'


def initialize_indices():
    global vector_index, kg_index, vector_retriever, kg_retriever
    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    start_time = time.time()
    if os.path.exists(persist_dir):
        vector_index = load_index_from_storage(storage_context, index_id="vector_index")
        kg_index = load_index_from_storage(storage_context, index_id="kg_index")
    else:
        ...
    vector_retriever = VectorIndexRetriever(index=vector_index, top_k=7)
    kg_retriever = KnowledgeGraphRAGRetriever(storage_context=storage_context, verbose=True, top_k=7)
In addition, I enabled debug logging to get more information when running the query script; the debug output is attached here:
WARNING:llama_index.core.graph_stores.simple:No existing llama_index.core.graph_stores.simple found at ./storage_graph_syllabus_test_small\graph_store.json. Initializing a new graph_store from scratch.
INFO:llama_index.core.indices.loading:Loading indices with ids: ['vector_index']
INFO:llama_index.core.indices.loading:Loading indices with ids: ['kg_index']
WARNING:llama_index.core.indices.knowledge_graph.base:Upgrading previously saved KG index to new storage format.
* Serving Flask app 'main_complete-emb-flask'
* Debug mode: on
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:3000
* Running on http://192.168.2.204:3000
INFO:werkzeug:Press CTRL+C to quit
Does anyone have an idea why this is happening?
I have also checked the files in the persisted storage, and they appear to be populated correctly.
Thanks for your time!
4 Answers
#1
Regarding your issue of the knowledge graph (KG) index returning empty responses when loaded from persistent storage, even though the storage files appear to be populated correctly: this may be caused by the "text" field being null in the index_struct. In the JSON structure you provided, the "text" field is set to null. The text field is essential for the knowledge graph to return meaningful responses; if it is null, the index may not have the data it needs to generate a response, which leads to empty results.
In addition, make sure the storage_context provided to the KnowledgeGraphQueryEngine contains a valid graph_store. The graph_store is what the engine uses to query and retrieve data from the knowledge graph; if it is missing or not set up correctly, the query engine cannot retrieve any data and will return empty responses.
To resolve this, make sure that:
- the text field is present and populated with the relevant data;
- the storage_context used when initializing the KnowledgeGraphQueryEngine contains a valid graph_store.
Here is an example of how the KnowledgeGraphQueryEngine should be initialized:
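A minimal sketch of that initialization, assuming llama-index >= 0.10; your_graph_store stands in for an already configured graph store (for example the NebulaGraphStore from your script), and the query string is only illustrative:

from llama_index.core import StorageContext
from llama_index.core.query_engine import KnowledgeGraphQueryEngine

# build a storage context around the configured graph store
storage_context = StorageContext.from_defaults(graph_store=your_graph_store)

# the query engine reads its graph data from storage_context.graph_store
query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    verbose=True,
)
response = query_engine.query("Tell me about <some entity>")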
Please make sure your_graph_store is configured correctly and contains the necessary schema and data.
Also, if you are using the KnowledgeGraphIndex class from the LlamaIndex framework, make sure its graph_store is populated correctly. The class includes a mechanism that, when necessary, upgrades a previously saved KG index to the new storage format:
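The upgrade itself happens inside the library (it is what produces the "Upgrading previously saved KG index to new storage format" warning in your log). A quick way to check its outcome after loading, as a sketch that reuses the persist_dir and index_id from your scripts and assumes the default SimpleGraphStore (this is not the library's internal upgrade code):

from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage_graph_syllabus_test_small")
kg_index = load_index_from_storage(storage_context, index_id="kg_index")

# if the upgrade populated the graph store, this prints the relationship map;
# an empty dict here is consistent with the empty responses you are seeing
print(storage_context.graph_store.get_rel_map())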
This ensures that the graph_store is populated with the relationship map from the index_struct. If this upgrade step is not executed, the graph_store may remain empty, leading to empty responses.
You can also refer to the test_load_graph_from_storage_simple function in the LlamaIndex tests to see how a graph is loaded from storage correctly; that test verifies that the graph is loaded from storage and that query responses stay consistent. Sources: 1 (https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/index_structs/knowledge_graph/index_kg.json), 2 (https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/query_engine/knowledge_graph_query_engine.py), 3 (https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/knowledge_graph/base.py), 4 (https://github.com/run-llama/llama_index/blob/main/llama-index-core/tests/indices/test_loading_graph.py).
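A simplified sketch in the spirit of that test, not the verbatim test code: build a tiny KG index by inserting a triplet manually, persist it, reload it, and check that the relationships survive the round trip (the path is a placeholder, and the framework's default LLM/embedding settings are assumed to be configured, as in your scripts):

from llama_index.core import (
    KnowledgeGraphIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.schema import TextNode

# build a small KG index without LLM-based extraction by adding a triplet by hand
storage_context = StorageContext.from_defaults()
kg_index = KnowledgeGraphIndex([], storage_context=storage_context)
kg_index.upsert_triplet_and_node(
    ("Foo", "works at", "Bar"),
    TextNode(text="Foo works at Bar."),
)
kg_index.set_index_id("kg_index")

# persist and reload, mirroring the original scripts
storage_context.persist(persist_dir="./tmp_storage_roundtrip")
reloaded_context = StorageContext.from_defaults(persist_dir="./tmp_storage_roundtrip")
reloaded_index = load_index_from_storage(reloaded_context, index_id="kg_index")

# the reloaded graph store should contain the same relationship map
print(reloaded_context.graph_store.get_rel_map())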
#2
I'm still confused: how do I update my script so that my rel_map is populated when I create and persist my storage?
@dosu
#3
To make sure the rel_map is populated when you create and persist your storage, you need to call the get_rel_map method after inserting or updating data. Here is an example of how you could update your script to include this step:
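A sketch of that flow using the generic graph-store API, assuming a SimpleGraphStore and illustrative triplets; upsert_triplet and get_rel_map are the relevant graph-store methods:

from llama_index.core import StorageContext
from llama_index.core.graph_stores import SimpleGraphStore

graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# insert (or update) relationships as triplets
graph_store.upsert_triplet("Alice", "teaches", "Math 101")
graph_store.upsert_triplet("Math 101", "covers", "Calculus")

# read back the relationship map after the inserts
rel_map = graph_store.get_rel_map()
print(rel_map)

# persist so the relationships end up in graph_store.json
storage_context.persist(persist_dir="./storage_graph_syllabus_test_small")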
In this example, get_rel_map is called after the triplets are inserted, so the rel_map reflects the new relationships. This ensures the rel_map is always populated with the latest data. Sources: 1 (https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/graph_stores/llama-index-graph-stores-tidb/llama_index/graph_stores/tidb/graph.py), 2 (https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/graph_stores/kuzu.py), 3 (https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/graph_stores/llama-index-graph-stores-neo4j/llama_index/graph_stores/neo4j/base.py), 4 (https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/graph_stores/neo4j.py), 5 (https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/graph_stores/simple.py).
#4
Hey @logan-markewich, I've tried to understand the above from dosubot, but I'm having a hard time implementing it correctly. I just want to be able to load the vector and graph indices I created from the persist directory, without having to connect to my NebulaGraph server. Is that possible?
I want the rel_map to be populated because it is currently empty in index_store.json (the graph store). The reason is that I think my relationships should be in there when I load the persist directory with these indices.
Do you have any suggestions? My end goal is to deploy this application without requiring a constant connection to NebulaGraph.