Bug Description
When I build a knowledge graph, I expect the nodes in Neo4j to carry the embedding vectors as properties. However, only `id` shows up as a property on the `Entity` nodes. I followed this example: https://docs.llamaindex.ai/en/stable/examples/index_structs/knowledge_graph/KnowledgeGraphDemo/
Instead of the OpenAI models suggested in the example, I used "TheBloke/Mistral-7B-Instruct-v0.2-GGUF" from Hugging Face as the LLM and "thenlper/gte-large" as the embedding model.
Below are two sample nodes retrieved from Neo4j with a Cypher query. The embedding vectors are nowhere to be found.
MATCH (n) RETURN n LIMIT 2;
{
  "identity": 0,
  "labels": [
    "Entity"
  ],
  "properties": {
    "id": "Paul graham"
  },
  "elementId": "4:7dcf7873-019c-40e4-bcdc-9b50ab418257:0"
}
{
  "identity": 1,
  "labels": [
    "Entity"
  ],
  "properties": {
    "id": "Writing"
  },
  "elementId": "4:7dcf7873-019c-40e4-bcdc-9b50ab418257:1"
}
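A query like the following can be used to check whether any node carries an embedding property at all (the property name `embedding` is an assumption here; the docs do not specify what it would be called):

```cypher
MATCH (n:Entity) WHERE n.embedding IS NOT NULL RETURN count(n);
```

In my database this returns 0.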
Version
0.1.4
Steps to Reproduce
Here is the sample code I used:
# https://docs.llamaindex.ai/en/stable/examples/index_structs/knowledge_graph/KnowledgeGraphDemo/
import time
import torch
from llama_index.core import Settings
from huggingface_hub import hf_hub_download, snapshot_download
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)
from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader
from llama_index.core import StorageContext
from llama_index.graph_stores.neo4j import Neo4jGraphStore
from IPython.display import Markdown, display
supported_embed_models = ["thenlper/gte-large"]
supported_llm_models = {
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF": "mistral-7b-instruct-v0.2.Q5_K_M.gguf",
    "microsoft/Phi-3-mini-4k-instruct-gguf": "Phi-3-mini-4k-instruct-q4.gguf",
}
model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
embed_model_name = "thenlper/gte-large"
temperature = 0.0
max_new_tokens = 256
context_window = 4096
gpu_layers = 20
dim = 1024
memory_token_limit = 4096
sentence_embedding_percentile_cutoff = 0.8
similarity_top_k = 2
hf_token = "<Hugging-face-token>"
MODELS_PATH = "./models"
EMBED_PATH = "./embed_models"
n_gpu_layers = 0
if torch.cuda.is_available():
    print("It is a GPU node, setup GPU.")
    n_gpu_layers = gpu_layers
def get_model_path(model_name):
    filename = supported_llm_models[model_name]
    model_path = hf_hub_download(
        repo_id=model_name,
        filename=filename,
        resume_download=True,
        cache_dir=MODELS_PATH,
        local_files_only=False,
        token=hf_token,
    )
    return model_path

def get_embed_model_path(embed_model):
    embed_model_path = snapshot_download(
        repo_id=embed_model,
        resume_download=True,
        cache_dir=EMBED_PATH,
        local_files_only=False,
        token=hf_token,
    )
    return embed_model_path
llm = LlamaCPP(
    model_path=get_model_path(model_name),
    temperature=temperature,
    max_new_tokens=max_new_tokens,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=context_window,
    # kwargs to pass to __call__()
    # generate_kwargs={"temperature": 0.0, "top_k": 5, "top_p": 0.95},
    generate_kwargs={"temperature": temperature},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": n_gpu_layers},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
embed_model = HuggingFaceEmbedding(
    model_name=embed_model_name,
    cache_folder=EMBED_PATH,
)
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 1024
# copy the essay into the directory: https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
documents = SimpleDirectoryReader("./example_data").load_data()
graph_store = Neo4jGraphStore(
    username="<Neo4j-username>",
    password="<Neo4j-password>",
    url="<Neo4j-uri>",
    database="<Neo4j-database>",
)
graph_store.query("MATCH (n) DETACH DELETE n")
storage_context = StorageContext.from_defaults(graph_store=graph_store)
# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=2,
    include_embeddings=True,
    show_progress=True,
)
query_engine1 = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
query_engine2 = index.as_query_engine(
    include_text=True, response_mode="tree_summarize"
)
query_engine3 = index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)
response = query_engine1.query("Tell me more about Interleaf")
display(Markdown(f"<b>{response}</b>"))
response = query_engine2.query(
    "Tell me more about what the author worked on at Interleaf"
)
display(Markdown(f"<b>{response}</b>"))
response = query_engine3.query(
    "Tell me more about what the author worked on at Interleaf"
)
display(Markdown(f"<b>{response}</b>"))
Relevant Logs/Traceback
_No response_
2 Answers
It looks like the issue of embeddings not being stored in the Neo4j nodes, despite setting `include_embeddings=True`, may come down to a few potential areas. Let's explore some avenues for debugging and resolving it:
1. Verify that the persistence method (in the `Neo4jGraphStore` class or similar) handles embeddings correctly. The code should explicitly contain logic that stores embeddings as node properties; if that logic is missing or incorrect, the embeddings will not be saved.
2. Check the format of the embeddings produced by your embedding model (`thenlper/gte-large`). The storage mechanism may expect a specific format (e.g., a list of floats) that differs from what the model produces.
3. Confirm model compatibility: you are using `TheBloke/Mistral-7B-Instruct-v0.2-GGUF` as the LLM and `thenlper/gte-large` for embeddings. Make sure both are fully compatible with the framework, particularly in how embeddings are generated and handled.
By systematically checking these areas, you should be able to pinpoint the root cause and implement a fix. If the problem persists, consulting the LlamaIndex documentation or reaching out to its community may provide additional insight.
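If you did want the vectors on the Neo4j nodes, a parameterized Cypher update could attach them after the fact. The sketch below is an illustration only: `run_query` is a stub standing in for `graph_store.query(query, param_map=...)` (or a neo4j driver session), and the `embedding` property name is my own choice, not something LlamaIndex writes.

```python
def build_embedding_update(entity_id, embedding):
    """Return a Cypher statement plus parameter map that would set an
    `embedding` property (a list of floats) on the matching Entity node."""
    query = (
        "MATCH (n:Entity {id: $id}) "
        "SET n.embedding = $embedding"
    )
    return query, {"id": entity_id, "embedding": [float(x) for x in embedding]}

calls = []

def run_query(query, param_map):
    # Stub: a real version would call graph_store.query(query, param_map=param_map).
    calls.append((query, param_map))

# Attach (dummy) vectors to the two entities seen in the bug report.
for entity, vec in {"Paul graham": [0.1, 0.2], "Writing": [0.3, 0.4]}.items():
    q, params = build_embedding_update(entity, vec)
    run_query(q, params)
```

Passing the vector through a parameter map (rather than string-formatting it into the query) keeps the statement cacheable and avoids escaping issues.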
详情
来源
关于Dosu
此响应旨在帮助您节省时间并提高效率。它不是精确的解决方案,而是您自己研究的起点。
请帮助我变得更有用!如果这有帮助请点赞👍,如果无关请点踩👎。
---
They aren't stored in Neo4j; they're stored in the index struct. It's a flaky design, but that's how it works, and it won't change any time soon.
That said, we are working on a brand-new index to overhaul this design. The current knowledge graph index design is not ideal.
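To illustrate the answer above: the vectors live client-side on the index struct, not in the graph database, so `index.index_struct.embedding_dict` should be non-empty after building with `include_embeddings=True`. A minimal stdlib-only sketch of that layout, with a plain dict standing in for the real index struct (the keying by stringified triplet is my reading of the design, not a documented contract):

```python
# Sketch (assumption): the KG index struct keeps embeddings in an in-memory
# mapping from a serialized (subject, relation, object) triplet to its
# vector; Neo4j only ever receives the triplet itself.
embedding_dict = {}

def add_triplet_embedding(triplet, vector):
    """Record the embedding for a triplet, keyed by its string form."""
    embedding_dict[str(triplet)] = vector

add_triplet_embedding(("Paul graham", "is", "Writing"), [0.12, -0.03, 0.88])

# The vector is retrievable from the client-side dict, while the Neo4j node
# for "Paul graham" would still carry only its id property.
print(str(("Paul graham", "is", "Writing")) in embedding_dict)  # True
```

This also explains the observed behavior: querying Neo4j directly can never show the embeddings, but the `embedding_mode="hybrid"` query engine still works because it reads them from the index struct.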