llama_index [Bug]: Cannot build a neo4j knowledge graph with embeddings

wrrgggsh · posted 2 months ago in Other

Bug description

When using an external graph store (neo4j in this case), there appears to be a disconnect between knowledge graph construction and embedding usage.
I tried to build a neo4j knowledge graph with include_embeddings=True set in the call to 'from_documents'.
When I query with the index, the log prints "index not constructed with embeddings", even though it was constructed with them.
However, when I use a 'SimpleGraphStore' and persist the storage context locally, everything works and the embeddings are used.
I have many thousands of documents, so persisting locally and loading the index for every request is not feasible. How can I use a neo4j knowledge graph and still make use of embeddings?
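For context on why the warning can appear even though embeddings were generated: the graph store only ever receives triplets, while the triplet embeddings live on the in-memory index struct (an `embedding_dict` in the 0.9-era code, as far as I can tell), so rebuilding the index from the graph store alone starts with an empty dict. The following is a minimal pure-Python simulation of that mechanism, not llama_index itself; the dict contents and file name are made up for illustration:

```python
import json
import os
import tempfile

# Simulated index state (assumption: the KG index keeps triplet embeddings
# in an in-memory dict that the graph store never receives).
embedding_dict = {"(alice, knows, bob)": [0.1, 0.2, 0.3]}

# Persisting just this dict is cheap compared to re-persisting the documents.
path = os.path.join(tempfile.mkdtemp(), "index_store.json")
with open(path, "w") as f:
    json.dump(embedding_dict, f)

# A new session that rebuilds from the graph store alone starts empty,
# which is exactly the "not constructed with embeddings" situation...
rebuilt = {}
assert len(rebuilt) == 0

# ...while loading the small persisted index state restores the embeddings.
with open(path) as f:
    rebuilt = json.load(f)
print(len(rebuilt))  # 1
```

The point of the simulation: what needs persisting between sessions is only the small index-level state, not the documents themselves.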

Version

0.9.5

Steps to reproduce

import os

# Imports as of llama_index 0.9.x
from llama_index import KnowledgeGraphIndex, ServiceContext, StorageContext
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index.graph_stores import Neo4jGraphStore
from llama_index.llms import AzureOpenAI

username = "neo4j"
password = "<pw>"
url = "<url>"
database = "neo4j"

graph_store = Neo4jGraphStore(
    username=username,
    password=password,
    url=url,
    database=database,
)

storage_context = StorageContext.from_defaults(graph_store=graph_store)

llm = AzureOpenAI(engine="GPT4x32k", model="gpt-4-32k")

embedding_llm = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="Embedding",
    api_key=os.getenv('OPENAI_API_KEY'),
    azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT'),
    api_version=os.getenv('OPENAI_API_VERSION'),
)

service_context = ServiceContext.from_defaults(embed_model=embedding_llm, llm=llm)

index = KnowledgeGraphIndex.from_documents(
    docs,
    storage_context=storage_context,
    service_context=service_context,
    include_embeddings=True
)

New session

index = KnowledgeGraphIndex.from_documents(
    [],
    storage_context=storage_context,
    service_context=service_context,
    include_embeddings=True
)

query_engine = index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)

response = query_engine.query(
    "<query>"
)

LOGS:
WARNING:llama_index.indices.knowledge_graph.retrievers:Index was not constructed with embeddings, skipping embedding usage...

Relevant logs/traceback

No response

zz2j4svz #1


Hey there @j0sephsasson! Great to see you back. Hope you're doing well today.
Based on the information you've provided, it seems like you're encountering a warning message that says 'index not constructed with embeddings' when querying, even though the index was constructed with embeddings. This warning message is likely due to the fact that the vector stores for text and images are empty. This can happen if no nodes are passed to the _add_nodes_to_index or _async_add_nodes_to_index methods, or if the nodes passed do not contain any text or images.
To resolve this issue, ensure that the nodes you are passing to the index contain either text or images. You can check the nodes by printing them out before passing them to the index. If the nodes do not contain any text or images, you may need to preprocess your data to ensure that it is in the correct format.
Regarding the use of a neo4j knowledge graph with embeddings without persisting locally, the LlamaIndex does not seem to have built-in support for neo4j. However, you could potentially use the neo4j Python driver to fetch data from your neo4j database, convert it into the BaseNode or ImageNode format expected by LlamaIndex, and then pass these nodes to the index.
Here is a simplified example of how you might do this:

from neo4j import GraphDatabase
from llama_index import KnowledgeGraphIndex
from llama_index.schema import TextNode

# Initialize the neo4j driver
driver = GraphDatabase.driver("<neo4j_url>", auth=("<username>", "<password>"))

# Fetch data from neo4j and convert it into TextNode format.
# Results must be consumed inside the session: they are lazy and become
# invalid once the session closes.
with driver.session() as session:
    result = session.run("MATCH (n) RETURN n")
    nodes = [
        TextNode(id_=str(record["n"].id), text=record["n"]["text"])
        for record in result
    ]

# Pass the nodes to the index (from_documents expects Document objects,
# so nodes go through the constructor instead)
index = KnowledgeGraphIndex(
    nodes,
    storage_context=storage_context,
    service_context=service_context,
    include_embeddings=True,
)

Please note that this would require a good understanding of both the structure of your neo4j database and the LlamaIndex codebase. Also, the above code is a simplified example and may not work as-is depending on the structure of your neo4j database.
I hope this helps! If you have any further questions, please don't hesitate to ask.
For more information, you can refer to the following sources:

Sources

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.


7bsow1i6 #2

@dosu-bot
If I specify an external vector store (cog search, for example), and pass the vector & graph store to storage context, then run this code:

index = KnowledgeGraphIndex.from_documents(
    docs,
    storage_context=storage_context,
    service_context=service_context,
    include_embeddings=True
)

Only the graph store is created. Is it possible to construct the graph with embeddings and use external stores (not local, e.g. neo4j and cog search)?


sauutmhj #3

In version 0.9.5 of LlamaIndex, building a knowledge graph with external stores such as neo4j and Azure Cognitive Search is not directly supported. The current code only supports creating graph stores through the GraphStore class, which stores graph data in memory.
However, you can extend LlamaIndex by creating a custom class that inherits from GraphStore and overrides its methods to interact with external stores such as neo4j and Azure Cognitive Search. Here is a basic example of a custom class for neo4j:

from llama_index.graph_stores.types import GraphStore  # llama_index.legacy.graph_stores.types on 0.10+
from neo4j import GraphDatabase

class Neo4jGraphStore(GraphStore):
    def __init__(self, uri, user, password):
        self._driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self._driver.close()

    def upsert_triplet(self, subj, rel, obj):
        # Parameter names follow the GraphStore interface; `object` would
        # also shadow the Python builtin.
        with self._driver.session() as session:
            session.run(
                "MERGE (a:Subject {name: $subj}) "
                "MERGE (b:Object {name: $obj}) "
                "MERGE (a)-[r:RELATIONSHIP {name: $rel}]->(b)",
                subj=subj, rel=rel, obj=obj,
            )

This Neo4jGraphStore class connects to a neo4j database and overrides the upsert_triplet method to insert triplets into it. You would need to implement the other methods of the GraphStore class in a similar fashion.
Please note that this is a simplified example and may not cover all of your needs. You would need to handle errors, manage transactions, and optimize the database queries for your specific use case. You would also need to go through a similar process for Azure Cognitive Search.
As for the changes made between version 0.9.5 and the latest version v0.10.3 of the LlamaIndex codebase, I could not find specific details in the repository. The changes may be documented elsewhere, or I may have missed them. If you have more specific questions about the changes or need further help, please provide more details.
You can find more information about the GraphStore class and its methods in the llama-index-legacy/llama_index/legacy/indices/knowledge_graph/base.py file.
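As an illustration of implementing the remaining GraphStore methods "in a similar fashion" to the upsert_triplet example above, here is a hedged sketch of a get-style read method. The class name is hypothetical, the Cypher labels simply mirror the upsert example, the driver is injected so the class can be exercised without a live server, and the exact method signatures should be checked against the GraphStore interface in your installed version:

```python
class Neo4jGraphStoreReadSketch:
    """Hypothetical companion to the upsert example above: a read method
    shaped like GraphStore.get (returns [rel, obj] pairs for a subject)."""

    def __init__(self, driver):
        # `driver` is expected to be a neo4j GraphDatabase.driver(...) instance
        self._driver = driver

    def get(self, subj):
        with self._driver.session() as session:
            result = session.run(
                "MATCH (a:Subject {name: $subj})"
                "-[r:RELATIONSHIP]->(b:Object) "
                "RETURN r.name AS rel, b.name AS obj",
                subj=subj,
            )
            # Consume inside the session: neo4j results are lazy and
            # become invalid once the session closes.
            return [[record["rel"], record["obj"]] for record in result]
```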

Sources

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.


3j86kqsm #4

@j0sephsasson I am running into the same issue. Were you able to solve it?


8wtpewkr #5

Running into the same issue. Trying to store nodes and embeddings in neo4j.

I can see in the logs that the embeddings are generated, but they do not seem to be stored.

Here is my code:

from llama_index.core import KnowledgeGraphIndex, StorageContext, ServiceContext
from llama_index.readers.web import SimpleWebPageReader
from llama_index.graph_stores.neo4j import Neo4jGraphStore
from langchain_openai import ChatOpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

documentUrls = ["http://some-random-url"]
openAIApiKey = "MY_OPEN_AI_API_KEY"
LLAMA_GRAPH_STORE = Neo4jGraphStore(
    username="myUsername",
    password="myPassword",
    url="neo4jUrl",
)

llm = ChatOpenAI(model="gpt-3.5-turbo-16k", openai_api_key=openAIApiKey)
embedLLM = OpenAIEmbedding(model="text-embedding-3-large", api_key=openAIApiKey)

storage_context = StorageContext.from_defaults(graph_store=LLAMA_GRAPH_STORE)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embedLLM)
documents = SimpleWebPageReader(html_to_text=True).load_data(documentUrls)


KnowledgeGraphIndex.from_documents(
    documents=documents,
    show_progress=True,
    storage_context=storage_context,
    service_context=service_context,
    include_embeddings=True,
)

I can see in the output that the embeddings are being generated,

but I cannot find them in my local neo4j.
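One way to confirm that the embeddings are generated but simply never handed to Neo4j is to inspect the index object returned by from_documents. The snippet below fakes that object with a stand-in, since the attribute name (`embedding_dict` on the index struct, per the ~0.9/0.10-era code) is an assumption to verify against your installed version:

```python
from types import SimpleNamespace

# Stand-in for the real KnowledgeGraphIndex (assumption: the index struct
# exposes an `embedding_dict` mapping triplet strings to vectors; the graph
# store itself only ever receives triplets, never the vectors).
index = SimpleNamespace(
    index_struct=SimpleNamespace(
        embedding_dict={"(page, mentions, topic)": [0.0, 1.0]}
    )
)

emb = getattr(index.index_struct, "embedding_dict", {})
# With a real index: a non-empty dict here means the embeddings exist in
# memory; they will not appear as node properties in Neo4j either way.
print(len(emb))  # 1
```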
