llama_index [Bug]: Inefficient Neo4j driver initialization and missing connection closure in Neo4jPropertyGraphStore

Posted by js81xvg6 2 months ago in Other

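Bug Description

The current implementation of the Neo4jPropertyGraphStore class in Llama Index initializes the Neo4j drivers in the class constructor. According to the Neo4j documentation, driver objects are "immutable, thread-safe, and expensive to create. Therefore, an application should create only one driver instance and pass it around, rather than initializing it multiple times."
In addition, the class does not properly close the driver connections, which can lead to resource leaks and other problems over time.

Relevant code: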

class Neo4jPropertyGraphStore(PropertyGraphStore):

    ...
    
    def __init__(
        self,
        username: str,
        password: str,
        url: str,
        database: Optional[str] = "neo4j",
        refresh_schema: bool = True,
        sanitize_query_output: bool = True,
        enhanced_schema: bool = False,
        **neo4j_kwargs: Any,
    ) -> None:
        self.sanitize_query_output = sanitize_query_output
        self.enhanced_schema = enhanced_schema
        self._driver = neo4j.GraphDatabase.driver(
            url, auth=(username, password), **neo4j_kwargs
        )
        self._async_driver = neo4j.AsyncGraphDatabase.driver(
            url,
            auth=(username, password),
            **neo4j_kwargs,
        )
        self._database = database
        self.structured_schema = {}
        if refresh_schema:
            self.refresh_schema()

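Issues:

  • Inefficient initialization: The current design initializes the Neo4j drivers in the class constructor. Each new instance of the class therefore creates its own drivers, which is inefficient and does not follow Neo4j's best practices.
  • No proper close method: The class does not include a method for properly closing the Neo4j drivers, which can lead to resource leaks. That said, closing the drivers should be the developer's responsibility rather than the class's (if the suggested changes are adopted).

Suggested solution:

  • Pass the driver as a parameter: Modify Neo4jPropertyGraphStore to accept already-initialized Neo4j drivers as parameters. This allows a single driver instance to be reused across multiple instances of the class.
  • Leave driver management to the developer: Emphasize in the documentation that closing the drivers is the developer's responsibility.

Suggested code changes:

  1. Modify the constructor to accept the drivers as parameters:
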
def __init__(
    self,
    driver: neo4j.Driver,
    async_driver: neo4j.AsyncDriver,
    database: Optional[str] = "neo4j",
    refresh_schema: bool = True,
    sanitize_query_output: bool = True,
    enhanced_schema: bool = False,
) -> None:
    self.sanitize_query_output = sanitize_query_output
    self.enhanced_schema = enhanced_schema
    self._driver = driver
    self._async_driver = async_driver
    self._database = database
    self.structured_schema = {}
    if refresh_schema:
        self.refresh_schema()
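
The proposed signature removes the username/password/url parameters, so existing callers would break. One possible middle ground, which is not part of the proposal above, is to accept an optional driver and have the store close only a driver it created itself. The sketch below uses hypothetical stand-ins (`StubDriver`, `GraphStoreSketch`, `driver_factory`) instead of the real neo4j and llama_index classes so that it runs without a database:

```python
from typing import Callable, Optional


class StubDriver:
    """Hypothetical stand-in for neo4j.Driver; it only records close()."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class GraphStoreSketch:
    """Hypothetical store that accepts a driver or builds one itself."""

    def __init__(
        self,
        driver: Optional[StubDriver] = None,
        driver_factory: Callable[[], StubDriver] = StubDriver,
    ) -> None:
        # A driver created here is "owned"; an injected one is not.
        self._owns_driver = driver is None
        self._driver = driver if driver is not None else driver_factory()

    def close(self) -> None:
        # Never close a driver that the caller still manages.
        if self._owns_driver:
            self._driver.close()


# Injected driver: the caller keeps responsibility for closing it.
shared = StubDriver()
injected_store = GraphStoreSketch(driver=shared)
injected_store.close()

# No driver supplied: the store built one, so it also closes it.
standalone_store = GraphStoreSketch()
standalone_store.close()
```

With this shape, a shared driver survives `injected_store.close()`, while a store that built its own driver cleans up after itself.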

Example usage:

import asyncio

from neo4j import AsyncGraphDatabase, GraphDatabase
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

# Create a single driver instance of each kind
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))
async_driver = AsyncGraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))

# Pass the driver instances to the Neo4jPropertyGraphStore (proposed signature)
graph_store = Neo4jPropertyGraphStore(
    driver=driver,
    async_driver=async_driver,
    database="neo4j",
)

# Close the drivers when the application is shutting down.
# AsyncDriver.close() is a coroutine, so it has to be awaited
# (inside async code, use `await async_driver.close()` instead).
driver.close()
asyncio.run(async_driver.close())

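Conclusion:

With these changes, the Neo4jPropertyGraphStore class will align with Neo4j's best practices, improving efficiency and resource management. Responsibility for managing the driver lifecycle, including closing it, should be left to the developer. I hope you will consider adopting these suggestions in a future update.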
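
If the driver lifecycle is left to the developer, the close call should run even when the application fails partway through. A minimal sketch of one way to do that uses `contextlib.closing` from the standard library, which calls `close()` on exit. `StubDriver` and `run_application` are hypothetical stand-ins so the example runs without a Neo4j server; a real synchronous `neo4j.Driver` also exposes `close()`.

```python
from contextlib import closing


class StubDriver:
    """Hypothetical stand-in for neo4j.Driver; it only records close()."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def run_application(driver: StubDriver) -> None:
    """Placeholder for application work that may fail."""
    raise RuntimeError("simulated failure while using the graph store")


driver = StubDriver()
try:
    # closing() guarantees driver.close() runs when the block exits,
    # whether it exits normally or through an exception.
    with closing(driver) as shared_driver:
        run_application(shared_driver)
except RuntimeError as exc:
    print(f"application failed: {exc}")

print(driver.closed)
```

This covers only the synchronous driver. The async driver's `close()` is a coroutine and needs to be awaited separately.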

Version

0.10.57

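Steps to Reproduce

  1. Instantiate the Neo4jPropertyGraphStore class multiple times in an application.
  2. Observe that multiple Neo4j driver instances are created.
  3. Note that there is no method for properly closing the driver connections.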
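
The effect described in these steps can be illustrated without a database. The sketch below is a simplified model, not the library code: `StubDriver`, `SelfConnectingStore`, and `InjectedStore` are hypothetical stand-ins that only count how many drivers are created under each design. The real class creates two drivers per instance, one sync and one async.

```python
class StubDriver:
    """Hypothetical stand-in for a Neo4j driver; it only counts instances."""

    instances = 0

    def __init__(self) -> None:
        StubDriver.instances += 1


class SelfConnectingStore:
    """Mimics the current design: every store builds its own driver."""

    def __init__(self) -> None:
        self._driver = StubDriver()


class InjectedStore:
    """Mimics the proposed design: the caller supplies the driver."""

    def __init__(self, driver: StubDriver) -> None:
        self._driver = driver


def count_drivers(build_stores) -> int:
    """Reset the counter, run build_stores, and report drivers created."""
    StubDriver.instances = 0
    build_stores()
    return StubDriver.instances


def build_self_connecting() -> None:
    for _ in range(3):
        SelfConnectingStore()


def build_injected() -> None:
    shared = StubDriver()
    for _ in range(3):
        InjectedStore(shared)


print(count_drivers(build_self_connecting))  # 3
print(count_drivers(build_injected))         # 1
```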

Relevant Logs/Tracebacks

No response

zmeyuzjn1#

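@tareksanger Hello! I'm here to help you with any issues. I can help you solve bugs, answer questions, and even guide you on how to contribute. Feel free to ask questions while you wait for a human maintainer to address your concern.

To address the inefficient Neo4j driver initialization and the missing connection closure in the Neo4jPropertyGraphStore class, you can modify the class to accept already-initialized Neo4j drivers as parameters. This approach follows Neo4j's best practices and supports proper resource management.

Here are the proposed modifications to the Neo4jPropertyGraphStore class:

Modified Neo4jPropertyGraphStore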

class Neo4jPropertyGraphStore(PropertyGraphStore):
    def __init__(
        self,
        driver: neo4j.Driver,
        async_driver: neo4j.AsyncDriver,
        database: Optional[str] = "neo4j",
        refresh_schema: bool = True,
        sanitize_query_output: bool = True,
        enhanced_schema: bool = False,
    ) -> None:
        self.sanitize_query_output = sanitize_query_output
        self.enhanced_schema = enhanced_schema
        self._driver = driver
        self._async_driver = async_driver
        self._database = database
        self.structured_schema = {}
        if refresh_schema:
            self.refresh_schema()

    @property
    def client(self):
        return self._driver

    def refresh_schema(self) -> None:
        """Refresh the schema."""
        node_properties = [el["output"] for el in self.query(node_properties_query)]
        rel_properties = [el["output"] for el in self.query(rel_properties_query)]
        relationships = [el["output"] for el in self.query(rel_query)]

        self.structured_schema = {
            "node_props": {el["labels"]: el["properties"] for el in node_properties},
            "rel_props": {el["type"]: el["properties"] for el in rel_properties},
            "relationships": relationships,
        }

        # Format node properties
        formatted_node_props = []
        for el in node_properties:
            props_str = ", ".join(
                [f"{prop['property']}: {prop['type']}" for prop in el["properties"]]
            )
            formatted_node_props.append(f"{el['labels']} {{{props_str}}}")

        # Format relationship properties
        formatted_rel_props = []
        for el in rel_properties:
            props_str = ", ".join(
                [f"{prop['property']}: {prop['type']}" for prop in el["properties"]]
            )
            formatted_rel_props.append(f"{el['type']} {{{props_str}}}")

        # Format relationships
        formatted_rels = [
            f"(:{el['start']})-[:{el['type']}]->(:{el['end']})" for el in relationships
        ]

        self.schema = "\n".join(
            [
                "Node properties are the following:",
                ",".join(formatted_node_props),
                "Relationship properties are the following:",
                ",".join(formatted_rel_props),
                "The relationships are the following:",
                ",".join(formatted_rels),
            ]
        )

    def query(self, query: str, param_map: Optional[Dict[str, Any]] = None) -> Any:
        # Avoid a mutable default argument; fall back to an empty parameter map.
        param_map = param_map or {}
        with self._driver.session(database=self._database) as session:
            result = session.run(query, param_map)
            return [d.data() for d in result]

Example usage

import asyncio

from neo4j import AsyncGraphDatabase, GraphDatabase
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

# Create a single driver instance of each kind
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))
async_driver = AsyncGraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))

# Pass the driver instances to the Neo4jPropertyGraphStore (proposed signature)
graph_store = Neo4jPropertyGraphStore(
    driver=driver,
    async_driver=async_driver,
    database="neo4j",
)

# Close the drivers when the application is shutting down.
# AsyncDriver.close() is a coroutine, so it has to be awaited
# (inside async code, use `await async_driver.close()` instead).
driver.close()
asyncio.run(async_driver.close())
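
One detail of the shutdown step is worth spelling out: in the neo4j Python driver, `AsyncDriver.close()` is a coroutine, so it takes effect only when awaited. The sketch below shows a combined shutdown routine. `StubDriver`, `StubAsyncDriver`, and `shutdown` are hypothetical names used so the example runs without a Neo4j server.

```python
import asyncio


class StubDriver:
    """Hypothetical stand-in for neo4j.Driver (synchronous close)."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class StubAsyncDriver:
    """Hypothetical stand-in for neo4j.AsyncDriver (coroutine close)."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def shutdown(driver: StubDriver, async_driver: StubAsyncDriver) -> None:
    """Close both drivers; the async one must be awaited to actually run."""
    driver.close()
    await async_driver.close()


driver = StubDriver()
async_driver = StubAsyncDriver()
asyncio.run(shutdown(driver, async_driver))
```

Calling `async_driver.close()` without `await` would only create a coroutine object and leave the driver open.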

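Conclusion

With these changes, the Neo4jPropertyGraphStore class will align with Neo4j's best practices, improving efficiency and resource management. Responsibility for managing the driver lifecycle, including closing it, should rest with the developer [1] (https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/graph_stores/llama-index-graph-stores-neo4j/llama_index/graph_stores/neo4j/base.py).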

To continue talking to Dosu, mention @dosu.

2j4z5cfb2#

Feel free to open a pr! 👍
