llama_index [Bug]: When using DSPy, the _collection attribute of ChromaVectorStore is not copied during deepcopy

Posted by ohtdti5x 1 month ago in Other

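Bug Description

When a VectorIndexRetriever created from a ChromaVectorStore is used inside a DSPy module, compiling that module fails. The cause appears to be that DSPy deep-copies the module before running it, and the _collection attribute of ChromaVectorStore is not copied during that deepcopy.

Here is a monkey patch that works around the problem: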

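from llama_index.vector_stores.chroma import ChromaVectorStore

def mydeepcopy(self, memo):
    # Return the same instance instead of copying it, so _collection is kept.
    return self

# Import the class explicitly; a bare "import llama_index" may not load the submodule.
ChromaVectorStore.__deepcopy__ = mydeepcopy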
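
To make the effect of the patch concrete, here is a minimal sketch using only the standard library. FakeStore is a hypothetical stand-in for ChromaVectorStore (it is a plain Python class, not a Pydantic model), and the nested dict stands in for a DSPy module that holds the store somewhere inside it. The sketch only shows how copy.deepcopy treats an object whose __deepcopy__ returns self; it does not reproduce the Chroma behaviour itself.

```python
from copy import deepcopy


class FakeStore:
    """Hypothetical stand-in for ChromaVectorStore (assumption, not the real class)."""

    def __init__(self, handle):
        # Plays the role of the _collection attribute from the report.
        self._collection = handle


def share_on_deepcopy(self, memo):
    # Same idea as the monkey patch: hand back the original instance.
    return self


FakeStore.__deepcopy__ = share_on_deepcopy

store = FakeStore(handle=object())
module = {"retriever": {"vector_store": store}}  # stand-in for a DSPy module

module_copy = deepcopy(module)
copied_store = module_copy["retriever"]["vector_store"]

same_store = copied_store is store
same_handle = copied_store._collection is store._collection
print(same_store, same_handle)
```

The outer containers are still copied, but the store is shared between the original and the copy. That is the trade-off of this workaround: any state on the store is shared as well.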

Version

llama-index==0.10.51; llama-index-vector-stores-chroma==0.1.10

Steps to Reproduce

Minimal reproducible example:

from copy import deepcopy

import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
print(vector_store._collection)  # the original store has its collection

c = deepcopy(vector_store)
print(c._collection)  # per this report, the copy is missing _collection

Example of the integration with DSPy:

import chromadb
import dspy
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

class Rag(dspy.Module):
    def __init__(self):
        super().__init__()

        db = chromadb.PersistentClient(path="./chroma_db")
        chroma_collection = db.get_or_create_collection("my_collection")
        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
        index = VectorStoreIndex.from_vector_store(vector_store)
        self.vector_retriever = index.as_retriever(similarity_top_k=5)

    def forward(self, question):
        nodes = self.vector_retriever.retrieve(question)
        return dspy.Prediction(answer=str(nodes))

Relevant Logs/Traceback

Traceback (most recent call last):
  File "/code/./query.py", line 262, in <module>
    main()
  File "/code/./query.py", line 228, in main
    rag = teleprompter.compile(rag_with_assertions, trainset=trainset)
  File "/code/.venv/lib/python3.10/site-packages/dspy/teleprompt/bootstrap.py", line 84, in compile
    self._bootstrap()
  File "/code/.venv/lib/python3.10/site-packages/dspy/teleprompt/bootstrap.py", line 154, in _bootstrap
    success = self._bootstrap_one_example(example, round_idx)
  File "/code/.venv/lib/python3.10/site-packages/dspy/teleprompt/bootstrap.py", line 210, in _bootstrap_one_example
    raise e
  File "/code/.venv/lib/python3.10/site-packages/dspy/teleprompt/bootstrap.py", line 190, in _bootstrap_one_example
    prediction = teacher(**example.inputs())
  File "/code/.venv/lib/python3.10/site-packages/dspy/primitives/program.py", line 26, in __call__
    return self.forward(*args, **kwargs)
  File "/code/.venv/lib/python3.10/site-packages/dspy/primitives/assertions.py", line 294, in forward
    return wrapped_forward(*args, **kwargs)
  File "/code/.venv/lib/python3.10/site-packages/dspy/primitives/assertions.py", line 215, in wrapper
    result = bypass_suggest_handler(func)(*args, **kwargs) if bypass_suggest else None
  File "/code/.venv/lib/python3.10/site-packages/dspy/primitives/assertions.py", line 148, in wrapper
    return func(*args, **kwargs)
  File "/code/./query.py", line 161, in forward
    nodes = retriever.retrieve(question)
  File "/code/.venv/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/code/.venv/lib/python3.10/site-packages/llama_index/core/base/base_retriever.py", line 243, in retrieve
    nodes = self._retrieve(query_bundle)
  File "/code/.venv/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
    result = func(*args, **kwargs)
  File "/code/.venv/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 101, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/code/.venv/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 177, in _get_nodes_with_embeddings
    query_result = self._vector_store.query(query, **self._kwargs)
  File "/code/.venv/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 371, in query
    return self._query(
  File "/code/.venv/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 381, in _query
    results = self._collection.query(
AttributeError: 'ChromaVectorStore' object has no attribute '_collection'. Did you mean: 'from_collection'?

fkvaft9z #1

@theta-lin I would welcome a suggestion for a fix other than the monkey patch.

mutmk8jj #2

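@logan-markewich Indeed, monkeypatching is only my temporary workaround.
After further investigation, I believe the root cause is that sqlite3.Connection objects are not picklable. That makes sense, because a database connection should be shared rather than copied. The following code illustrates the problem: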

from copy import deepcopy

import chromadb
from pydantic import BaseModel, PrivateAttr

class C(BaseModel):
    _collection = PrivateAttr()

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        db = chromadb.PersistentClient(path="./chroma_db")
        self._collection = db.get_or_create_collection("my_collection")

c = C()
print(c._collection)

c_copy = deepcopy(c)  # fails here: the collection holds a sqlite3 connection
print(c_copy._collection)

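It raises TypeError: cannot pickle 'sqlite3.Connection' object
I understand that the default behavior of copy.deepcopy() can be changed by providing a custom __deepcopy__() method, but I am not sure that overriding it directly is a good approach here, because ChromaVectorStore is a Pydantic object. Also, the error above does not appear when an actual ChromaVectorStore object is copied, so I suspect some of the default copy behavior has already been modified. What do you think is the best way to fix it in this case?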
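
One possible direction, sketched below with the standard library only, is a __deepcopy__ that deep-copies ordinary fields but shares the connection-like handle by reference. Store, _conn and settings are hypothetical names chosen for illustration, and Store is a plain Python class. How this should be integrated with a Pydantic model such as ChromaVectorStore is left open, so treat it as a sketch of the idea and not as the library's fix.

```python
import sqlite3
from copy import deepcopy


class Store:
    """Hypothetical store holding a connection that cannot be copied."""

    def __init__(self, conn, settings):
        self._conn = conn          # shared resource, must not be duplicated
        self.settings = settings   # ordinary data, safe to deep-copy

    def __deepcopy__(self, memo):
        cls = type(self)
        clone = cls.__new__(cls)
        memo[id(self)] = clone     # register early to handle reference cycles
        for name, value in self.__dict__.items():
            if name == "_conn":
                clone.__dict__[name] = value                   # share the handle
            else:
                clone.__dict__[name] = deepcopy(value, memo)   # copy the rest
        return clone


conn = sqlite3.connect(":memory:")

# Copying the raw connection fails, matching the TypeError quoted above.
try:
    deepcopy(conn)
    raw_copy_failed = False
except TypeError:
    raw_copy_failed = True

store = Store(conn, {"top_k": 5})
clone = deepcopy(store)
print(raw_copy_failed, clone._conn is store._conn, clone.settings is store.settings)
```

Compared with returning self, this gives each copy its own settings while both copies keep using the same connection.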
