Bug Description
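When a VectorIndexRetriever created from a ChromaVectorStore is used inside a DSPy module, compiling the module fails. The root cause appears to be that DSPy deep-copies the module before running it, and the _collection attribute of ChromaVectorStore is not carried over during the deep copy.
Here is a monkey patch that works around the issue: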
from llama_index.vector_stores.chroma import ChromaVectorStore

def mydeepcopy(self, memo):
    # Return the same instance so the Chroma collection is shared, not copied.
    return self

ChromaVectorStore.__deepcopy__ = mydeepcopy
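To make the effect of the patch concrete, here is a minimal standalone sketch that uses only the standard library. FakeCollection, FakeStore, and Module are hypothetical stand-ins invented for illustration; they are not llama-index or DSPy classes. The sketch only shows what a __deepcopy__ that returns self does to the result of copy.deepcopy().

```python
import copy


class FakeCollection:
    """Hypothetical stand-in for a handle that should be shared, not copied."""

    def __init__(self, name):
        self.name = name


class FakeStore:
    """Hypothetical stand-in for a vector store wrapping a shared handle."""

    def __init__(self, collection):
        self._collection = collection

    def __deepcopy__(self, memo):
        # Same idea as the monkey patch: hand back this instance unchanged.
        return self


class Module:
    """Hypothetical container playing the role of the object being deep-copied."""

    def __init__(self, store):
        self.store = store
        self.settings = {"top_k": 5}


original = Module(FakeStore(FakeCollection("my_collection")))
clone = copy.deepcopy(original)

print(clone is original)                    # False: the container is a new object
print(clone.settings is original.settings)  # False: ordinary attributes are copied
print(clone.store is original.store)        # True: the store is shared
```

The trade-off is that every "copy" now points at the same store, so any state mutated on the store is visible to all copies. That is probably acceptable for a read-only retriever, but it is an assumption worth checking for a given use case.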
Version
llama-index==0.10.51; llama-index-vector-stores-chroma==0.1.10
Steps to Reproduce
Minimal reproducible example:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from copy import deepcopy
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
print(vector_store._collection)
c = deepcopy(vector_store)
print(c._collection)
Example integrated with DSPy:
import chromadb
import dspy
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

class Rag(dspy.Module):
    def __init__(self):
        super().__init__()
        db = chromadb.PersistentClient(path="./chroma_db")
        chroma_collection = db.get_or_create_collection("my_collection")
        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
        index = VectorStoreIndex.from_vector_store(vector_store)
        # The retriever keeps a reference to the vector store, so it is deep-copied with the module.
        self.vector_retriever = index.as_retriever(similarity_top_k=5)

    def forward(self, question):
        nodes = self.vector_retriever.retrieve(question)
        return dspy.Prediction(answer=str(nodes))
Relevant Logs/Tracebacks
Traceback (most recent call last):
File "/code/./query.py", line 262, in <module>
main()
File "/code/./query.py", line 228, in main
rag = teleprompter.compile(rag_with_assertions, trainset=trainset)
File "/code/.venv/lib/python3.10/site-packages/dspy/teleprompt/bootstrap.py", line 84, in compile
self._bootstrap()
File "/code/.venv/lib/python3.10/site-packages/dspy/teleprompt/bootstrap.py", line 154, in _bootstrap
success = self._bootstrap_one_example(example, round_idx)
File "/code/.venv/lib/python3.10/site-packages/dspy/teleprompt/bootstrap.py", line 210, in _bootstrap_one_example
raise e
File "/code/.venv/lib/python3.10/site-packages/dspy/teleprompt/bootstrap.py", line 190, in _bootstrap_one_example
prediction = teacher(**example.inputs())
File "/code/.venv/lib/python3.10/site-packages/dspy/primitives/program.py", line 26, in __call__
return self.forward(*args, **kwargs)
File "/code/.venv/lib/python3.10/site-packages/dspy/primitives/assertions.py", line 294, in forward
return wrapped_forward(*args, **kwargs)
File "/code/.venv/lib/python3.10/site-packages/dspy/primitives/assertions.py", line 215, in wrapper
result = bypass_suggest_handler(func)(*args, **kwargs) if bypass_suggest else None
File "/code/.venv/lib/python3.10/site-packages/dspy/primitives/assertions.py", line 148, in wrapper
return func(*args, **kwargs)
File "/code/./query.py", line 161, in forward
nodes = retriever.retrieve(question)
File "/code/.venv/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
result = func(*args, **kwargs)
File "/code/.venv/lib/python3.10/site-packages/llama_index/core/base/base_retriever.py", line 243, in retrieve
nodes = self._retrieve(query_bundle)
File "/code/.venv/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 230, in wrapper
result = func(*args, **kwargs)
File "/code/.venv/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 101, in _retrieve
return self._get_nodes_with_embeddings(query_bundle)
File "/code/.venv/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 177, in _get_nodes_with_embeddings
query_result = self._vector_store.query(query, **self._kwargs)
File "/code/.venv/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 371, in query
return self._query(
File "/code/.venv/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 381, in _query
results = self._collection.query(
AttributeError: 'ChromaVectorStore' object has no attribute '_collection'. Did you mean: 'from_collection'?
2 Answers
fkvaft9z1#
@theta-lin I would love a suggested fix other than monkey patching.
mutmk8jj2#
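@logan-markewich Indeed, monkey patching is only my temporary workaround.
After further investigation, I believe the root cause is that sqlite3.Connection objects cannot be pickled, which makes sense because a database connection should be shared rather than copied. This can be demonstrated in isolation, where it gives the error message TypeError: cannot pickle 'sqlite3.Connection' object. While I understand that you could change the default behavior of copy.deepcopy() by providing a custom __deepcopy__() method, I am not sure whether overriding it directly is a good approach here, since ChromaVectorStore is a Pydantic object. Also, the error message above does not appear when an actual ChromaVectorStore object is pickled, so I suspect some of the default copy behavior has already been modified? What do you think is the best way to fix it in this case?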
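As a rough illustration of both observations, the standard-library sketch below first shows that deep-copying an object holding a sqlite3.Connection raises a TypeError, and then shows one way the error could disappear while the attribute goes missing: a __getstate__ that drops the attribute before copying. Holder and StrippingHolder are hypothetical classes written for this example. Whether llama-index or Pydantic actually does something equivalent is an assumption, not something confirmed in this thread.

```python
import copy
import sqlite3


class Holder:
    """Hypothetical object that keeps a live database connection."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")


class StrippingHolder(Holder):
    """Hypothetical variant whose __getstate__ drops the connection."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_conn", None)  # silently discard the uncopyable attribute
        return state


# Case 1: default behaviour. deepcopy reaches the connection and fails loudly.
plain = Holder()
try:
    copy.deepcopy(plain)
    loud_error = None
except TypeError as exc:
    loud_error = exc
print(type(loud_error).__name__)  # TypeError

# Case 2: the attribute is stripped, so the copy succeeds but is incomplete.
stripping = StrippingHolder()
clone = copy.deepcopy(stripping)
print(hasattr(clone, "_conn"))  # False: a later clone._conn raises AttributeError

# Close the connections opened by this sketch.
plain._conn.close()
stripping._conn.close()
```

The second case matches the symptom in the traceback: no error at copy time, then an AttributeError the first time the copied object touches the missing attribute.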