llama_index [Bug]: MetadataFilters FilterOperator.IN does not work with Chromadb

wztqucjr posted 2 months ago in Other

Bug Description

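I created a vector store with Chromadb and built a retriever from it. What I want is retrieval that combines semantic search with metadata filters. The filter works for EQ but not for IN. Should IN work? I tried calling the _to_chroma_filter(filters) method, but it seems that method is deprecated.
There is nothing wrong with my documents or nodes because, as I said, the EQ filter works.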

Version

0.10.28

Steps to Reproduce

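from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.vector_stores import FilterOperator, MetadataFilter, MetadataFilters
from llama_index.vector_stores.chroma import ChromaVectorStore

# chroma_collection and nodes are created earlier (not shown in the report)
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
sentence_index = VectorStoreIndex(nodes, storage_context=storage_context)
desired_ids = ['10000032', '10000764']
filters = MetadataFilters(
    filters=[
        MetadataFilter(key="subject_id", operator=FilterOperator.IN, value=desired_ids),
    ],
)
retriever = sentence_index.as_retriever(filters=filters)
retriever.retrieve("Find all patients")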
The error is: Unexpected exception formatting exception. Falling back to standard exception

Relevant Logs/Tracebacks

Traceback (most recent call last):
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_33278/3927084781.py", line 14, in <module>
    retriever.retrieve("Find all patients")
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 102, in wrapper
    def set_current_span_id(self, value: str):
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 77, in span_drop
    span_handlers: List[BaseSpanHandler] = [],
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/span_handlers/base.py", line 45, in span_drop
    def class_name(cls) -> str:
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/span_handlers/null.py", line 33, in prepare_to_drop_span
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 100, in wrapper
    return None
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/base/base_retriever.py", line 238, in retrieve
    query_bundle = str_or_query_bundle
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 102, in wrapper
    def set_current_span_id(self, value: str):
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 77, in span_drop
    span_handlers: List[BaseSpanHandler] = [],
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/span_handlers/base.py", line 45, in span_drop
    def class_name(cls) -> str:
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/span_handlers/null.py", line 33, in prepare_to_drop_span
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 100, in wrapper
    return None
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 101, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 176, in _get_nodes_with_embeddings
    query = self._build_vector_store_query(query_bundle_with_embeddings)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 290, in query
    where = _to_chroma_filter(query.filters)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 69, in _to_chroma_filter
    _transform_chroma_filter_operator(
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 52, in _transform_chroma_filter_operator
    raise ValueError(f"Filter operator {operator} not supported")
ValueError: Filter operator in not supported

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 2102, in showtraceback
    stb = self.InteractiveTB.structured_traceback(
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1310, in structured_traceback
    return FormattedTB.structured_traceback(
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1199, in structured_traceback
    return VerboseTB.structured_traceback(
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1052, in structured_traceback
    formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 978, in format_exception_as_a_whole
    frames.append(self.format_record(record))
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 878, in format_record
    frame_info.lines, Colors, self.has_colors, lvals
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 712, in lines
    return self._sd.lines
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/core.py", line 734, in lines
    pieces = self.included_pieces
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/core.py", line 681, in included_pieces
    pos = scope_pieces.index(self.executing_piece)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/core.py", line 660, in executing_piece
    return only(
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/executing/executing.py", line 116, in only
    raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

but5z9lq #1

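FilterOperator.IN fails with Chromadb because the _transform_chroma_filter_operator function in the LlamaIndex ChromaDB integration does not support the IN operator. That function translates standard metadata filter operators into the ChromaDB-specific spec; it does not yet handle IN, which produces the ValueError you saw.

To work around this, you can modify _transform_chroma_filter_operator to add support for the IN operator, provided you are familiar with ChromaDB's query language. Here is an example modification: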

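def _transform_chroma_filter_operator(operator: str) -> str:
    # FilterOperator.IN has the lowercase string value "in", as the error
    # message "Filter operator in not supported" shows, so compare against "in".
    if operator == "in":
        return "$in"
    # Include other operators and their translations
    raise ValueError(f"Filter operator {operator} not supported")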

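This modification assumes that ChromaDB uses MongoDB-like syntax for the IN operator. You also need to make sure the code correctly handles the case where filter.value is a list of values, since the IN operator applies to multiple values.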
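The two points above (an operator mapping that knows `$in`, and a list-valued `filter.value`) can be sketched together as follows. This is a minimal, self-contained illustration, not the code in `llama_index/vector_stores/chroma/base.py`: `SimpleFilter`, `OPERATOR_MAP`, and `to_chroma_where` are hypothetical stand-ins, and the `$in`/`$nin` spellings assume ChromaDB's MongoDB-style `where` syntax.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# Hypothetical stand-in for llama_index's MetadataFilter so the sketch runs on its own.
@dataclass
class SimpleFilter:
    key: str
    value: Any
    operator: str = "=="


# Assumed operator spellings on the ChromaDB side (MongoDB-style).
OPERATOR_MAP: Dict[str, str] = {
    "==": "$eq",
    "!=": "$ne",
    ">": "$gt",
    "<": "$lt",
    ">=": "$gte",
    "<=": "$lte",
    "in": "$in",
    "nin": "$nin",
}

# Operators whose value must be a list rather than a scalar.
LIST_OPERATORS = {"in", "nin"}


def to_chroma_where(filters: List[SimpleFilter], condition: str = "and") -> Dict[str, Any]:
    """Translate filters into a where-style dict; condition is "and" or "or"."""
    clauses = []
    for f in filters:
        if f.operator not in OPERATOR_MAP:
            raise ValueError(f"Filter operator {f.operator} not supported")
        if f.operator in LIST_OPERATORS and not isinstance(f.value, list):
            raise ValueError(
                f"Operator {f.operator} expects a list, got {type(f.value).__name__}"
            )
        clauses.append({f.key: {OPERATOR_MAP[f.operator]: f.value}})
    if not clauses:
        return {}
    if len(clauses) == 1:
        # A single filter needs no $and/$or wrapper.
        return clauses[0]
    return {f"${condition}": clauses}
```

With the filter from the reproduction steps, `to_chroma_where([SimpleFilter("subject_id", ["10000032", "10000764"], "in")])` returns a single clause keyed by `subject_id` with an `$in` list, which is the shape a `where` argument would take if the assumption about ChromaDB's syntax holds.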

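Keep in mind that this is a workaround. For a permanent fix, consider contacting the LlamaIndex team and asking them to add official support for the IN operator in their ChromaDB integration.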



vsdwdz23 #2

According to the error message, it is indeed not supported.
If chroma supports this, it should be added here:
llama_index/llama-index-integrations/vector_stores/llama-index-vector-stores-chroma/llama_index/vector_stores/chroma/base.py
Line 37 in f1124cf
def _transform_chroma_filter_operator(operator: str) -> str:


nqwrtyyt #3

It looks like the 'IN' operator is supported in chromadb. https://docs.trychroma.com/guides#using-inclusion-operators-(-and-)
Could you please check?
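If chromadb does accept inclusion operators as the linked guide suggests, the same filter could be written directly as a `where` clause, without going through the LlamaIndex translation step. A small sketch under that assumption: `build_in_where` is a hypothetical helper, and the non-empty-list check reflects what chromadb's validation is believed to require rather than a confirmed rule.

```python
from typing import Dict, List, Union

Scalar = Union[str, int, float, bool]


def build_in_where(key: str, values: List[Scalar]) -> Dict[str, Dict[str, List[Scalar]]]:
    """Build a where-style clause that matches any of the given metadata values."""
    if not values:
        # Assumption: an empty $in list is rejected, so fail early with a clear message.
        raise ValueError("$in needs at least one value")
    return {key: {"$in": list(values)}}


where = build_in_where("subject_id", ["10000032", "10000764"])

# Assumed usage against the collection from the reproduction steps:
# results = chroma_collection.query(
#     query_texts=["Find all patients"], n_results=5, where=where
# )
```

Querying the collection directly this way is also a quick check on whether the limitation sits in chromadb itself or only in the integration's operator mapping.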


z6psavjg #4

This issue can be closed now.
