llama_index [Bug]: QdrantVectorStore parser always expects a key named "text" in the response

vwkv1x7d  posted 2 months ago  in Other

Bug Description

The parse_to_query_result method in QdrantVectorStore always expects the metadata to contain a key named "text". If the metadata has no "text" key, the method raises an error.
See line 740 of llama-index/vector_stores/qdrant/base.py:

    def parse_to_query_result(self, response: List[Any]) -> VectorStoreQueryResult:
        """
        Convert vector store response to VectorStoreQueryResult.

        Args:
            response: List[Any]: List of results returned from the vector store.
        """
        nodes = []
        similarities = []
        ids = []

        for point in response:
            payload = cast(Payload, point.payload)
            try:
                node = metadata_dict_to_node(payload)
            except Exception:
                metadata, node_info, relationships = legacy_metadata_dict_to_node(
                    payload
                )

                node = TextNode(
                    id_=str(point.id),
                    text=payload.get("text"),  # <----- this should not be hardcoded
                    metadata=metadata,
                    start_char_idx=node_info.get("start", None),
                    end_char_idx=node_info.get("end", None),
                    relationships=relationships,
                )
            nodes.append(node)
            similarities.append(point.score)
            ids.append(str(point.id))

        return VectorStoreQueryResult(nodes=nodes, similarities=similarities, ids=ids)

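This should not be the case: the vector DB metadata should not be required to use "text". Users should be able to pass the key name for the text content as a parameter, or they should get a warning instead of a parsing error.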
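
To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is plain Python and not the library's code: the helper name extract_text and its text_key and default parameters are hypothetical, chosen only for illustration.

```python
import warnings
from typing import Any, Mapping


def extract_text(
    payload: Mapping[str, Any],
    text_key: str = "text",  # hypothetical parameter: payload key that holds the text
    default: str = "",       # hypothetical parameter: fallback when the key is unusable
) -> str:
    """Return the text stored under `text_key`, or `default` with a warning."""
    value = payload.get(text_key)
    if value is None:
        # Warn instead of letting a None reach TextNode and fail validation.
        warnings.warn(
            f"Payload has no usable {text_key!r} key; falling back to default text.",
            UserWarning,
            stacklevel=2,
        )
        return default
    return str(value)


# A payload that stores its text under "content" rather than "text".
payload = {"content": "Qdrant stores payloads as JSON.", "source": "docs"}
print(extract_text(payload, text_key="content"))
```

With something like this, a collection that keeps its text under another key could still be read, and a payload with no text at all would produce a warning rather than a ValidationError.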

Version

0.10.39

Steps to Reproduce

  1. Create a collection in Qdrant whose metadata has no "text" key.
  2. Try to retrieve any node from the collection with any query.
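import qdrant_client
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.vector_stores.qdrant import QdrantVectorStore

# QDRANT_HOST, collection, and postprocessors are assumed to be defined elsewhere.
vec_db_client = qdrant_client.QdrantClient(
    host=QDRANT_HOST,
    port=443,
    https=True,
)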

vec_index = VectorStoreIndex.from_vector_store(
    vector_store=QdrantVectorStore(
        client=vec_db_client, collection_name=collection
    )
)

retriever = VectorIndexRetriever(index=vec_index, similarity_top_k=10)

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=postprocessors,
)

query_engine.retrieve("query")  # <---- Raises error

Relevant Logs/Traceback

{
	"name": "ValidationError",
	"message": "1 validation error for TextNode
text
none is not an allowed value (type=type_error.none.not_allowed)",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/vector_stores/qdrant/base.py:757, in QdrantVectorStore.parse_to_query_result(self, response)
756 try:
--> 757     node = metadata_dict_to_node(payload)
758 except Exception:

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/vector_stores/utils.py:70, in metadata_dict_to_node(metadata, text)
69 if node_json is None:
---> 70     raise ValueError(\"Node content not found in metadata dict.\")
72 node: BaseNode

ValueError: Node content not found in metadata dict.

During handling of the above exception, another exception occurred:

ValidationError                           Traceback (most recent call last)
Cell In[6], line 1
----> 1 result = search.query(query=\"query?\", collection=\"test\")

Cell In[1], line 210, in Search.query(self, query, collection)
204 # Query engine is used only for retrieval
205 query_engine = RetrieverQueryEngine(
206     retriever=retriever,
207     node_postprocessors=postprocessors,
208 )
--> 210 retrieved_nodes = query_engine.retrieve(query)
212 logger.info(f\"Retrieved nodes: {retrieved_nodes}\")
214 if len(retrieved_nodes) == 0:

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:144, in RetrieverQueryEngine.retrieve(self, query_bundle)
143 def retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
--> 144     nodes = self._retriever.retrieve(query_bundle)
145     return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py:274, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
270 self.span_enter(
271     id_=id_, bound_args=bound_args, instance=instance, parent_id=parent_id
272 )
273 try:
--> 274     result = func(*args, **kwargs)
275 except BaseException as e:
276     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/base/base_retriever.py:244, in BaseRetriever.retrieve(self, str_or_query_bundle)
239 with self.callback_manager.as_trace(\"query\"):
240     with self.callback_manager.event(
241         CBEventType.RETRIEVE,
242         payload={EventPayload.QUERY_STR: query_bundle.query_str},
243     ) as retrieve_event:
--> 244         nodes = self._retrieve(query_bundle)
245         nodes = self._handle_recursive_retrieval(query_bundle, nodes)
246         retrieve_event.on_end(
247             payload={EventPayload.NODES: nodes},
248         )

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py:274, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
270 self.span_enter(
271     id_=id_, bound_args=bound_args, instance=instance, parent_id=parent_id
272 )
273 try:
--> 274     result = func(*args, **kwargs)
275 except BaseException as e:
276     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py:101, in VectorIndexRetriever._retrieve(self, query_bundle)
95     if query_bundle.embedding is None and len(query_bundle.embedding_strs) > 0:
96         query_bundle.embedding = (
97             self._embed_model.get_agg_embedding_from_queries(
98                 query_bundle.embedding_strs
99             )
100         )
--> 101 return self._get_nodes_with_embeddings(query_bundle)

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py:177, in VectorIndexRetriever._get_nodes_with_embeddings(self, query_bundle_with_embeddings)
173 def _get_nodes_with_embeddings(
174     self, query_bundle_with_embeddings: QueryBundle
175 ) -> List[NodeWithScore]:
176     query = self._build_vector_store_query(query_bundle_with_embeddings)
--> 177     query_result = self._vector_store.query(query, **self._kwargs)
178     return self._build_node_list_from_query_result(query_result)

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/vector_stores/qdrant/base.py:605, in QdrantVectorStore.query(self, query, **kwargs)
598 else:
599     response = self._client.search(
600         collection_name=self.collection_name,
601         query_vector=query_embedding,
602         limit=query.similarity_top_k,
603         query_filter=query_filter,
604     )
--> 605     return self.parse_to_query_result(response)

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/vector_stores/qdrant/base.py:763, in QdrantVectorStore.parse_to_query_result(self, response)
758 except Exception:
759     metadata, node_info, relationships = legacy_metadata_dict_to_node(
760         payload
761     )
--> 763     node = TextNode(
764         id_=str(point.id),
765         text=payload.get(\"text\"),
766         metadata=metadata,
767         start_char_idx=node_info.get(\"start\", None),
768         end_char_idx=node_info.get(\"end\", None),
769         relationships=relationships,
770     )
771 nodes.append(node)
772 similarities.append(point.score)

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
340 if validation_error:
--> 341     raise validation_error
342 try:
343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for TextNode
text
none is not an allowed value (type=type_error.none.not_allowed)"
}

r7xajy2e #1

Hello! I'm here to help with any issue you run into. If you have questions or need help fixing a bug, feel free to ask!

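To address the parse_to_query_result method of the QdrantVectorStore class expecting a "text" key in the metadata, you can modify the method to check whether the "text" key exists before trying to access it. If the key is missing, you can handle it gracefully by setting a default value or skipping the operation.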

Here is an example modification:

def parse_to_query_result(self, response: List[Any]) -> VectorStoreQueryResult:
    """
    Convert vector store response to VectorStoreQueryResult.

    Args:
        response: List[Any]: List of results returned from the vector store.
    """
    nodes = []
    similarities = []
    ids = []

    for point in response:
        payload = cast(Payload, point.payload)
        try:
            node = metadata_dict_to_node(payload)
        except Exception:
            metadata, node_info, relationships = legacy_metadata_dict_to_node(
                payload
            )

            text = payload.get("text", "Default text")  # Check if "text" key exists
            node = TextNode(
                id_=str(point.id),
                text=text,
                metadata=metadata,
                start_char_idx=node_info.get("start", None),
                end_char_idx=node_info.get("end", None),
                relationships=relationships,
            )
        nodes.append(node)
        ids.append(str(point.id))
        try:
            similarities.append(point.score)
        except AttributeError:
            # certain requests do not return a score
            similarities.append(1.0)

    return VectorStoreQueryResult(nodes=nodes, similarities=similarities, ids=ids)

In this modification, the line payload.get("text", "Default text") checks whether the "text" key exists in the payload. If it does not, text is set to "Default text". This lets the method handle a missing "text" key gracefully by supplying a default value.
Sources:
1. https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/vector_store/retrievers/retriever.py
2. https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/vector_stores/utils.py
3. https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-qdrant/llama_index/vector_stores/qdrant/base.py
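
One caveat about relying on a default passed to payload.get(): the default only applies when the key is absent. If a payload stores "text" with an explicit null value, the call still returns None and TextNode validation would presumably fail in the same way. The short sketch below uses plain dictionaries (the payload contents are made up for illustration) to show the difference and one possible way to cover both cases.

```python
# Two illustrative payloads: one without the key, one with an explicit null.
payload_missing = {"source": "docs"}
payload_null = {"text": None, "source": "docs"}

# The default passed to dict.get() is used only when the key is absent.
text_a = payload_missing.get("text", "")  # key absent, so the default is used
text_b = payload_null.get("text", "")     # key present, so the stored None is returned

# Coalescing with `or` covers both the missing key and the explicit None.
text_c = payload_null.get("text") or ""

print(repr(text_a), repr(text_b), repr(text_c))
```

Note that `or ""` also replaces other falsy values with the empty string, which is harmless here because the field is expected to hold a string.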


xzlaal3s #2

I ran into the same error:

ValidationError: 1 validation error for TextNode
text
  none is not an allowed value (type=type_error.none.not_allowed)

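I created a VectorStoreIndex a few weeks ago and was fetching data from a previously created collection. After a few hours of debugging and trying things, I realized that creating a new collection in Qdrant using OpenAI embeddings resolved the error. I have not dug into what exactly caused the bug, but my guess is that the way the Qdrant vector store is created has changed and no longer matches how older versions created embeddings.
I'm not sure whether this makes sense, but it fixed the error. If anyone hits the same problem, please share.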
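
One way to test that guess before recreating a collection is to look at the payloads of a few stored points and see which parsing path they would take. The sketch below works on plain dictionaries. The function name classify_payload and the sample payloads are hypothetical, and the "_node_content" key is an assumption about what metadata_dict_to_node looks for (the traceback only shows that it raises when the node content is not found), so verify it against the installed version.

```python
from collections import Counter
from typing import Any, Mapping


def classify_payload(payload: Mapping[str, Any]) -> str:
    """Guess which parsing path a stored payload would take."""
    # Assumption: current llama_index payloads carry the serialized node
    # under "_node_content"; check this against your installed version.
    if "_node_content" in payload:
        return "current"
    # The legacy fallback reads payload["text"], so it needs a non-null value.
    if payload.get("text") is not None:
        return "legacy"
    # Neither path has usable text; this is the case that fails validation.
    return "no_text"


# Made-up payloads standing in for points fetched from a collection.
payloads = [
    {"_node_content": "{...}", "_node_type": "TextNode"},
    {"text": "an older point", "doc_id": "a1"},
    {"content": "written by another tool", "source": "crawler"},
]

counts = Counter(classify_payload(p) for p in payloads)
print(dict(counts))
```

If most points land in the "no_text" bucket, the collection was likely written by a different tool or schema, which would fit the observation that a freshly created collection works.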
