llama_index [Bug]: QdrantVectorStore parser always expects a key named "text" in the response

Posted by vwkv1x7d 8 months ago in Other

Bug Description

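The parse_to_query_result method of QdrantVectorStore always expects a key named "text" in the payload metadata. When a payload cannot be parsed as a serialized llama_index node, the method falls back to payload.get("text"). If there is no "text" key, that returns None and building the node fails with an error.
See line 740 of llama-index/vector_stores/qdrant/base.py: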

```python
def parse_to_query_result(self, response: List[Any]) -> VectorStoreQueryResult:
    """
    Convert vector store response to VectorStoreQueryResult.

    Args:
        response: List[Any]: List of results returned from the vector store.
    """
    nodes = []
    similarities = []
    ids = []
    for point in response:
        payload = cast(Payload, point.payload)
        try:
            node = metadata_dict_to_node(payload)
        except Exception:
            metadata, node_info, relationships = legacy_metadata_dict_to_node(
                payload
            )
            node = TextNode(
                id_=str(point.id),
                text=payload.get("text"),  # <----- this should not be hardcoded
                metadata=metadata,
                start_char_idx=node_info.get("start", None),
                end_char_idx=node_info.get("end", None),
                relationships=relationships,
            )
        nodes.append(node)
        similarities.append(point.score)
        ids.append(str(point.id))
    return VectorStoreQueryResult(nodes=nodes, similarities=similarities, ids=ids)
```

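It should not be a requirement that the vector DB metadata use the key "text". Users should be able to pass the name of the key that holds the text content as a parameter, or they should get a warning instead of a parsing error.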
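A minimal sketch of the requested option, assuming a hypothetical text_key parameter (the name is for illustration only and is not taken from the llama_index API): the text is read from a configurable payload key, and a missing key raises a descriptive error that names the expected key, instead of a pydantic validation error surfacing later from TextNode.

```python
from typing import Any, Dict


def extract_text(payload: Dict[str, Any], text_key: str = "text") -> str:
    """Return the node text stored under `text_key` in a Qdrant payload.

    `text_key` is a hypothetical parameter; it defaults to "text", the key
    the legacy fallback currently hardcodes.
    """
    value = payload.get(text_key)
    if value is None:
        # Name the expected key and list the keys that are actually present
        raise KeyError(
            f"payload has no {text_key!r} key; available keys: {sorted(payload)}"
        )
    # TextNode.text must be a string, so coerce other scalar values
    return str(value)


# Example: a payload that stores its text under "content" instead of "text"
payload = {"content": "Qdrant stores arbitrary JSON payloads.", "source": "doc.md"}
print(extract_text(payload, text_key="content"))
```

Raising early keeps the failure next to its cause; the warning-based alternative mentioned in the issue trades that for not aborting the whole query.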

Version

0.10.39

Steps to Reproduce

  1. Create a collection in Qdrant whose metadata has no "text" key.
  2. Try to retrieve any node from the collection with any query.
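```python
import qdrant_client
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.vector_stores.qdrant import QdrantVectorStore

# QDRANT_HOST, collection and postprocessors are defined elsewhere in the reporter's code
vec_db_client = qdrant_client.QdrantClient(
    host=QDRANT_HOST,
    port=443,
    https=True,
)
vec_index = VectorStoreIndex.from_vector_store(
    vector_store=QdrantVectorStore(
        client=vec_db_client, collection_name=collection
    )
)
retriever = VectorIndexRetriever(index=vec_index, similarity_top_k=10)
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=postprocessors,
)
query_engine.retrieve("query")  # <---- Raises error
```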

Relevant Logs/Traceback

```
{
"name": "ValidationError",
"message": "1 validation error for TextNode
text
none is not an allowed value (type=type_error.none.not_allowed)",
"stack": "---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/vector_stores/qdrant/base.py:757, in QdrantVectorStore.parse_to_query_result(self, response)
756 try:
--> 757 node = metadata_dict_to_node(payload)
758 except Exception:
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/vector_stores/utils.py:70, in metadata_dict_to_node(metadata, text)
69 if node_json is None:
---> 70 raise ValueError(\"Node content not found in metadata dict.\")
72 node: BaseNode
ValueError: Node content not found in metadata dict.
During handling of the above exception, another exception occurred:
ValidationError Traceback (most recent call last)
Cell In[6], line 1
----> 1 result = search.query(query=\"query?\", collection=\"test\")
Cell In[1], line 210, in Search.query(self, query, collection)
204 # Query engine is used only for retrieval
205 query_engine = RetrieverQueryEngine(
206 retriever=retriever,
207 node_postprocessors=postprocessors,
208 )
--> 210 retrieved_nodes = query_engine.retrieve(query)
212 logger.info(f\"Retrieved nodes: {retrieved_nodes}\")
214 if len(retrieved_nodes) == 0:
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:144, in RetrieverQueryEngine.retrieve(self, query_bundle)
143 def retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
--> 144 nodes = self._retriever.retrieve(query_bundle)
145 return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py:274, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
270 self.span_enter(
271 id_=id_, bound_args=bound_args, instance=instance, parent_id=parent_id
272 )
273 try:
--> 274 result = func(*args, **kwargs)
275 except BaseException as e:
276 self.event(SpanDropEvent(span_id=id_, err_str=str(e)))
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/base/base_retriever.py:244, in BaseRetriever.retrieve(self, str_or_query_bundle)
239 with self.callback_manager.as_trace(\"query\"):
240 with self.callback_manager.event(
241 CBEventType.RETRIEVE,
242 payload={EventPayload.QUERY_STR: query_bundle.query_str},
243 ) as retrieve_event:
--> 244 nodes = self._retrieve(query_bundle)
245 nodes = self._handle_recursive_retrieval(query_bundle, nodes)
246 retrieve_event.on_end(
247 payload={EventPayload.NODES: nodes},
248 )
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py:274, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
270 self.span_enter(
271 id_=id_, bound_args=bound_args, instance=instance, parent_id=parent_id
272 )
273 try:
--> 274 result = func(*args, **kwargs)
275 except BaseException as e:
276 self.event(SpanDropEvent(span_id=id_, err_str=str(e)))
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py:101, in VectorIndexRetriever._retrieve(self, query_bundle)
95 if query_bundle.embedding is None and len(query_bundle.embedding_strs) > 0:
96 query_bundle.embedding = (
97 self._embed_model.get_agg_embedding_from_queries(
98 query_bundle.embedding_strs
99 )
100 )
--> 101 return self._get_nodes_with_embeddings(query_bundle)
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py:177, in VectorIndexRetriever._get_nodes_with_embeddings(self, query_bundle_with_embeddings)
173 def _get_nodes_with_embeddings(
174 self, query_bundle_with_embeddings: QueryBundle
175 ) -> List[NodeWithScore]:
176 query = self._build_vector_store_query(query_bundle_with_embeddings)
--> 177 query_result = self._vector_store.query(query, **self._kwargs)
178 return self._build_node_list_from_query_result(query_result)
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/vector_stores/qdrant/base.py:605, in QdrantVectorStore.query(self, query, **kwargs)
598 else:
599 response = self._client.search(
600 collection_name=self.collection_name,
601 query_vector=query_embedding,
602 limit=query.similarity_top_k,
603 query_filter=query_filter,
604 )
--> 605 return self.parse_to_query_result(response)
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/vector_stores/qdrant/base.py:763, in QdrantVectorStore.parse_to_query_result(self, response)
758 except Exception:
759 metadata, node_info, relationships = legacy_metadata_dict_to_node(
760 payload
761 )
--> 763 node = TextNode(
764 id_=str(point.id),
765 text=payload.get(\"text\"),
766 metadata=metadata,
767 start_char_idx=node_info.get(\"start\", None),
768 end_char_idx=node_info.get(\"end\", None),
769 relationships=relationships,
770 )
771 nodes.append(node)
772 similarities.append(point.score)
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
340 if validation_error:
--> 341 raise validation_error
342 try:
343 object_setattr(__pydantic_self__, '__dict__', values)
ValidationError: 1 validation error for TextNode
text
none is not an allowed value (type=type_error.none.not_allowed)"
}
```
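The traceback shows two chained failures. First, metadata_dict_to_node raises ValueError because the payload has no serialized node (in llama_index 0.10.x it looks this up under the _node_content key). Then the legacy fallback passes payload.get("text"), which is None, to TextNode, and pydantic rejects it. The sketch below reproduces that control flow with plain dicts so it can run without llama_index. NodeStub and parse_payload are hypothetical stand-ins, not llama_index APIs.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class NodeStub:
    """Hypothetical stand-in for TextNode: text must not be None."""

    id_: str
    text: Optional[str]

    def __post_init__(self) -> None:
        # Mirrors the pydantic check that produced
        # "none is not an allowed value" in the traceback
        if self.text is None:
            raise ValueError("text: none is not an allowed value")


def parse_payload(point_id: str, payload: Dict[str, Any]) -> NodeStub:
    try:
        # Stage 1: nodes written by llama_index carry a JSON-serialized node
        node_json = payload.get("_node_content")
        if node_json is None:
            raise ValueError("Node content not found in metadata dict.")
        return NodeStub(id_=point_id, text=json.loads(node_json)["text"])
    except Exception:
        # Stage 2: the legacy fallback reads the hardcoded "text" key;
        # if it is missing, NodeStub raises while handling the first error
        return NodeStub(id_=point_id, text=payload.get("text"))
```

A payload such as {"content": "..."} fails both stages, which matches the "During handling of the above exception, another exception occurred" chain in the log.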

r7xajy2e #1

Hello! I'm here to help with any issues you run into. If you have questions or need help fixing a bug, feel free to ask!

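To fix the issue where the parse_to_query_result method of the QdrantVectorStore class expects a "text" key in the metadata, you can modify the method to check whether the "text" key exists before accessing it. If the key is missing, you can handle it gracefully by setting a default value or skipping the operation.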

Here is an example modification:

```python
def parse_to_query_result(self, response: List[Any]) -> VectorStoreQueryResult:
    """
    Convert vector store response to VectorStoreQueryResult.

    Args:
        response: List[Any]: List of results returned from the vector store.
    """
    nodes = []
    similarities = []
    ids = []
    for point in response:
        payload = cast(Payload, point.payload)
        try:
            node = metadata_dict_to_node(payload)
        except Exception:
            metadata, node_info, relationships = legacy_metadata_dict_to_node(
                payload
            )
            # Fall back to a placeholder string if the "text" key is missing
            text = payload.get("text", "Default text")
            node = TextNode(
                id_=str(point.id),
                text=text,
                metadata=metadata,
                start_char_idx=node_info.get("start", None),
                end_char_idx=node_info.get("end", None),
                relationships=relationships,
            )
        nodes.append(node)
        ids.append(str(point.id))
        try:
            similarities.append(point.score)
        except AttributeError:
            # certain requests do not return a score
            similarities.append(1.0)
    return VectorStoreQueryResult(nodes=nodes, similarities=similarities, ids=ids)
```

In this modification, the line payload.get("text", "Default text") checks whether the payload contains a "text" key. If it does not, text is set to "Default text", so a missing "text" key no longer causes a validation error. The modified method also records a similarity of 1.0 for points that carry no score attribute [1][2][3].

[1] https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/indices/vector_store/retrievers/retriever.py
[2] https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/vector_stores/utils.py
[3] https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-qdrant/llama_index/vector_stores/qdrant/base.py
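One drawback of a fixed placeholder is that "Default text" then shows up in retrieved nodes as if it were real content. A sketch of the alternative the issue asks for instead: try one or more candidate keys, and if none is present, emit a warning and use an empty string. This assumes an empty text is acceptable downstream; text_or_warn and candidate_keys are hypothetical names, not part of llama_index.

```python
import warnings
from typing import Any, Dict, Sequence


def text_or_warn(
    payload: Dict[str, Any],
    point_id: str,
    candidate_keys: Sequence[str] = ("text",),
) -> str:
    """Return the first non-None value among `candidate_keys`, else "" with a warning."""
    for key in candidate_keys:
        value = payload.get(key)
        if value is not None:
            return str(value)
    # Surface the problem without aborting the whole query
    warnings.warn(
        f"point {point_id}: none of {list(candidate_keys)} found in payload; "
        "using empty text",
        UserWarning,
    )
    return ""
```

Inside the modified method, text = text_or_warn(payload, str(point.id)) would take the place of the payload.get("text", "Default text") line.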

  • To continue the conversation with Dosu, mention @dosu.

xzlaal3s #2

I'm running into the same error:

```
ValidationError: 1 validation error for TextNode
text
none is not an allowed value (type=type_error.none.not_allowed)
```

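A few weeks ago I created a VectorStoreIndex and was loading data from a previously created collection. After several hours of debugging and trying different things, I found that creating a new collection in Qdrant with OpenAI embeddings made the error go away. I haven't dug into what exactly causes the bug, but my guess is that the way the Qdrant vector store is created has changed and no longer matches how older versions created the embeddings.
I'm not sure this makes sense, but it fixed the error. If anyone else runs into the same problem, please share what you find.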
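A plausible explanation, not confirmed in this thread: llama_index 0.10.x stores each node as JSON under the _node_content payload key and rebuilds nodes from it, using the "text" key only in the legacy fallback. A collection whose points have neither key (for example, one populated without going through llama_index's node serialization) fails exactly as in the traceback, while a freshly indexed collection works. The sketch below classifies payloads that way, so an existing collection can be checked before querying it. classify_payload and summarize are hypothetical helpers; the payloads themselves could be fetched with qdrant_client's QdrantClient.scroll(collection_name=..., with_payload=True).

```python
from collections import Counter
from typing import Any, Dict, Iterable


def classify_payload(payload: Dict[str, Any]) -> str:
    """Label how llama_index's Qdrant parser would likely treat this payload."""
    if "_node_content" in payload:
        return "llama_index"  # rebuilt via metadata_dict_to_node
    if payload.get("text") is not None:
        return "legacy"  # handled by the fallback that reads "text"
    return "unparseable"  # leads to the TextNode validation error


def summarize(payloads: Iterable[Dict[str, Any]]) -> Counter:
    # Count payload kinds so a problematic collection stands out
    return Counter(classify_payload(p) for p in payloads)


# Example payloads; real ones would come from the collection being queried
sample = [{"_node_content": "{}"}, {"text": "old"}, {"content": "other tool"}]
print(summarize(sample))
```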
