llama_index [Bug]: How can I use the same nodes for dense_x a second time? Please provide a standard solution for this error.

laik7k3q · posted 2 months ago in: Other

When running a second query against an existing chromadb embedding store, the following error is raised while generating a response to the user's question:
Query id 40b9d01f-b211-413d-b3d4-a799eff700d6 not found in either retriever_dict or query_engine_dict.
How can this error be resolved?
Detailed explanation:
On the first run, when the chromadb embeddings are freshly generated, everything works end to end: the user's question is sent to the LLM and a response is generated. But when we stop the process and run it again (reusing the persisted chromadb store), we hit this query-id error.
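A plausible reading of the failure, shown here as a self-contained toy model rather than actual llama_index code: on the first run, the proposition `IndexNode`s persisted in Chroma reference that run's randomly generated `node_id`s; on a re-run, `all_nodes_dict` is rebuilt from freshly created nodes with brand-new ids, so the ids coming back from the persisted vector store no longer appear in the new mapping:

```python
import uuid


def build_node_dict(texts):
    # Each run assigns fresh random ids, just like TextNode does by default
    return {str(uuid.uuid4()): t for t in texts}


texts = ["proposition 1", "proposition 2"]

# First run: the index is built and these ids end up persisted in the vector store
run1_dict = build_node_dict(texts)
persisted_ids = list(run1_dict.keys())

# Second run: same texts, but brand-new ids in the rebuilt dict
run2_dict = build_node_dict(texts)

# Looking up a persisted id against the second run's dict fails,
# which is exactly the "Query id ... not found" situation
assert all(pid in run1_dict for pid in persisted_ids)
assert not any(pid in run2_dict for pid in persisted_ids)
```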

Version

llama-index==0.10.12

Steps to reproduce

```python
import asyncio
import json
import os
from typing import Any, Dict, List, Optional

import chromadb
import yaml
from llama_index.core import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.core.async_utils import run_jobs
from llama_index.core.base.response.schema import RESPONSE_TYPE
from llama_index.core.embeddings import BaseEmbedding
from llama_index.core.llama_pack import BaseLlamaPack
from llama_index.core.llms import LLM
from llama_index.core.node_parser import SentenceSplitter, TextSplitter
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.schema import Document, IndexNode, TextNode
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

# PROPOSITIONS_PROMPT comes from the dense_x pack's prompts (omitted here)


class DenseXRetrievalPack(BaseLlamaPack):
    def __init__(  # was `def init` in the original paste
        self,
        documents: List[Document],
        proposition_llm: Optional[LLM] = None,
        query_llm: Optional[LLM] = None,
        embed_model: Optional[BaseEmbedding] = None,
        text_splitter: TextSplitter = SentenceSplitter(),
        vector_store: Optional[ElasticsearchStore] = None,
        similarity_top_k: int = 4,
    ) -> None:
        """Init params."""
        self._proposition_llm = proposition_llm  # was the undefined name `llm`

        nodes = text_splitter.get_nodes_from_documents(documents)
        sub_nodes = self._gen_propositions(nodes)
        all_nodes = nodes + sub_nodes
        all_nodes_dict = {n.node_id: n for n in all_nodes}

        service_context = ServiceContext.from_defaults(
            llm=query_llm,
            embed_model=embed_model,
            num_output=self._proposition_llm.metadata.num_output,
        )

        # Previously tried with Elasticsearch instead of Chroma:
        # if os.path.exists('./elastic_db'):
        #     self.vector_index = VectorStoreIndex.from_vector_store(
        #         vector_store, service_context=service_context
        #     )
        # else:
        #     storage_context = StorageContext.from_defaults(vector_store=vector_store)
        #     self.vector_index = VectorStoreIndex(
        #         all_nodes, service_context=service_context,
        #         show_progress=True, storage_context=storage_context,
        #     )
        #     os.mkdir("elastic_db")

        chroma_client = chromadb.PersistentClient(path="./chroma_db")
        chroma_collection = chroma_client.get_or_create_collection("quickstart")
        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
        if os.path.exists("./chroma_db"):
            # Re-run: load the persisted embeddings instead of re-indexing
            self.vector_index = VectorStoreIndex.from_vector_store(
                vector_store, service_context=service_context
            )
        else:
            storage_context = StorageContext.from_defaults(vector_store=vector_store)
            self.vector_index = VectorStoreIndex(
                all_nodes,
                service_context=service_context,
                show_progress=True,
                storage_context=storage_context,
                store_nodes_override=True,
            )

        self.retriever = RecursiveRetriever(
            "vector",
            retriever_dict={
                "vector": self.vector_index.as_retriever(
                    similarity_top_k=similarity_top_k
                )
            },
            node_dict=all_nodes_dict,
        )

        self.query_engine = RetrieverQueryEngine.from_args(
            self.retriever, service_context=service_context
        )

    async def _aget_proposition(self, node: TextNode) -> List[TextNode]:
        """Get propositions for a single node."""
        initial_output = await self._proposition_llm.apredict(
            PROPOSITIONS_PROMPT, node_text=node.text
        )
        outputs = initial_output.split("\n")

        all_propositions = []
        for output in outputs:
            if not output.strip():
                continue
            # Repair truncated JSON list output from the LLM
            if not output.strip().endswith("]"):
                if not output.strip().endswith('"') and not output.strip().endswith(","):
                    output = output + '"'
                output = output + " ]"
            if not output.strip().startswith("["):
                if not output.strip().startswith('"'):
                    output = '"' + output
                output = "[ " + output

            try:
                propositions = json.loads(output)
            except Exception:
                # fallback to yaml
                try:
                    propositions = yaml.safe_load(output)
                except Exception:
                    # fallback to next output
                    continue

            if not isinstance(propositions, list):
                continue

            all_propositions.extend(propositions)

        nodes = [TextNode(text=prop) for prop in all_propositions if prop]
        # Each proposition node points back to its source node via node.node_id
        return [IndexNode.from_text_node(n, node.node_id) for n in nodes]

    def _gen_propositions(self, nodes: List[TextNode]) -> List[TextNode]:
        """Get propositions for all nodes."""
        sub_nodes = asyncio.run(
            run_jobs(
                [self._aget_proposition(node) for node in nodes],
                show_progress=True,
                workers=8,
            )
        )
        # Flatten the list of lists
        return [node for sub_node in sub_nodes for node in sub_node]

    def get_modules(self) -> Dict[str, Any]:
        """Get modules."""
        return {
            "query_engine": self.query_engine,
            "retriever": self.retriever,
        }

    def run(self, query_str: str, **kwargs: Any) -> RESPONSE_TYPE:
        """Run the pipeline."""
        return self.query_engine.query(query_str)
```
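One standard way out of the re-run failure (a sketch of the idea, not the pack's official API) is to persist the node mapping from the first run next to the Chroma directory and reload it on later runs, so `node_dict` contains the same ids the persisted vector store references. In llama_index, `SimpleDocumentStore` (`docstore.persist` / `SimpleDocumentStore.from_persist_path`) is the idiomatic mechanism for this; the stdlib sketch below shows the idea with plain JSON, where the filename `node_dict.json` and both helper functions are hypothetical and the node payload is simplified to a plain string:

```python
import json
import os


def save_node_texts(all_nodes_dict, path="./chroma_db/node_dict.json"):
    # First (index-building) run: persist node_id -> text so later runs
    # can resolve the ids that the vector store will hand back
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(dict(all_nodes_dict), f)


def load_node_texts(path="./chroma_db/node_dict.json"):
    # Re-run: rebuild the mapping with the ORIGINAL ids instead of
    # re-splitting documents into nodes with fresh random ids
    with open(path) as f:
        return json.load(f)
```

In the pack itself, the reload branch (the `os.path.exists('./chroma_db')` case) would then rebuild `all_nodes_dict` from this file, wrapping each saved text back into a `TextNode` with its saved id, rather than regenerating nodes and propositions from scratch.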

mwg9r5ms1#

To resolve the error "Query id 40b9d01f-b211-413d-b3d4-a799eff700d6 not found in either retriever_dict or query_engine_dict" and use the same nodes for dense_x a second time, follow these steps:

  1. Ensure the `query_id` is in `retriever_dict` or `query_engine_dict`:
    When initializing `RecursiveRetriever`, make sure the `query_id` is included in `retriever_dict` or `query_engine_dict`.
  2. Check the initialization:
    Verify that `retriever_dict` and `query_engine_dict` are correctly populated with the necessary retrievers and query engines.
    Here is an example of how to initialize `RecursiveRetriever`:
```python
retriever_dict = {
    "vector": self.vector_index.as_retriever(similarity_top_k=similarity_top_k),
    # Add other retrievers as needed
}

query_engine_dict = {
    # Add query engines if needed
}

self.retriever = RecursiveRetriever(
    "vector",
    retriever_dict=retriever_dict,
    node_dict=all_nodes_dict,
)
```
  3. Using the same nodes for dense_x a second time:
    If you want to use the same nodes for dense_x a second time, store the nodes retrieved during the first query and reuse them. Here is an example of how to do that:
```python
# First retrieval
query_bundle = QueryBundle(query_str="your query")
retrieved_nodes, additional_nodes = self.retriever.retrieve_all(query_bundle)

# Store the nodes for later use
stored_nodes = retrieved_nodes

# Use the same nodes for dense_x the second time
dense_x_nodes = stored_nodes
```

By ensuring the `query_id` is present in the dictionaries and storing the nodes for reuse, you can resolve this error and use the same nodes for dense_x a second time.
Additionally, the `RecursiveRetriever` class in the LlamaIndex framework has a `_get_object` method that fetches a retriever or query engine based on the `query_id`. If the `query_id` cannot be found in `retriever_dict` or `query_engine_dict`, it raises a `ValueError`. Make sure your `query_id` has a correct mapping in these dictionaries to avoid this error 1(https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/retrievers/recursive_retriever.py) 2(https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/retrievers/recursive_retriever.py).
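The lookup that produces the error message can be sketched as follows (a simplified illustration of the dispatch described above, not the library's exact code):

```python
def get_object(query_id, retriever_dict, query_engine_dict):
    # A retrieved IndexNode's id must resolve to a known retriever
    # or query engine; otherwise the lookup fails loudly.
    if query_id in retriever_dict:
        return retriever_dict[query_id]
    if query_id in query_engine_dict:
        return query_engine_dict[query_id]
    raise ValueError(
        f"Query id {query_id} not found in either retriever_dict "
        "or query_engine_dict."
    )
```

This is why rebuilding `node_dict`/`retriever_dict` with ids that differ from those persisted in the vector store triggers the error on the second run.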

  • To continue the conversation with Dosu, mention @dosu.
