How can llama_index reuse the embeddings created with denseX and stored in Elasticsearch?

h43kikqp · asked 23 days ago

Bug Description

How do I store and reuse the embeddings created with Elasticsearch and denseX?
Below is one of the approaches I am having trouble with: storing the nodes in a dictionary and then trying to pass that dictionary along.

Please also advise on how to get correct page-number citations for the generated response.

Version

llama-index==0.10.12

Steps to Reproduce

import os
import pickle

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import IndexNode

def create_all_nodes_dict():
    # demotest is the author's logger, defined elsewhere
    documents = SimpleDirectoryReader(input_dir=os.environ.get('DOC_PATH')).load_data()
    demotest.info("Loading Documents")
    node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=100)
    demotest.info("Splitting the Sentences")
    base_nodes = node_parser.get_nodes_from_documents(documents)
    for idx, node in enumerate(base_nodes):
        node.id_ = f"node-{idx}"
    demotest.info("Creating Embedding")

    sub_chunk_sizes = [128, 256, 512]
    sub_node_parsers = [
        SentenceSplitter(chunk_size=c, chunk_overlap=100) for c in sub_chunk_sizes
    ]

    all_nodes = []
    for base_node in base_nodes:
        for n in sub_node_parsers:
            sub_nodes = n.get_nodes_from_documents([base_node])
            sub_inodes = [
                IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
            ]
            all_nodes.extend(sub_inodes)

        # also add the original node to the list
        original_node = IndexNode.from_text_node(base_node, base_node.node_id)
        all_nodes.append(original_node)
    all_nodes_dict = {n.node_id: n for n in all_nodes}

    file_path = './pkl_folder/all_nodes_dict.pkl'
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    # Open the file in binary write mode ('wb') to save the dictionary
    with open(file_path, 'wb') as f:
        pickle.dump(all_nodes_dict, f)

    return all_nodes_dict

from llama_index.vector_stores.elasticsearch import ElasticsearchStore
from llama_index.packs.dense_x_retrieval import DenseXRetrievalPack

def define_rag(all_nodes):
    # llm_query() and embeded_model() are the author's helper functions
    llm = llm_query()
    embed_model = embeded_model()

    # update_status_check('0')
    vector_store = ElasticsearchStore(
        index_name=os.environ.get('INDEX_NAME'),
        es_url=os.environ.get('ES_URL'),
        # retrieval_strategy=AsyncDenseVectorStrategy(hybrid=True),
    )

    retriever_chunk = DenseXRetrievalPack(
        documents=all_nodes,
        query_llm=llm,
        vector_store=vector_store,
    )

    query_engine_chunk = retriever_chunk.query_engine
    return query_engine_chunk, retriever_chunk

all_nodes = create_all_nodes_dict()
query_engine_chunk, retriever_chunk = define_rag(all_nodes)
response = query_engine_chunk.query("Current Query: " + query)

Relevant Logs/Traceback

Please refer to the attached image above for the relevant logs.
yfjy0ee7 1#

To store and reuse the embeddings created by Elasticsearch and denseX, you can use the ElasticsearchEmbedding class from the llama_index package. This class lets you generate embeddings with a model deployed in your Elasticsearch cluster. You can instantiate it either from an existing Elasticsearch connection or by providing Elasticsearch credentials.
Here is an example of instantiating the ElasticsearchEmbedding class from an existing Elasticsearch connection:

from elasticsearch import Elasticsearch
from llama_index.legacy.embeddings.elasticsearch import ElasticsearchEmbedding

# Define the model ID and input field name (if different from default)
model_id = "your_model_id"
input_field = "your_input_field"  # Optional, only if different from 'text_field'

# Create Elasticsearch connection
es_connection = Elasticsearch(hosts=["http://localhost:9200"], basic_auth=("user", "password"))

# Instantiate ElasticsearchEmbedding using the existing connection
embeddings = ElasticsearchEmbedding.from_es_connection(
    model_id,
    es_connection,
    input_field=input_field,
)
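The class can also be instantiated by providing credentials directly, as mentioned above. A minimal sketch of that route, assuming the from_credentials classmethod with these parameter names (check your installed version for the exact signature):

# Instantiate ElasticsearchEmbedding directly from credentials
embeddings = ElasticsearchEmbedding.from_credentials(
    model_id="your_model_id",
    es_url="https://localhost:9200",
    es_username="user",
    es_password="password",
)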

To resolve the AttributeError: 'str' object has no attribute 'id' in the llama_index package, make sure you are passing objects of the correct type to the method that raises the error. The error indicates that an object with an id attribute was expected, but a string was passed instead. Double-check your method inputs and make sure they match the expected types.
For example, if you encounter an error like AttributeError: 'str' object has no attribute 'upsert', it means a string was passed where a vector store object was expected. Make sure the vector store is initialized correctly and passed to the relevant method.
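In the code from the question, one plausible source of this error pattern (an assumption, since the attached traceback is not visible here) is that create_all_nodes_dict returns a dictionary, and iterating a dictionary yields its string keys rather than the node objects stored as values:

all_nodes = create_all_nodes_dict()  # returns a dict: {node_id: node}

# Iterating a dict yields its *keys* (plain strings), so any code that walks
# the input and reads `.id` off each item fails with the AttributeError above.
# Passing the actual node objects avoids that type mismatch:
query_engine_chunk, retriever_chunk = define_rag(list(all_nodes.values()))

Whether DenseXRetrievalPack accepts pre-built nodes in place of Document objects depends on the pack version, so treat this as a sketch of the type mismatch rather than a drop-in fix.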
Here is a corrected example based on commonly seen issues:

import pinecone

from llama_index import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.vector_stores import PineconeVectorStore

# Load documents
loader = SimpleDirectoryReader('/path to data', recursive=True)
documents = loader.load_data()

# Initialize the Pinecone vector store with an index *object*, not a string
pinecone.init(api_key="your_api_key", environment="your_environment")  # pinecone-client v2-style setup
pinecone_index = pinecone.Index("your_pinecone_index")
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Create storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create index
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
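Since the question uses Elasticsearch rather than Pinecone, the same storage-context pattern applies with ElasticsearchStore. To reuse embeddings that were already written to the Elasticsearch index on a previous run, you can rebuild the index directly from the existing vector store instead of re-embedding the documents. A minimal sketch with llama-index 0.10-style imports (the index name and URL are placeholders):

from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

# Reconnect to the Elasticsearch index that already holds the embeddings
vector_store = ElasticsearchStore(
    index_name="your_index_name",
    es_url="http://localhost:9200",
)

# Build the index from the stored embeddings; nothing is re-embedded
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
query_engine = index.as_query_engine()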

If you need more specific guidance on resolving the AttributeError, please share the code snippet where the error occurs. The ElasticsearchEmbedding class expects an Elasticsearch connection object and a model ID, so make sure both are provided correctly.
Additionally, if you are using composable indices and run into an error like AttributeError: 'tuple' object has no attribute 'embedding', make sure you handle the objects returned by your methods correctly. For example, when creating a ComposableGraph and querying it, make sure the objects you pass around have the expected types and structure.
Here is an example of creating and querying a ComposableGraph:

from llama_index import ComposableGraph, ListIndex

# Assuming index1 and index2 are already created and are instances of GPTSimpleVectorIndex
graph = ComposableGraph.from_indices(
    ListIndex,
    [index1, index2],
    index_summaries=[
        "summary 1", 
        "summary 2"
    ],
)

# Query the graph (graph_query_str and query_configs are defined elsewhere)
response_summary = graph.query(graph_query_str, query_configs=query_configs)

Make sure index1 and index2 are valid index objects, not tuples or other types lacking the required attributes (such as embedding).

Sources:
1. https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/embeddings/llama-index-embeddings-elasticsearch/llama_index/embeddings/elasticsearch/base.py
2. https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/embeddings/elasticsearch.py
3. https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/discover_llamaindex/document_management/discord_dumps/help_channel_dump_06_02_23.json
4. https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/discover_llamaindex/document_management/discord_dumps/help_channel_dump_05_25_23.json

