LangChain support for creating scalar field indexes on Milvus collections

Posted by 5fjcxozz 6 months ago in Other


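Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).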

Example Code

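def _create_index(self) -> None:
    """Create an index on the collection"""
    # Method of LangChain's Milvus vector store; `logger` is that module's logger.
    # Note: only the vector field (self._vector_field) is indexed here;
    # scalar metadata fields never get an index.
    from pymilvus import Collection, MilvusException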

    if isinstance(self.col, Collection) and self._get_index() is None:
        try:
            # If no index params, use a default HNSW based one
            if self.index_params is None:
                self.index_params = {
                    "metric_type": "L2",
                    "index_type": "HNSW",
                    "params": {"M": 8, "efConstruction": 64},
                }

            try:
                self.col.create_index(
                    self._vector_field,
                    index_params=self.index_params,
                    using=self.alias,
                )

            # If default did not work, most likely on Zilliz Cloud
            except MilvusException:
                # Use AUTOINDEX based index
                self.index_params = {
                    "metric_type": "L2",
                    "index_type": "AUTOINDEX",
                    "params": {},
                }
                self.col.create_index(
                    self._vector_field,
                    index_params=self.index_params,
                    using=self.alias,
                )
            logger.debug(
                "Successfully created an index on collection: %s",
                self.collection_name,
            )

        except MilvusException as e:
            logger.error(
                "Failed to create an index on collection: %s", self.collection_name
            )
            raise  # re-raise, preserving the original traceback

Error Message and Stack Trace (if applicable)

No response

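Description

We are trying to use the langchain_milvus library to create a Milvus collection with metadata. Recent Milvus releases support scalar indexes on non-vector fields to speed up filtering.
Currently, langchain-milvus only supports adding an index on the VECTOR field.
We could reuse the metadata_schema logic to support indexes on scalar fields.
#23219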

System Info

langchain-core==0.2.20
langchain-community==0.2.7


Source: https://github.com/langchain-ai/langchain/issues/24343

5 answers

dxxyhpgq (answer #1)

Hi, @rgupta2508, did you consider creating the scalar indexes using the collection instance?

collection.create_index(field_name=field_name, index_name=f"{field_name}_index")
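Building on the one-liner above, a minimal sketch of a helper that indexes several scalar fields and skips any field that already has an index. It assumes a pymilvus `Collection`-like object exposing `indexes` (each with a `field_name`) and `create_index(field_name=..., index_params=..., index_name=...)`. The helper name `create_scalar_indexes` and the default `INVERTED` index type are illustrative assumptions; check which scalar index types your Milvus version supports (INVERTED arrived with Milvus 2.4, as far as I know).

```python
from typing import Any, Iterable, List


def create_scalar_indexes(
    collection: Any,
    field_names: Iterable[str],
    index_type: str = "INVERTED",  # assumption: a scalar index type your Milvus supports
) -> List[str]:
    """Create a scalar index on each listed field that does not have one yet.

    `collection` is expected to behave like pymilvus.Collection:
    `.indexes` lists existing indexes (each with `.field_name`) and
    `.create_index(...)` builds a new one. Returns the fields indexed here.
    """
    already_indexed = {index.field_name for index in collection.indexes}
    created: List[str] = []
    for name in field_names:
        if name in already_indexed or name in created:
            continue  # field already indexed, or listed twice
        collection.create_index(
            field_name=name,
            index_params={"index_type": index_type},
            index_name=f"{name}_index",
        )
        created.append(name)
    return created
```

Usage would look like `create_scalar_indexes(collection, ["source", "page"])` followed by `collection.load()`, where `"source"` and `"page"` stand in for your own metadata field names.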

6 months ago

dpiehjr (answer #2)

Quoting @RafaelXokito: "Hi, @rgupta2508, did you consider creating the scalar indexes using the collection instance?"

collection.create_index(field_name=field_name, index_name=f"{field_name}_index")

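@RafaelXokito Yes, we can create indexes through the collection instance like this, but the idea is to create the collection with LangChain before embedding starts. Currently, LangChain creates the collection if it does not exist.
At the outset we don't know whether the collection exists. If it already exists, it needs no new indexes; but if a new collection is created during embedding, we want to add the indexes immediately, because adding an index on that column after embedding has finished takes extra time.
That said, yes, everything can be done with pymilvus.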
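The rule described here (add scalar indexes only when the collection was newly created during ingestion) can be written as a small decision function. This is a sketch under assumptions: the caller records whether the collection existed before ingestion (for example with pymilvus `utility.has_collection(collection_name)`) and passes in the set of field names that already have an index; `fields_to_index` is a hypothetical helper name.

```python
from typing import Iterable, List, Set


def fields_to_index(
    existed_before: bool,
    indexed_fields: Set[str],
    wanted_fields: Iterable[str],
) -> List[str]:
    """Return the scalar fields that still need an index.

    A collection that already existed is assumed to be indexed already,
    so nothing is returned for it. For a newly created collection, every
    wanted field without an index is returned, in order, without duplicates.
    """
    if existed_before:
        return []
    result: List[str] = []
    for name in wanted_fields:
        if name not in indexed_fields and name not in result:
            result.append(name)
    return result
```

Keeping the decision separate from the pymilvus calls means it can be checked without a running Milvus server; the actual `create_index` calls then only run on the fields it returns.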

6 months ago

n3h0vuf (answer #3)

I don't think I understood your concern, but this is how I would suggest creating all the scalar indexes you want:

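from pymilvus import Collection, connections, utility
from langchain_milvus import Milvus  # or: from langchain_community.vectorstores import Milvus

# `uri`, `collection_name`, `_source_schema` and `self._embedding_function` come from
# your own code; `_source_schema` wraps a pymilvus CollectionSchema (`.schema`)
# and its vector field (`.vector`).
connections.connect(uri=uri)
_is_new_collection = not utility.has_collection(collection_name)
if _is_new_collection:
    collection = Collection(collection_name, _source_schema.schema)

    # Vector index: the same HNSW defaults LangChain's _create_index uses
    index_params = {
        "metric_type": "L2",
        "index_type": "HNSW",
        "params": {"M": 8, "efConstruction": 64},
    }
    collection.create_index(_source_schema.vector.name, index_params=index_params)

    # Scalar indexes on the metadata fields you filter on
    indexed_fields = {index.field_name for index in collection.indexes}
    for field_name in ["scalar_1", "scalar_2"]:  # specify which scalar field names you want
        if field_name not in indexed_fields:
            collection.create_index(field_name=field_name, index_name=f"{field_name}_index")
    collection.load()

# Point LangChain at the same server; without connection_args it defaults to localhost
_vector_store = Milvus(
    embedding_function=self._embedding_function,
    collection_name=collection_name,
    connection_args={"uri": uri},
)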

Keep in mind that pymilvus maintains a connection pool whose connections can be reused via an "alias", which makes this approach quite reliable.

6 months ago

eblbsuwk (answer #4)

Quoting the previous answer: "@rgupta2508 I don't think I understood your concern, but this is how I would suggest creating all the scalar indexes:" (followed by the same code as in answer #3)
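OK, now I understand: you're suggesting creating the collection with pymilvus and doing the embedding through the LangChain Milvus object. But we only get the vectors after calling the embedding API, and LangChain takes the vector dimension used to create the collection from those vectors. Could you advise? I'm not sure how we should handle this. That said, we could try creating the scalar indexes after the collection has been created.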
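One possible way around the dimension problem, sketched under assumptions: embed a short sample text once before creating the collection and read the vector length. It assumes a LangChain `Embeddings` object, whose `embed_query(text)` returns a list of floats; `probe_embedding_dim` and the sample text are illustrative. The resulting `dim` could then be passed to pymilvus `FieldSchema(name=..., dtype=DataType.FLOAT_VECTOR, dim=dim)` when building the schema.

```python
from typing import Callable, Sequence


def probe_embedding_dim(
    embed_query: Callable[[str], Sequence[float]],
    sample_text: str = "dimension probe",
) -> int:
    """Return the embedding dimension by embedding one sample text.

    Costs a single embedding call; the content of the text is arbitrary.
    """
    vector = embed_query(sample_text)
    if len(vector) == 0:
        raise ValueError("embedding function returned an empty vector")
    return len(vector)
```

For example, `dim = probe_embedding_dim(embeddings.embed_query)` would give the dimension up front, so the collection and its scalar indexes can be created before ingestion starts.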

6 months ago

hec6srdp (answer #5)

Thanks for the feedback; let me know if you need help with anything.

6 months ago