LangChain: deleting a Chroma collection does not work

mspsb9vt posted 5 months ago in Other

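### Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).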

### Example Code

```python
import logging
import re

# Assumed import path for langchain 0.1.x; the issue does not show its imports.
from langchain_community.vectorstores import Chroma

logger = logging.getLogger(__name__)


class VectorStoreCreator:
    """
    A class to create a vector store from documents.

    Methods
    -------
    create_vectorstore(documents, embed_model, collection_name):
        Creates a vector store from a set of documents using the provided embedding model.
    """

    @staticmethod
    def create_vectorstore(documents, embed_model, collection_name):
        """
        Creates a vector store from a set of documents using the provided embedding model.

        This function utilizes the Chroma library to create a vector store, which is a
        data structure that facilitates efficient similarity searches over the document
        embeddings. Optionally, a persistent directory and collection name can be specified
        for storing the vector store on disk.

        Parameters
        ----------
        documents : list
            A list of documents to be embedded and stored.
        embed_model : object
            The embedding model used to convert documents into embeddings.
        collection_name : str
            The name of the Chroma collection to create.

        Returns
        -------
        object
            A Chroma vector store instance containing the document embeddings.
        """
        try:
            # Create the vector store using Chroma
            vectorstore = Chroma.from_texts(
                texts=documents,
                embedding=embed_model,
                # persist_directory=f"chroma_db_{filepath}",
                collection_name=collection_name,
            )
            logger.info("Vector store created successfully.")
            return vectorstore
        except Exception as e:
            logger.error(f"An error occurred during vector store creation: {e}")
            return None

    @staticmethod
    def create_collection(file_name):
        """
        Create a sanitized collection name from the given file name.

        This method removes non-alphanumeric characters from the file name and
        truncates it to a maximum of 36 characters to form the collection name.

        Args:
            file_name (str): The name of the file from which to create the collection name.

        Returns:
            str: The sanitized and truncated collection name, or None if an error was logged.
        """
        try:
            collection_name = re.sub(r'[^a-zA-Z0-9]', '', file_name)[:36]
            logger.info(f"A collection name created for the filename: {file_name} as {collection_name}")
            return collection_name
        except Exception as e:
            logger.error(f"An error occurred during the collection name creation: {e}")
            return None

    @staticmethod
    def delete_vectorstore(collection_name):
        """
        Delete the specified vector store collection.

        This method deletes a collection in the vector store identified by the collection name.

        Args:
            collection_name (str): The name of the collection to delete.

        Returns:
            None: This method does not return a value. Errors are logged, not raised.
        """
        try:
            # As posted in the issue: called on the class itself, with no
            # instance and without using collection_name.
            Chroma.delete_collection()
            return None
        except Exception as e:
            logger.error(f"An error occurred during vector store deletion: {e}")
            return None
```

### Error Message and Stack Trace (if applicable)

_No response_

### Description

I am trying to delete a collection while using Chroma, but it is not working. Could anyone help me fix this issue?
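
One possible reading of the code as posted: `delete_collection` is an instance method on LangChain's `Chroma` wrapper, so calling it on the class raises a `TypeError`, and the broad `except` logs it and returns `None`, which looks like "nothing happened". The sketch below reproduces only that Python mechanism. `FakeStore`, `delete_on_class`, and `delete_on_instance` are hypothetical stand-ins for illustration, not LangChain or Chroma APIs.

```python
import logging

logger = logging.getLogger("vectorstore_demo")


class FakeStore:
    """Hypothetical stand-in for a vector store with an instance-level delete."""

    def __init__(self, collection_name):
        self.collection_name = collection_name
        self.deleted = False

    def delete_collection(self):
        # A real store would drop its own collection here.
        self.deleted = True


def delete_on_class():
    """Mirror the issue: call the method on the class and swallow the error."""
    try:
        FakeStore.delete_collection()  # no instance, so Python raises TypeError
        return "deleted"
    except Exception as e:
        logger.error("An error occurred during vector store deletion: %s", e)
        return None


def delete_on_instance(store):
    """Call the method on the object that owns the collection."""
    store.delete_collection()
    return store.deleted
```

If this is what is happening, the fix would be to keep the object returned by `create_vectorstore` and call `delete_collection()` on it, or to rebuild a store for the same collection name and call the method on that instance.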

### System Info

langchain==0.1.10

7jmck4yq1#

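Hi @rabin3030,
I ran into the same problem in production, and it is a serious one for our company!
We add many vector/document collections and update them frequently.
The problem is that, if you look closely at the SQLite3 database, all of the deleted information (including deleted links, i.e. foreign keys) keeps accumulating, so the database keeps growing.
Within a very short time our ChromaDB database folder exceeded 13 GB, and server memory usage was exploding!
After testing many approaches, I found an odd, temporary workaround...
Here it is: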

```python
ids = chromaColl.get()['ids']
if ids:
    chromaColl.delete(ids)
del chromaColl
_chromadb.delete_collection(collectionName)
```

Why do these two delete operations have to be called in this order for the data to be cleared properly?
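
One general SQLite behaviour may be part of the growing-folder observation above, though this thread does not confirm it as the cause: deleting rows marks pages as free for reuse but does not shrink the database file, which only gets smaller after a `VACUUM` (or with auto-vacuum enabled). The sketch below shows this with the standard-library `sqlite3` module on a throwaway file. The `docs` table and the `measure_sizes` helper are made up for illustration and have nothing to do with Chroma's actual schema.

```python
import os
import sqlite3
import tempfile


def measure_sizes(rows=5000):
    """Return the file size after insert, after delete, and after VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        conn = sqlite3.connect(path)
        try:
            # Make the usual default explicit: freed pages stay in the file.
            conn.execute("PRAGMA auto_vacuum = NONE")
            conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
            conn.executemany(
                "INSERT INTO docs (body) VALUES (?)",
                (("x" * 500,) for _ in range(rows)),
            )
            conn.commit()
            after_insert = os.path.getsize(path)

            conn.execute("DELETE FROM docs")
            conn.commit()
            after_delete = os.path.getsize(path)

            conn.execute("VACUUM")  # rebuilds the file without the free pages
            after_vacuum = os.path.getsize(path)
        finally:
            conn.close()
    return after_insert, after_delete, after_vacuum


if __name__ == "__main__":
    print(measure_sizes())
```

Whether running `VACUUM` against a Chroma-managed SQLite file is safe or sufficient is not established here, so it would be worth trying on a copy of the data first.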
