The ollama privateGPT example is broken for me

g9icjywg · posted 2 months ago in Other

After installing according to the provided instructions, running ingest.py on a folder containing 19 PDF documents crashes with the following stack trace:

Creating new vectorstore
Loading documents from source_documents
Loading new documents: 100%|████████████████████| 19/19 [00:02<00:00,  7.12it/s]
Loaded 1695 new documents from source_documents
Split into 8065 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Traceback (most recent call last):
  File "c:\PROGRAMS\PRIVATEGPT\ingest.py", line 161, in <module>
    main()
  File "c:\PROGRAMS\PRIVATEGPT\ingest.py", line 153, in main
    db = Chroma.from_documents(texts, embeddings, persist_directory=persist_directory)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\PROGRAMS\PRIVATEGPT\venv\Lib\site-packages\langchain\vectorstores\chroma.py", line 612, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "c:\PROGRAMS\PRIVATEGPT\venv\Lib\site-packages\langchain\vectorstores\chroma.py", line 576, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "c:\PROGRAMS\PRIVATEGPT\venv\Lib\site-packages\langchain\vectorstores\chroma.py", line 222, in add_texts
    raise e
  File "c:\PROGRAMS\PRIVATEGPT\venv\Lib\site-packages\langchain\vectorstores\chroma.py", line 208, in add_texts
    self._collection.upsert(
  File "c:\PROGRAMS\PRIVATEGPT\venv\Lib\site-packages\chromadb\api\models\Collection.py", line 298, in upsert
    self._client._upsert(
  File "c:\PROGRAMS\PRIVATEGPT\venv\Lib\site-packages\chromadb\api\segment.py", line 290, in _upsert
    self._producer.submit_embeddings(coll["topic"], records_to_submit)
  File "c:\PROGRAMS\PRIVATEGPT\venv\Lib\site-packages\chromadb\db\mixins\embeddings_queue.py", line 127, in submit_embeddings
    raise ValueError(
ValueError:
                Cannot submit more than 5,461 embeddings at once.
                Please submit your embeddings in batches of size
                5,461 or less.

I don't know where it gets the idea of "1,695 new documents", since the folder only contains 19 PDF files (as the loading line shows).
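The 1,695 figure is plausible on its own: LangChain's PDF loaders typically emit one Document per page, so 19 PDFs averaging ~89 pages each would yield 1,695 page-documents. The actual crash comes from submitting all 8,065 chunks to chromadb at once. A minimal sketch of a batching workaround (the `batched` helper and the patched `main()` lines are hypothetical, not part of the shipped ingest.py):

```python
# Sketch of a batching workaround for the "Cannot submit more than
# 5,461 embeddings at once" error. 5461 is the batch limit chromadb
# reports in the traceback; slicing the chunk list before handing it
# to Chroma keeps every submission under that limit.

MAX_BATCH = 5461  # limit quoted in the ValueError above

def batched(items, size=MAX_BATCH):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical patch for ingest.py's main(): create the store from the
# first batch, then append the remaining batches.
#
# batches = batched(texts)
# db = Chroma.from_documents(next(batches), embeddings,
#                            persist_directory=persist_directory)
# for batch in batches:
#     db.add_documents(batch)
```

With 8,065 chunks this produces two submissions (5,461 + 2,604), both within the limit.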

6uxekuva · #1

Edited: May 18, 2024
The earlier recipe still did not work with Ollama v0.1.38 and privateGPT.
The issue caused by the old chromadb version was fixed in v0.1.38. As for the langchain component, it apparently needs to be replaced with langchain-community.
The recipe below (on VMware Photon OS on WSL2) updates the components to their latest versions.

#!/bin/sh

sudo tdnf install -y python3-pip python3-devel git

cd $HOME
# Delete an earlier installation if necessary
# sudo rm -r -f ollama
# sudo rm -r -f .ollama

# install bits and source
export RELEASE=0.1.38
curl -fsSL https://ollama.com/install.sh | sed "s#https://ollama.com/download#https://github.com/ollama/ollama/releases/download/v\$RELEASE#" | sh
# Get the Ollama source examples
git clone -b v$RELEASE https://github.com/ollama/ollama.git
cd ollama/examples/langchain-python-rag-privategpt

# python environment
python3 -m venv .venv
source .venv/bin/activate
# Note: no sudo here - sudo resets PATH and would install outside the venv
pip3 install --upgrade pip
# ADJUST PATH VARIABLE AS DESCRIBED IN OUTPUT OF pip3 install --upgrade pip
# export PATH=$PATH:<yourpath>
pip3 install -r requirements.txt

# In Ollama 0.1.38 , there is still an issue.
# Updating components usually helps.
pip --disable-pip-version-check list --outdated --format=json | python -c "import json, sys; print('\n'.join([x['name'] for x in json.load(sys.stdin)]))" | xargs -n1 pip install -U

# Create the source_documents directory, store all your PDF documents in it
mkdir -p $HOME/ollama/examples/langchain-python-rag-privategpt/source_documents

# copy all your documents to $HOME/ollama/examples/langchain-python-rag-privategpt/source_documents
# INSERT HERE

# Start ingest
python ./ingest.py

# Start privateGPT
python ./privateGPT.py

+1 This is still a problem, see output
Tested on Ollama version 0.1.32. The root cause is a by-design limit in the chroma subcomponent, but it has since been fixed.
Edit: actually, there are failures caused by several subcomponent versions. The best advice so far is to step through the installation piece by piece.
pip3 uninstall -r requirements.txt -y
pip3 install tqdm
pip3 install langsmith
pip3 install huggingface-hub
pip3 install langchain
pip3 install gpt4all
pip3 install chromadb
pip3 install llama-cpp-python
pip3 install urllib3
pip3 install PyMuPDF
pip3 install unstructured
pip3 install extract-msg
pip3 install tabulate
pip3 install pandoc
pip3 install pypandoc
pip3 install sentence_transformers
I am using chroma 0.4.7 --> set chromadb = "^0.4.7" in pyproject.toml
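If you are not using Poetry, the same pin can be expressed as a pip requirement; the upper bound below is an assumption mirroring what the caret constraint `^0.4.7` means:

```
# requirements.txt equivalent of the Poetry constraint chromadb = "^0.4.7"
chromadb>=0.4.7,<0.5
```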
In Ollama there is a package-management issue, but it can be resolved with the following workaround.
pip3 uninstall langsmith
pip3 uninstall langchain-core
pip3 uninstall langchain
pip3 install langsmith
pip3 install langchain-core
pip3 install langchain
After that, python ingest.py completes successfully.
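After the uninstall/reinstall cycle it is worth confirming which versions actually got resolved, since the workaround exists precisely because of mismatches between these three packages. A minimal sketch (the package names are the real PyPI names; the `report` helper is hypothetical):

```python
# Report the installed versions of the packages touched by the
# workaround, so mismatches between langsmith, langchain-core and
# langchain are visible at a glance.
from importlib import metadata

def report(pkg: str) -> str:
    """Return '<pkg> <version>', or note that the package is missing."""
    try:
        return f"{pkg} {metadata.version(pkg)}"
    except metadata.PackageNotFoundError:
        return f"{pkg} not installed"

if __name__ == "__main__":
    for pkg in ("langsmith", "langchain-core", "langchain"):
        print(report(pkg))
```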

Edit: the downside of the workaround is that a new vectorstore is recreated on every python ingest.py run.
@jmorganca please take a look at this issue; it would be good to see Ollama use pre-tested newer versions of langchain-community, chroma, etc. The issue has already been reported in #533. Meanwhile, chroma-core/chroma#1049 fixed the chroma issue by making max_batch_size a public API. It is still a limit, but with that change it is now a client-specific limit.
Edit: strange, I just realized that according to #949 this is supposed to have been fixed long ago.
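"Client-specific" means callers can ask the client for its limit instead of hard-coding 5,461. A sketch of that, assuming the attribute is named `max_batch_size` as in chroma-core/chroma#1049 (verify against your chromadb version), with a fallback for older clients:

```python
# Prefer the batch limit the chromadb client advertises (public since
# chroma-core/chroma#1049); fall back to the 5,461 seen in the
# traceback when the attribute is absent (older chromadb versions).
FALLBACK_LIMIT = 5461

def batch_limit(client) -> int:
    """Return the client's advertised max batch size, or the fallback."""
    return int(getattr(client, "max_batch_size", FALLBACK_LIMIT))
```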
