QAnything [BUG] Parsing an image-only PDF file raises a list index out of range error

qvsjd97n  posted 2 months ago in Other

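Is there an existing issue or discussion about this bug?

  • Yes, I have searched the existing issues and discussions.

Is this problem answered in the FAQ?

  • Yes, I have searched the FAQ.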

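Current behavior:
After I upload a scanned, image-only PDF file, it stays in the "parsing" state indefinitely and the backend reports a list index out of range error. The likely cause is that no content can be parsed from the document, which leaves an empty result to index into. I don't know why the document content cannot be parsed.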

Expected behavior:
The document is parsed normally.

Environment:

- OS: CentOS Linux release 7.9.2009
- NVIDIA Driver:
- CUDA:
- docker:
- docker-compose:
- NVIDIA GPU:
- NVIDIA GPU Memory:

QAnything logs:
2024-07-29 22:44:00,578 insert_files_to_faiss: KB292157495c50455ba10b30c66e9c25d4
100%|█████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1056.23it/s]
docs number: 0
2024-07-29 22:44:00,590 before 2nd split doc lens: 0
2024-07-29 22:44:00,590 after 2nd split doc lens: 0
2024-07-29 22:44:00,590 langchain analysis docs is empty!
2024-07-29 22:44:00,591 函数 split_file_to_docs 执行耗时: 0.012501716613769531 秒
2024-07-29 22:44:00,595 split time: 0.012667655944824219 0
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-6' coro=<LocalDocQA.insert_files_to_faiss() done, defined at /opt/soft/QAnything/qanything_kernel/core/local_doc_qa.py:81> exception=IndexError('list index out of range')>
Traceback (most recent call last):
  File "/opt/soft/QAnything/qanything_kernel/core/local_doc_qa.py", line 104, in insert_files_to_faiss
    add_ids = await self.faiss_client.add_document(local_file.docs)
  File "/opt/soft/QAnything/qanything_kernel/connector/database/faiss/faiss_client.py", line 113, in add_document
    kb_id = docs[0].metadata['kb_id']
IndexError: list index out of range
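The log reports "docs number: 0" and the traceback fails on docs[0], so the index error appears to come from indexing an empty list. Below is a minimal sketch of a guard for that case. It is not the QAnything code: the Doc class and the describe_insert function are hypothetical stand-ins used only to illustrate the check.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for the document objects passed to add_document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def describe_insert(docs: List[Doc]) -> str:
    """Hypothetical helper: check for an empty list before reading docs[0]."""
    if not docs:
        # Nothing was parsed (for example a scanned, image-only PDF),
        # so report a failure instead of raising IndexError.
        return "failed: no content could be parsed from the file"
    kb_id = docs[0].metadata["kb_id"]
    return f"inserted {len(docs)} docs into {kb_id}"


if __name__ == "__main__":
    print(describe_insert([]))
    print(describe_insert([Doc("some text", {"kb_id": "KB_example"})]))
```

A guard like this would only turn the crash into a reported failure, so the file would no longer be stuck in the "parsing" state. It would not make the scanned content readable; that still depends on whatever extracts the text from the PDF.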

a7qyws3x  #1

+1
The same problem also occurs when some PNG images are uploaded to the knowledge base.
