llama_index [Bug]: SmartPDFLoader - Failed to establish a new connection

Posted by 62o28rlo 2 months ago in Other


Bug Description

I am running the example from the website:

from llama_index.readers.smart_pdf_loader import SmartPDFLoader

llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "https://arxiv.org/pdf/1910.13461.pdf"  # also allowed is a file path e.g. /home/downloads/xyz.pdf
pdf_loader = SmartPDFLoader(llmsherpa_api_url=llmsherpa_api_url)
documents = pdf_loader.load_data(pdf_url)

I get the following error:

An exception occurred: HTTPSConnectionPool(host='readers.llmsherpa.com', port=443): Max retries exceeded with url: /api/document/developer/parseDocument?renderFormat=all (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002671A13C910>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x000002671A13C910>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

The above exception was the direct cause of the following exception:

    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='readers.llmsherpa.com', port=443): Max retries exceeded with url: /api/document/developer/parseDocument?renderFormat=all (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002671A13C910>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

Version

llama-index 0.10.54
llama-index-readers-smart-pdf-loader 0.1.4

Steps to Reproduce

from llama_index.readers.smart_pdf_loader import SmartPDFLoader

llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "https://arxiv.org/pdf/1910.13461.pdf"  # also allowed is a file path e.g. /home/downloads/xyz.pdf
pdf_loader = SmartPDFLoader(llmsherpa_api_url=llmsherpa_api_url)
documents = pdf_loader.load_data(pdf_url)

3 Answers


6ju8rftf1#

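The connection error you are seeing may be caused by a problem with the llmsherpa API service itself or by a network issue. Here are some steps you can take to diagnose and resolve it: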

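Check the API service status: Make sure the llmsherpa API service is up. The service may occasionally be down, for maintenance or other reasons.
Network configuration: Check whether a network problem or a firewall setting is blocking connections to https://readers.llmsherpa.com. You can try opening the URL directly in a web browser to see whether it is reachable.
Rate limiting: Note that the llmsherpa API enforces rate limits. If you exceed the allowed number of requests per second, your IP may be temporarily banned. Make sure you are not sending too many requests in a short period.
Proxy settings: If you are behind a proxy, make sure your proxy settings are configured correctly in your environment. You may need to set them in your code.
Retry mechanism: Implement a retry mechanism in your code to handle transient network issues. This helps when connections fail only intermittently.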
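A quick way to act on the network-configuration point is to test whether a plain TCP connection to the API host on port 443 succeeds at all, independently of llama_index. The sketch below uses only the Python standard library; the helper name `can_reach` is hypothetical and the 10-second default timeout is an arbitrary choice. It only checks TCP reachability, not TLS or the API endpoint itself. A timeout here would be consistent with the `[WinError 10060]` in the traceback, which indicates the TCP connection never completed.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper: it tests TCP reachability only, not TLS or the API.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections and timeouts are all OSError subclasses
        return False
```

For example, if `can_reach("readers.llmsherpa.com")` returns False while `can_reach("arxiv.org")` returns True, the problem most likely lies with the llmsherpa host or something between you and it, rather than with llama_index.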

Here is an example of a simple retry mechanism:

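import time

MAX_RETRIES = 3
RETRY_DELAY_SECONDS = 5


def connect():
    # Put the connection code here, e.g. pdf_loader.load_data(pdf_url)
    pass


for attempt in range(1, MAX_RETRIES + 1):
    try:
        connect()
        break  # success: stop retrying
    except Exception as e:
        print(f"Connection attempt {attempt}/{MAX_RETRIES} failed: {e}")
        if attempt < MAX_RETRIES:
            time.sleep(RETRY_DELAY_SECONDS)
else:
    # for/else: this branch runs only if the loop never reached `break`
    print("Connection failed: maximum number of retries reached.")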

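This code tries to connect up to three times, waiting 5 seconds between attempts. If the problem persists, it may be worth contacting the maintainers of the llmsherpa API to ask about known issues or any additional configuration that might be required.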
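The same idea can be packaged as a reusable helper that wraps any call, such as `pdf_loader.load_data(pdf_url)`, and doubles the wait after each failed attempt (exponential backoff), which is gentler on a rate-limited service than a fixed delay. This is a minimal sketch: `retry_with_backoff` and its parameters are hypothetical and not part of llama_index or llmsherpa, and which exception types are worth retrying is an assumption you should adjust.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `func` until it succeeds or `max_attempts` is reached (hypothetical helper).

    The wait doubles after every failure: base_delay, 2*base_delay, 4*base_delay, ...
    Exceptions not listed in `retry_on` propagate immediately; if every attempt
    fails, the last exception is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt}/{max_attempts} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Hypothetical usage with the loader from the question: `documents = retry_with_backoff(lambda: pdf_loader.load_data(pdf_url))`. Retrying only helps with transient failures; if the host never responds, every attempt fails and the final exception is re-raised.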

2 months ago

fslejnso2#

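Hello, @RGalkin! I'm here to help you with this issue 🤖
You may be seeing this connection error because the llmsherpa API URL is incorrect or the service is down. Make sure llmsherpa_api_url is correct and that the service is reachable.
Here is the example usage from the documentation: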

from llama_index.readers.smart_pdf_loader import SmartPDFLoader

llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "https://arxiv.org/pdf/1910.13461.pdf"  # also allowed is a file path e.g. /home/downloads/xyz.pdf
pdf_loader = SmartPDFLoader(llmsherpa_api_url=llmsherpa_api_url)
documents = pdf_loader.load_data(pdf_url)

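Make sure llmsherpa_api_url is correct and reachable from your network. If the URL or the service cannot be reached, you will get a connection error.
Also make sure the pdf_path_or_url argument is specified correctly; it can be either a URL or a file path. The SmartPDFLoader class uses LayoutPDFReader from llmsherpa.readers to read and chunk the PDF, so a problem with the URL or file path can also lead to connection errors.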
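To rule out the PDF argument, one option is to check it before calling the loader: confirm that a local path exists, or that a URL has an http(s) scheme and a host. The helper below is a hypothetical sketch using only the standard library; it does not reproduce how LayoutPDFReader itself distinguishes URLs from file paths. Note also that the traceback in the question names `readers.llmsherpa.com` and the `parseDocument` path as the failing request, which suggests the call to the parsing API, rather than fetching the PDF, is the step that timed out.

```python
import os
from urllib.parse import urlparse


def describe_pdf_source(pdf_path_or_url: str) -> str:
    """Return "url" or "file" for a PDF argument; raise ValueError if it looks invalid.

    Hypothetical pre-flight check: it only inspects the string and the local
    filesystem and makes no network requests.
    """
    parsed = urlparse(pdf_path_or_url)
    if parsed.scheme in ("http", "https"):
        if not parsed.netloc:
            raise ValueError(f"URL has no host: {pdf_path_or_url!r}")
        return "url"
    # Anything else is treated as a local path (this also covers Windows paths like C:\docs\x.pdf)
    if not os.path.isfile(pdf_path_or_url):
        raise ValueError(f"Local PDF not found: {pdf_path_or_url!r}")
    return "file"
```

For the example in the question, `describe_pdf_source("https://arxiv.org/pdf/1910.13461.pdf")` returns `"url"`; a mistyped local path raises `ValueError` before any network call is made.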

To continue talking to me, mention @dosu.

2 months ago

2exbekwf3#

I thought something in my code might be interfering with the connection, so I ran the code in Colab, and I still got the same error.

2 months ago
