python-3.x: How to solve an error in a PDF byte string with PyPDF2

Posted by 8ulbf1ek on 2023-08-08 in Python


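I wrote a function that converts a multi-page PDF, passed in as a byte string, into a list of byte strings, one per page. For one specific single-page PDF, however, something odd happens: the function returns a list, but it also reports this error:
Invalid stream (index 0) within object 14 0: Stream has ended unexpectedly
When I write the output list back out as a PDF, the file is created fine and can be opened and viewed. But when I try to process the list with Google's Document AI, it returns nothing. The error does not occur with other PDFs, only with this specific one.
These are the functions: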

import io
from typing import List

from PyPDF2 import PdfReader, PdfWriter

def pdf_to_list(byte_string: bytes) -> List[bytes]:
    pdf_pages = []
    with io.BytesIO(byte_string) as stream:
        pdf = PdfReader(stream, strict = False)
        num_pages = len(pdf.pages)
        for page_number in range(num_pages):
            pdf_writer = PdfWriter()
            pdf_writer.add_page(pdf.pages[page_number])
            output_stream = io.BytesIO()
            pdf_writer.write(output_stream)
            output_stream.seek(0)
            pdf_pages.append(output_stream.read())
    return pdf_pages

def save_bytestring_as_pdf(bytestring: bytes, file_path: str) -> None:
    with open(file_path, 'wb') as file:
        file.write(bytestring)
    print(f'Bytestring saved as PDF: {file_path}')

Can anyone help me figure out what is going on? I have the problematic PDF, but I don't know how to upload it here; I can send it for testing if you like.
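The message quoted in the question appears to be a warning that PyPDF2 logs when the reader is created with `strict=False`, rather than a raised exception, which would explain why the function still returns a list. A minimal sketch for collecting such warnings while splitting, so the affected PDF can be flagged before it is sent anywhere, might look like this. The names `capture_warnings` and `_ListHandler` are hypothetical, and the logger name `"PyPDF2"` is an assumption about how the library names its loggers.

```python
import logging
from contextlib import contextmanager
from typing import Iterator, List


class _ListHandler(logging.Handler):
    """Logging handler that stores warning messages in a list."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


@contextmanager
def capture_warnings(logger_name: str = "PyPDF2") -> Iterator[List[str]]:
    """Collect WARNING-and-above messages logged under `logger_name`.

    Assumption: the library logs under a logger whose name starts with
    "PyPDF2" (for example "PyPDF2._reader"), so its records propagate
    to the handler attached here.
    """
    logger = logging.getLogger(logger_name)
    handler = _ListHandler()
    logger.addHandler(handler)
    try:
        yield handler.messages
    finally:
        # Always detach the handler so later calls start clean.
        logger.removeHandler(handler)


# Possible usage with the question's function:
# with capture_warnings() as problems:
#     pages = pdf_to_list(byte_string)
# if problems:
#     print("PyPDF2 reported:", problems)
```

If the list is non-empty, the source PDF likely contains an object that could not be read, and the split pages may be missing content even though they open in a viewer.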

python-3.x

Source: https://stackoverflow.com/questions/76639218/how-solve-an-error-in-pdf-bytestring-pypdf2

1 Answer

cygmwpex1#

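I'm not sure how relevant this is to your problem or use case; more information and context about what you are trying to accomplish would help.
With Document AI, you do not need a library such as PyPDF to convert a PDF into a byte string before sending it as bytes. You can follow the example in the documentation below and open the file in binary mode to send its bytes, as shown here.
https://cloud.google.com/document-ai/docs/process-documents-client-libraries#client-libraries-usage-python
(Snippet from the full code)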

# Read the file into memory
with open(file_path, "rb") as image:
    image_content = image.read()

# Load binary data
raw_document = documentai.RawDocument(
    content=image_content,
    mime_type="application/pdf",  # Refer to https://cloud.google.com/document-ai/docs/file-types for supported file types
)

# Configure the process request
# `processor.name` is the full resource name of the processor, e.g.:
# `projects/{project_id}/locations/{location}/processors/{processor_id}`
request = documentai.ProcessRequest(name=processor.name, raw_document=raw_document)

result = client.process_document(request=request)
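Before passing bytes as the `content` of a `RawDocument`, it may help to confirm that they at least look like a complete PDF. The sketch below is a rough heuristic only: it checks for the `%PDF-` header and an `%%EOF` marker near the end, and it does not validate internal objects or streams, so a file can pass this check and still be damaged. The function name `looks_like_complete_pdf` and the `tail_window` parameter are hypothetical.

```python
def looks_like_complete_pdf(data: bytes, tail_window: int = 1024) -> bool:
    """Heuristic check that `data` has a PDF header and an EOF marker.

    Assumption: a well-formed PDF starts with b"%PDF-" and has b"%%EOF"
    somewhere in its last `tail_window` bytes. This does not parse the file.
    """
    has_header = data.startswith(b"%PDF-")
    has_trailer = b"%%EOF" in data[-tail_window:]
    return has_header and has_trailer


# Possible usage on a list of per-page byte strings:
# for index, page_bytes in enumerate(pages):
#     if not looks_like_complete_pdf(page_bytes):
#         print(f"Page {index} does not look like a complete PDF")
```

A check like this only catches truncated or non-PDF input; it says nothing about whether Document AI will be able to extract content from the page.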


Posted 2023-08-08
