Python: removing duplicate pages from a PDF

5kgi1eie  posted on 2023-01-24  in  Python

I have a PDF file that contains a lot of duplicate pages, and I want to remove them. This is my code:

pdf_reader = PyPDF2.PdfFileReader(filename_path)
print(pdf_reader.getNumPages())
pdf_writer = PyPDF2.PdfFileWriter()
last_page_n = pdf_reader.getNumPages() - 1

megalist1 =[]
for i in range(last_page_n):
    current_page = pdf_reader.getPage(i)
    megalist1.append(current_page)

res = []
[res.append(x) for x in megalist1 if x not in res]
print(len(megalist1))

It doesn't raise any errors, but it doesn't work either. What am I doing wrong?

qvtsj1bj #1

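That is not how a list comprehension is meant to be used, but you can do the duplicate check while appending to the original list, i.e.: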

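megalist1 = []
# Iterate over every page; range(last_page_n) would skip the final page
for i in range(pdf_reader.getNumPages()):
    current_page = pdf_reader.getPage(i)
    # Append only pages that are not already in the list
    if current_page not in megalist1:
        megalist1.append(current_page)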
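
The check-while-appending idea can also be written as a small reusable helper. This is only a minimal sketch, assuming each item can be reduced to a hashable key; `unique_in_order` and its `key` parameter are hypothetical names and not part of PyPDF2. Plain strings stand in for pages so the example runs on its own.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def unique_in_order(
    items: Iterable[T], key: Callable[[T], Hashable] = lambda x: x
) -> List[T]:
    """Return items with duplicates dropped, keeping first occurrences in order."""
    seen = set()   # keys already encountered
    result = []    # first occurrence of each key, in original order
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


# Strings stand in for pages here (illustrative data only)
pages = ["cover", "intro", "intro", "chapter 1", "cover"]
deduped = unique_in_order(pages)
print(len(pages), len(deduped))  # 5 3
```

Note that the question prints `len(megalist1)`, the list before deduplication, so the printed count would stay the same even if `res` ended up shorter.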

rsaldnfx #2

Here is one way to fix the code:

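import PyPDF2

# Legacy PyPDF2 API (PdfFileReader/PdfFileWriter); these names were removed in PyPDF2 3.0
pdf_reader = PyPDF2.PdfFileReader(filename_path)
pdf_writer = PyPDF2.PdfFileWriter()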

# Create an empty list to store unique pages
unique_pages = []

# Iterate through each page in the PDF
for i in range(pdf_reader.getNumPages()):
    current_page = pdf_reader.getPage(i)
    # Check if the current page is already in the unique_pages list
    if current_page not in unique_pages:
        # If not, add it to the list
        unique_pages.append(current_page)
        # And also add it to the output PDF
        pdf_writer.addPage(current_page)

# Write the output PDF to a new file
with open("output.pdf", "wb") as out:
    pdf_writer.write(out)
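
Whether `current_page not in unique_pages` catches duplicates depends on how the library compares page objects: pages that look identical but are stored as separate objects in the file may not compare equal. One hedged alternative is to compare a fingerprint of each page's extracted text instead. The sketch below assumes the newer `PdfReader`/`PdfWriter` API from the `pypdf` package (the successor to PyPDF2) and assumes that duplicate pages yield identical extracted text; `page_fingerprint`, `keep_first_indices`, and `dedupe_pdf` are hypothetical helper names.

```python
import hashlib
from typing import Iterable, List


def page_fingerprint(text: str) -> str:
    """Hash whitespace-normalized page text so equal pages get equal digests."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def keep_first_indices(fingerprints: Iterable[str]) -> List[int]:
    """Return the index of the first occurrence of each fingerprint, in order."""
    seen = set()
    keep = []
    for index, fp in enumerate(fingerprints):
        if fp not in seen:
            seen.add(fp)
            keep.append(index)
    return keep


def dedupe_pdf(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path without repeated pages; return pages written."""
    # Assumption: the pypdf package is installed; imported lazily so the
    # helpers above can be used and tested without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    fingerprints = [
        page_fingerprint(page.extract_text() or "") for page in reader.pages
    ]
    kept = keep_first_indices(fingerprints)

    writer = PdfWriter()
    for index in kept:
        writer.add_page(reader.pages[index])
    with open(dst_path, "wb") as out:
        writer.write(out)
    return len(kept)
```

It could be called as `dedupe_pdf("input.pdf", "output.pdf")`, where both file names are placeholders. One limitation of this approach: pages with no extractable text (for example, scanned images) all share the same fingerprint and would collapse into a single page, so a different fingerprint would be needed for such files.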
