How can I highlight text in a PDF using Python?

Posted by bf1o4zei on 2021-08-25 in Java

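I'm trying to write a Python script that lets the user pick a PDF and then type in the words to search for. Any words that are found should be highlighted, and the result exported under a unique filename. When none of the words are found, the code runs fine, but for some reason it breaks as soon as a word is found. Any help or suggestions are welcome!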

    ### IMPORT PACKAGES NEEDED
    import sys
    import time

    # pip install PyMuPDF   (recent releases use snake_case names such as search_for;
    # the older camelCase names searchFor / addHighlightAnnot are not available in them)
    import fitz
    import PySimpleGUI as sg

    searchingWords = []

    ### READ IN PDF
    sg.theme('BlueMono')
    inputfname = sg.popup_get_file('PDF file to open', title='PDF Browser',
                                   file_types=(("PDF Files", "*.pdf"),))
    if not inputfname:  # None on Cancel, '' if OK was clicked without choosing a file
        sg.popup_cancel('Cancelled.')
        sys.exit(0)
    print(inputfname)
    doc = fitz.open(inputfname)

    ### USER INPUTTING WORDS
    # Window definition
    layout = [[sg.Text("What word or phrase do you want to search for?")],
              [sg.Input(key='-INPUT-', do_not_clear=False)],
              [sg.Text(size=(40, 1), key='-OUTPUT-')],
              [sg.Button('Next word'), sg.Button('Confirm'),
               sg.Button('Next word enter', visible=False, bind_return_key=True)]]
    # Create the window
    window = sg.Window('Word Search', layout)
    # Display window
    while True:
        event, values = window.read()
        # Window was closed: values is None, so there is nothing to record
        if event == sg.WINDOW_CLOSED:
            break
        # Record the typed word on 'Next word', Enter, AND 'Confirm', so the last
        # word typed before clicking 'Confirm' is not lost; skip blanks and repeats
        word = values['-INPUT-'].strip()
        if word and word not in searchingWords:
            searchingWords.append(word)
        # Output a message to the window
        window['-OUTPUT-'].update(str(searchingWords))
        if event == 'Confirm':
            break
    # Remove window
    window.close()
    ### END USER INPUT FOR SEARCH WORDS

    for page in doc:
        ### SEARCHING FOR THE WORDS
        for word in searchingWords:
            # ??? How can this be changed to require a non-alphabetic character on each side of the match?
            text_instances = page.search_for(word)
            ### HIGHLIGHTING THE WORDS
            for inst in text_instances:
                highlight = page.add_highlight_annot(inst)
                highlight.update()

    ### SET FILE OUTPUT NAME
    datetimefilename = time.strftime("%m-%d-%Y-%H.%M.%S") + "Highlighted.pdf"
    ### OUTPUT
    doc.save(datetimefilename, garbage=4, deflate=True, clean=True)
    doc.close()
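On the question in the code comment: `search_for` matches substrings, so searching for "cat" also highlights the "cat" inside "category". One possible approach (a swapped-in technique, not something from the post) is to work from `page.get_text("words")`, which returns one tuple per word with its bounding box, and compare whole tokens after stripping punctuation from their edges. This is a sketch under those assumptions; `whole_word_matches` and `highlight_whole_words` are hypothetical names, and the comparison ignoring case is a design choice.

```python
import re
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]

# Non-word characters stripped from both ends of each token before comparing,
# so "cat," or "(cat)" still count as the whole word "cat".
_EDGE = re.compile(r"^\W+|\W+$")


def _norm(token: str) -> str:
    # casefold() makes the comparison ignore case; drop it for case-sensitive matching
    return _EDGE.sub("", token).casefold()


def whole_word_matches(words: Sequence[tuple], phrase: str) -> List[List[Box]]:
    """Find whole-word occurrences of phrase in a word list.

    words uses the layout returned by page.get_text("words"):
    (x0, y0, x1, y1, text, block_no, line_no, word_no).
    Returns one list of word boxes per occurrence (several boxes for a phrase).
    """
    target = [_norm(t) for t in phrase.split()]
    if not target:
        return []
    tokens = [_norm(w[4]) for w in words]
    n = len(target)
    hits = []
    # Slide a window of n consecutive words over the page's word list
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            hits.append([tuple(w[:4]) for w in words[i:i + n]])
    return hits


def highlight_whole_words(page, phrase: str) -> int:
    """Highlight whole-word matches of phrase on a PyMuPDF page; return the count."""
    import fitz  # imported here so the matching logic above needs no PyMuPDF

    count = 0
    for boxes in whole_word_matches(page.get_text("words"), phrase):
        # One highlight annotation per occurrence, covering each word's box
        page.add_highlight_annot([fitz.Rect(b).quad for b in boxes])
        count += 1
    return count
```

In the script, `highlight_whole_words(page, word)` would take the place of the inner search-and-highlight loop. Because the word list follows the PDF's content order, a phrase split across a line break is still found as long as its words are consecutive in that list.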
