Unable to use a Scanner to scan over a PDF document for a specific word

Posted by iyr7buue on 2022-10-22 in Java


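I'm trying to find out how often a specific string is repeated in an external PDF file. I can't get a Scanner to scan over the PDF document, and an error pops up that I don't understand. This is the code I have so far: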

package Files;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;

import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class StringSearcherTest {
    public static void main(String[] args) throws IOException {
        Scanner searcher = new Scanner(System.in);
        Boolean foundString = false;

        System.out.println("Enter the word you would like to search for");
        String word = searcher.nextLine(); 
        word = word.trim();
        int count = 0;
        String phrase = ("The text that will be read is the following:");
        System.out.println(phrase.toUpperCase() + "\n");

        try (PDDocument document = Loader.loadPDF(new File("/Users/ricardobarahona/Desktop/LifeStyle Steps.pdf"))) {
            document.getClass();

            if (!document.isEncrypted()) {
                PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                stripper.setSortByPosition(true);

                PDFTextStripper tStripper = new PDFTextStripper();
                String pdfFileInText = tStripper.getText(document);
                System.out.println("Text:" + pdfFileInText);

            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        // This is the line that fails at runtime with the ClassCastException shown below.
        Scanner sc2 = new Scanner((Readable) Loader.loadPDF(new File("/Users/ricardobarahona/Desktop/LifeStyle Steps.pdf")));

    }
}

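The IDE doesn't flag any errors in the code, but when I run the program, the PDF text prints, and when it's time for the second Scanner to scan over the document, this error pops up:

Exception in thread "main" java.lang.ClassCastException: class org.apache.pdfbox.pdmodel.PDDocument cannot be cast to class java.lang.Readable
    at Files.StringSearcherTest.main(StringSearcherTest.java:40)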
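One possible way around the error: PDDocument does not implement Readable, so the cast compiles (Readable is an interface) but fails at runtime. A Scanner can instead be built directly on the String that tStripper.getText(document) already returns. The sketch below is only an illustration under some assumptions: the class name WordCounter and the method countOccurrences are hypothetical, the sample text stands in for the extracted PDF text, and "word" is taken to mean a whole word matched case-insensitively.

```java
import java.util.Scanner;

public class WordCounter {

    // Hypothetical helper: counts whole-word, case-insensitive matches of `word` in `text`.
    public static int countOccurrences(String text, String word) {
        int count = 0;
        // Scanner accepts a String source directly, so no cast to Readable is needed.
        try (Scanner scanner = new Scanner(text)) {
            // Treat any run of non-letter, non-digit characters as a separator,
            // so punctuation does not stay attached to the words.
            scanner.useDelimiter("[^\\p{L}\\p{N}]+");
            while (scanner.hasNext()) {
                if (scanner.next().equalsIgnoreCase(word)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: in the real program this String would be the result of
        // tStripper.getText(document) from the code above.
        String pdfFileInText = "Step one: walk. Step two: rest. Every step counts.";
        System.out.println(countOccurrences(pdfFileInText, "step")); // prints 3
    }
}
```

In the original program, the call would go inside the existing try block, for example count = countOccurrences(pdfFileInText, word); which also avoids loading the PDF a second time. If a multi-word phrase needs to be counted, a substring search over the extracted text may be a better fit than token-by-token scanning.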


Source: https://stackoverflow.com/questions/74126866/unable-to-use-a-scanner-to-scan-over-a-pdf-document-for-a-specific-word