Unable to scan a PDF document for a specific word using Scanner

iyr7buue  posted on 2022-10-22  in  Java

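I'm trying to find out how often a specific string is repeated in an external PDF file. I can't scan the PDF document with a Scanner, and an error pops up that I don't understand. This is the code I have so far: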

package Files;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;

import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class StringSearcherTest {
    public static void main(String[] args) throws IOException {
        Scanner searcher = new Scanner(System.in);
        Boolean foundString = false;

        System.out.println("Enter the word you would like to search for");
        String word = searcher.nextLine(); 
        word = word.trim();
        int count = 0;
        String phrase = ("The text that will be read is the following:");
        System.out.println(phrase.toUpperCase() + "\n");

        try (PDDocument document = Loader.loadPDF(new File("/Users/ricardobarahona/Desktop/LifeStyle Steps.pdf"))) {
            document.getClass();

            if (!document.isEncrypted()) {
                PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                stripper.setSortByPosition(true);

                PDFTextStripper tStripper = new PDFTextStripper();
                String pdfFileInText = tStripper.getText(document);
                System.out.println("Text:" + pdfFileInText);

            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        Scanner sc2 = new Scanner((Readable) Loader.loadPDF(new File("/Users/ricardobarahona/Desktop/LifeStyle Steps.pdf")));

    }
}

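The IDE doesn't flag any errors in the code, but when I run the program, the PDF text is printed, and once the second Scanner is supposed to scan the document, this error pops up:
Exception in thread "main" java.lang.ClassCastException: class org.apache.pdfbox.pdmodel.PDDocument cannot be cast to class java.lang.Readable at Files.StringSearcherTest.main(StringSeaarcherTest.java:40)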


qvtsj1bj1#

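Try changing every occurrence of FileInputStream to File, and change the directory from a folder to the actual file, e.g. "C:\text\text2\file.txt". That should get rid of the error, and the file should then be read correctly.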
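
It may also help to note where the ClassCastException comes from: a PDDocument does not implement java.lang.Readable, so the cast compiles but fails at runtime. Scanner does have a constructor that takes a plain String, so one option is to scan the text that PDFTextStripper has already extracted instead of the document object. The sketch below assumes that approach; the class name WordCounter and the method countWord are hypothetical, and the hard-coded pdfFileInText string only stands in for the result of tStripper.getText(document) from the question.

```java
import java.util.Scanner;

class WordCounter {
    // Counts whitespace-delimited tokens that equal `word`, ignoring case
    // and any punctuation attached to the start or end of a token.
    public static int countWord(String text, String word) {
        int count = 0;
        // Scanner(String) reads directly from the extracted text.
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNext()) {
                // Strip leading/trailing non-word characters, e.g. "step." -> "step".
                String token = sc.next().replaceAll("^\\W+|\\W+$", "");
                if (token.equalsIgnoreCase(word)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: in the real program this string would come from
        // tStripper.getText(document) rather than a literal.
        String pdfFileInText = "Step one: walk daily. Step two: sleep well.\nRepeat each step.";
        System.out.println(countWord(pdfFileInText, "step")); // prints 3
    }
}
```

In the question's code, this would mean calling countWord(pdfFileInText, word) inside the try block, after the text has been extracted, and dropping the second Scanner entirely. This version matches whole words only; to count a substring inside longer words, a loop over String.indexOf would be a better fit.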
