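I'm trying to find out how often a particular string is repeated in an external PDF file. I can't get a Scanner to scan the PDF document, and an error pops up that I don't understand. This is the code I have so far: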
package Files;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class StringSearcherTest {
    public static void main(String[] args) throws IOException {
        Scanner searcher = new Scanner(System.in);
        Boolean foundString = false;
        System.out.println("Enter the word you would like to search for");
        String word = searcher.nextLine();
        word = word.trim();
        int count = 0;
        String phrase = ("The text that will be read is the following:");
        System.out.println(phrase.toUpperCase() + "\n");
        try (PDDocument document = Loader.loadPDF(new File("/Users/ricardobarahona/Desktop/LifeStyle Steps.pdf"))) {
            document.getClass();
            if (!document.isEncrypted()) {
                PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                stripper.setSortByPosition(true);
                PDFTextStripper tStripper = new PDFTextStripper();
                String pdfFileInText = tStripper.getText(document);
                System.out.println("Text:" + pdfFileInText);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        Scanner sc2 = new Scanner((Readable) Loader.loadPDF(new File("/Users/ricardobarahona/Desktop/LifeStyle Steps.pdf")));
    }
}
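The IDE doesn't flag any errors in the code, but when I run the program the PDF text is printed, and then, when the second Scanner is supposed to scan the document, this error appears: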
Exception in thread "main" java.lang.ClassCastException: class org.apache.pdfbox.pdmodel.PDDocument cannot be cast to class java.lang.Readable
    at Files.StringSearcherTest.main(StringSearcherTest.java:40)
1 Answer
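Try changing every occurrence of FileInputStream to File, and change the path from a folder to the actual file, for example "C:\text\text2\file.txt". That should eliminate the error, and the file should then be read correctly.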
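On the ClassCastException itself: the message says that PDDocument cannot be cast to Readable, so a Scanner cannot be built from the loaded document. Scanner does accept a plain String, though, and the question's code already has the extracted text in pdfFileInText. Here is a minimal sketch of counting the word from that String. The class name WordCountSketch and the helpers countTokens and countSubstring are hypothetical names made up for this example, and the sample sentence only stands in for whatever tStripper.getText(document) returns, so the sketch runs without PDFBox.

```java
import java.util.Scanner;

public class WordCountSketch {

    // Hypothetical helper: counts whitespace-separated tokens equal to `word`,
    // ignoring case and surrounding punctuation (ASCII word characters only).
    public static int countTokens(String text, String word) {
        int count = 0;
        // Scanner has a constructor that takes a String source, so no cast is needed.
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                // Strip leading/trailing non-word characters so "steps," matches "steps".
                String token = scanner.next().replaceAll("^\\W+|\\W+$", "");
                if (token.equalsIgnoreCase(word)) {
                    count++;
                }
            }
        }
        return count;
    }

    // Hypothetical helper: counts non-overlapping, case-sensitive occurrences
    // of `phrase` anywhere in the text, including inside longer words.
    public static int countSubstring(String text, String phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(phrase, from)) != -1) {
            count++;
            from += phrase.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: stand-in for the String returned by tStripper.getText(document).
        String pdfFileInText = "Life style steps. Take small steps, then bigger Steps.";
        System.out.println(countTokens(pdfFileInText, "steps"));    // prints 3
        System.out.println(countSubstring(pdfFileInText, "steps")); // prints 2
    }
}
```

In the original program, the counting call would go inside the try block, right after pdfFileInText is assigned, with the user's word as the second argument. Whether to match whole tokens or raw substrings depends on what "repeated" should mean for the search term.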