Java PDFBox Preflight parser fails to detect a PDF/A-1b file

Posted by e3bfsja2 on 2023-05-27 in Java

I am using the following code to detect whether a file is a PDF/A-1b file:

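    import java.io.File;
    import java.io.IOException;
    import org.apache.pdfbox.preflight.Format;
    import org.apache.pdfbox.preflight.PreflightDocument;
    import org.apache.pdfbox.preflight.ValidationResult;
    import org.apache.pdfbox.preflight.parser.PreflightParser;

    public boolean isPDF_A1BFile(File file) throws IOException {
        // Parse the file with the PDF/A-1b profile, then run the Preflight checks
        PreflightParser parser = new PreflightParser(file);
        parser.parse(Format.PDF_A1B);
        PreflightDocument preflightDocument = parser.getPreflightDocument();
        preflightDocument.validate();

        ValidationResult validationResult = preflightDocument.getResult();

        return validationResult.isValid(); // returns false in every case
    }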

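But it always returns false, whether or not the file is PDF/A-1b. I am testing with a PDF/A-1b file. I verified it with the Preflight tool in Acrobat, which reports that the file is PDF/A-1b compliant (screenshot omitted). Can anyone tell me what is wrong with my code, or what I am missing?
Also, is there any way to check whether a file is PDF/A-2b compliant?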


Answer #1 (svgewumm)

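Some PDF applications tolerate this file, because many of them silently repair this kind of discrepancy, but PDFBox detects a number of oddities. I did not spend much time on it, but the complaints look potentially valid, so the file is potentially non-compliant.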

The file Doc1-withHelvetica-pdfa1b.pdf is not a valid PDF/A-1b file, error(s) :
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 32264 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Length}:COSInt{8702};COSName{Subtype}:COSName{XML};COSName{Type}:COSName{Metadata};}; defined length=8702; actual length=8702, starting offset=23561
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 35134 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{2574};COSName{N}:COSInt{3};COSName{Range}:COSArray{COSFloat{0.0};COSFloat{1.0};0;1065353216;0;1065353216;};}; defined length=2574; actual length=2574, starting offset=32559
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 1562 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{202};}; defined length=202; actual length=202, starting offset=1359
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 4486 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Alternate}:COSName{DeviceRGB};COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{2612};COSName{N}:COSInt{3};}; defined length=2612; actual length=2612, starting offset=1873
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 4640 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{17};}; defined length=17; actual length=17, starting offset=4622
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 15067 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{10342};COSName{Length1}:COSInt{27968};}; defined length=10342; actual length=10342, starting offset=4724
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 16081 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{407};}; defined length=407; actual length=407, starting offset=15673
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 22792 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{6627};COSName{Length1}:COSInt{15080};}; defined length=6627; actual length=6627, starting offset=16164
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 23435 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{Length}:COSInt{355};}; defined length=355; actual length=355, starting offset=23079
1.2.2 : Body Syntax error, Expected 'EOL' before the endstream keyword at offset 822 but found '101'
1.2.5 : Body Syntax error, Stream length is invalid [dic=COSDictionary{COSName{Filter}:COSName{FlateDecode};COSName{I}:COSInt{93};COSName{Length}:COSInt{85};COSName{S}:COSInt{39};}; defined length=85; actual length=85, starting offset=736
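
Every 1.2.2 message above says an EOL was expected before `endstream` but '101' was found. 101 is the decimal code of the letter `e`, so the parser appears to be running straight into the first character of the keyword with no line break in front of it. PDF/A-1 requires an EOL marker before `endstream`, which many viewers do not enforce. The following is a minimal sketch of that single check, using only the Java standard library. The class `EndstreamEolCheck` and its method names are hypothetical and are not part of PDFBox. It does a raw byte scan rather than real parsing, so the keyword could in principle also match inside binary stream data, and the offsets it returns are keyword positions that need not match the offsets Preflight prints.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a PDFBox class): finds "endstream" keywords with no EOL before them. */
public class EndstreamEolCheck {

    private static final byte[] KEYWORD = "endstream".getBytes(StandardCharsets.US_ASCII);

    /** Returns the offset of every "endstream" keyword that is not directly preceded by CR or LF. */
    public static List<Integer> findMissingEol(byte[] pdf) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i + KEYWORD.length <= pdf.length; i++) {
            if (!matchesAt(pdf, i)) {
                continue;
            }
            // An EOL marker is CR, LF or CRLF, so the byte before the keyword must be CR or LF.
            boolean hasEol = i > 0 && (pdf[i - 1] == '\n' || pdf[i - 1] == '\r');
            if (!hasEol) {
                offsets.add(i);
            }
            i += KEYWORD.length - 1; // skip past the keyword just examined
        }
        return offsets;
    }

    /** True if the keyword bytes occur in data starting at pos. */
    private static boolean matchesAt(byte[] data, int pos) {
        for (int k = 0; k < KEYWORD.length; k++) {
            if (data[pos + k] != KEYWORD[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] bad = "stream\nABCendstream\n".getBytes(StandardCharsets.US_ASCII);
        byte[] good = "stream\nABC\nendstream\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findMissingEol(bad));  // [10]
        System.out.println(findMissingEol(good)); // []
        System.out.println((int) 'e');            // 101
    }
}
```

To try it on a real file, the bytes can be loaded with `java.nio.file.Files.readAllBytes` and passed to `findMissingEol`. A non-empty result would be consistent with the 1.2.2 errors above, although only a real validator can confirm compliance.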

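So, taking those errors at face value, I simply rebuilt the file with MuPDF using "clean" and re-ran the validation in PDFBox.
C:\Apps\PDF\inspectors\Apache\preflight-app-3.0.0-alpha3.jar Doc1-withHelvetica-pdfa1ba.pdf
The file Doc1-withHelvetica-pdfa1ba.pdf is a valid PDF/A-1b file
However, there is a catch-22: the rebuilt file now fails other validators, which report:
The PDF structure was damaged but has been repaired. Depending on the extent of the damage, it is theoretically possible that some data might be lost (although generally unlikely).
So I removed the PDF/A compliance and regenerated the file as PDF/A to see what the problem was. The report now is that there is at least one bad font definition, Calibri (not surprising, since the file was originally a Word document printout). What is not obvious is that there is a rogue Calibri space character at the end of the line containing Helvetica Bold. Removing it then surfaces other issues, so it took another pass through an editor. Finally, with all the dross removed, both validators agree that there are no more issues.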
