Unable to read RDF triple data from a file: exception thrown?

abithluo  posted on 2021-06-09 in Hbase

I am using JDK 7 and version 2.11.1 of the Jena library.
Below is my sample triple data file, named rdf.nt:

<http://sce.umkc.edu/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Ontology> .
<http://sce.umkc.edu/> <http://www.w3.org/2002/07/owl#imports> <http://purl.uniprot.org/core/> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.uniprot.org/core/Protein> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/reviewed> <true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/created> <2011-06-28"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/modified> <2011-07-27"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/version> <22"^^<http://www.w3.org/2001/XMLSchema#int> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/mnemonic> <001R_FRG3G" .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/citation> <http://purl.uniprot.org/citations/15165820> .
<http://sce.umkc.edu/#_5136475A5834001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<http://sce.umkc.edu/#_5136475A5834001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://purl.uniprot.org/uniprot/Q6GZX4> .
<http://sce.umkc.edu/#_5136475A5834001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://purl.uniprot.org/core/citation> .

My Java code:

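package com.jena.main;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.util.FileManager;

public class ReadRDF {
    public static void main(String args[]) {
        String inputFileName = "Rdf.nt";
        // use the FileManager to find the input file and parse it into a Model
        Model model = FileManager.get().loadModel(inputFileName, null,
                "N-TRIPLES");
        // write the parsed model back out to standard output
        model.write(System.out, "TRIPLES");
    }
}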

The error:

Exception in thread "main" org.apache.jena.riot.RiotException: [line: 4, col: 91] Broken IRI (bad character: '<'): true"^^
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)
    at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:163)
    at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:106)
    at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:67)
    at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:54)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTFactoryImpl$1.read(RDFParserRegistry.java:142)
    at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:859)
    at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:687)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:208)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:141)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:130)
    at org.apache.jena.riot.adapters.AdapterFileManager.readModelWorker(AdapterFileManager.java:291)
    at com.hp.hpl.jena.util.FileManager.loadModelWorker(FileManager.java:333)
    at com.hp.hpl.jena.util.FileManager.loadModel(FileManager.java:320)
    at com.jena.main.ReadRDF.main(ReadRDF.java:10)

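Please help me read this data, and tell me how to store RDF data in an HBase database.
How can I ignore the bad character '<'? My file contains more than 1 million records, and changing every record by hand would take a long time. Please suggest another option.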

ubby3x7f1#

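Your data is broken as it stands. You need to fix the errors that @user205512 has already pointed out before you can make any progress.
Another thing to realise is that there is no such language as N-TURTLES; you probably mean N-TRIPLES.
Your code probably only works because Jena ignores the unknown language and instead detects the input format from the file extension.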

9o685dep2#

Your data is incorrect:

<true"^^<http://www.w3.org/2001/XMLSchema#boolean>

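is not the literal "true"^^<http://www.w3.org/2001/XMLSchema#boolean>: it opens with '<' where a literal must open with '"'.
There are many other errors of the same kind, on what I guess are supposed to be literals.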
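
Since the file has over a million records, the fix does not have to be made by hand. Below is a minimal sketch of a one-pass repair, using only the JDK. It assumes that every broken literal follows the pattern shown in the question (a space, then '<' where the opening '"' should be, then the value, then a closing '"'), and that no valid literal in the file contains that same character sequence. The class name NTriplesRepair, the method repairLine and the regular expression are hypothetical names and choices made for this illustration. They are not part of Jena.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class NTriplesRepair {
    // Assumed shape of a broken literal: " <value" followed by a double quote,
    // with no '<', '>' or '"' inside the value, e.g. <true" or <2011-06-28".
    // A well-formed IRI never matches, because it reaches '>' before any '"'.
    private static final Pattern BROKEN_LITERAL =
            Pattern.compile(" <([^<>\"]*)\"");

    /** Rewrites every ` <value"` in the line as ` "value"`. */
    public static String repairLine(String line) {
        return BROKEN_LITERAL.matcher(line).replaceAll(" \"$1\"");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java NTriplesRepair <input.nt> <output.nt>");
            return;
        }
        long repaired = 0;
        // Stream line by line so the whole file never has to fit in memory.
        try (BufferedReader in = Files.newBufferedReader(
                     Paths.get(args[0]), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(
                     Paths.get(args[1]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String fixed = repairLine(line);
                if (!fixed.equals(line)) {
                    repaired++;
                }
                out.write(fixed);
                out.newLine();
            }
        }
        System.out.println("Repaired lines: " + repaired);
    }
}
```

Running it as `java NTriplesRepair rdf.nt rdf-fixed.nt` writes a corrected copy and leaves the original file untouched, so the result can be spot-checked before it is loaded with Jena. If the file contains other kinds of damage, the parser will still report them with a line and column number.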
