I am using JDK 7 and Jena library version 2.11.1.
Below is my sample triple data, in a file named rdf.nt:
<http://sce.umkc.edu/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Ontology> .
<http://sce.umkc.edu/> <http://www.w3.org/2002/07/owl#imports> <http://purl.uniprot.org/core/> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.uniprot.org/core/Protein> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/reviewed> <true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/created> <2011-06-28"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/modified> <2011-07-27"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/version> <22"^^<http://www.w3.org/2001/XMLSchema#int> .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/mnemonic> <001R_FRG3G" .
<http://purl.uniprot.org/uniprot/Q6GZX4> <http://purl.uniprot.org/core/citation> <http://purl.uniprot.org/citations/15165820> .
<http://sce.umkc.edu/#_5136475A5834001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<http://sce.umkc.edu/#_5136475A5834001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://purl.uniprot.org/uniprot/Q6GZX4> .
<http://sce.umkc.edu/#_5136475A5834001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://purl.uniprot.org/core/citation> .
My Java code:
package com.jena.main;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.util.FileManager;

public class ReadRDF {
    public static void main(String args[]) {
        String inputFileName = "Rdf.nt";
        // use the FileManager to find the input file
        Model model = FileManager.get().loadModel(inputFileName, null,
                "N-TRIPLES");
        model.write(System.out, "TRIPLES");
    }
}
The error:
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 4, col: 91] Broken IRI (bad character: '<'): true"^^
at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)
at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:163)
at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:106)
at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:67)
at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:54)
at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTFactoryImpl$1.read(RDFParserRegistry.java:142)
at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:859)
at org.apache.jena.riot.RDFDataMgr.parse(RDFDataMgr.java:687)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:208)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:141)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:130)
at org.apache.jena.riot.adapters.AdapterFileManager.readModelWorker(AdapterFileManager.java:291)
at com.hp.hpl.jena.util.FileManager.loadModelWorker(FileManager.java:333)
at com.hp.hpl.jena.util.FileManager.loadModel(FileManager.java:320)
at com.jena.main.ReadRDF.main(ReadRDF.java:10)
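Please help me read this data, and tell me how to store RDF data in an HBase database.
How can I ignore the bad character '<'? My file contains more than 1 million records, and changing every record by hand would take a long time. Please suggest an alternative.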
2 answers
ubby3x7f #1
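Yes, your data is broken. You need to fix the errors that @user205512 has already pointed out before you can make any progress.
Another thing to realize is that there is no such language as N-TURTLES; you probably mean N-TRIPLES. Your code probably only works because Jena ignores the unknown language name and instead detects the input format from the file extension.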
9o685dep #2
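Your data is incorrect:
<true"^^<http://www.w3.org/2001/XMLSchema#boolean>
is not a literal. In N-Triples a typed literal opens with a double quote, not with '<':
"true"^^<http://www.w3.org/2001/XMLSchema#boolean>
There are many other errors of the same kind, in values that I guess are supposed to be literals.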
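If editing more than a million records by hand is not practical, one option is to repair the file in a single preprocessing pass before giving it to Jena. The following is only a rough sketch, and it rests on assumptions that should be checked against the real data: that the only defect is a literal whose opening double quote was written as '<' (as in <true"^^... and <001R_FRG3G"), and that no literal contains '<', '>' or an escaped quote. The class name NTriplesRepair, the method repairLine and the output file name rdf-fixed.nt are made up for this illustration. It uses only JDK 7 features.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites a broken literal opener '<' as '"'.
final class NTriplesRepair {

    // Matches whitespace, then '<', then text containing no '<', '>' or '"',
    // then a '"'. A well-formed IRI reaches its closing '>' before any '"',
    // so well-formed IRIs are never matched.
    private static final Pattern BROKEN_LITERAL_START =
            Pattern.compile("(\\s)<([^<>\"]*\")");

    // Assumption: literals themselves contain no '<', '>' or escaped quotes.
    static String repairLine(String line) {
        return BROKEN_LITERAL_START.matcher(line).replaceAll("$1\"$2");
    }

    public static void main(String[] args) throws IOException {
        // Input and output file names are placeholders.
        String in = args.length > 0 ? args[0] : "rdf.nt";
        String out = args.length > 1 ? args[1] : "rdf-fixed.nt";
        try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(in), StandardCharsets.UTF_8);
             BufferedWriter writer =
                     Files.newBufferedWriter(Paths.get(out), StandardCharsets.UTF_8)) {
            String line;
            // Stream line by line so a large file is never held in memory.
            while ((line = reader.readLine()) != null) {
                writer.write(repairLine(line));
                writer.newLine();
            }
        }
    }
}
```

After running it, the repaired file (rdf-fixed.nt in this sketch) can be loaded with the same Jena code as before. It is worth comparing a few lines of the input and output first, because any record that breaks the assumptions above would be rewritten incorrectly or left broken.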