I have a Mapper that processes ORC files. In the driver I set the input format to OrcNewInputFormat:
job.setInputFormatClass(OrcNewInputFormat.class);
With OrcNewInputFormat the value type is OrcStruct. In the map method the value arrives as a Writable parameter, and I cast it to OrcStruct like this:
OrcStruct record = (OrcStruct) value;
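For reference, the mapper is shaped roughly like this (a simplified sketch; the output key/value types and the field processing are placeholders, not my actual code):

```java
import java.io.IOException;
import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;

public class OrcMapper extends Mapper<NullWritable, Writable, Text, Text> {
    @Override
    protected void map(NullWritable key, Writable value, Context context)
            throws IOException, InterruptedException {
        // OrcNewInputFormat hands each row to the mapper as a Writable;
        // the concrete runtime type is OrcStruct.
        OrcStruct record = (OrcStruct) value;
        // ... extract fields from the record and write output here ...
    }
}
```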
I want to test this mapper with MRUnit. To do that, in the unit test's setup method I create an ORC file at testFilePath:
Writer writer = OrcFile.createWriter(testFilePath,
        OrcFile.writerOptions(conf)
                .inspector(inspector)
                .stripeSize(100000)
                .bufferSize(10000)
                .version(OrcFile.Version.V_0_12));
writer.addRow(new SimpleStruct("k1", "v1"));
writer.close();
public static class SimpleStruct {
    Text k;
    Text string1;

    SimpleStruct(String b1, String s1) {
        this.k = new Text(b1);
        if (s1 == null) {
            this.string1 = null;
        } else {
            this.string1 = new Text(s1);
        }
    }
}
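For completeness, the inspector used above is created via reflection on SimpleStruct, following the pattern in Hive's own ORC tests (a sketch, assuming Hive's ObjectInspectorFactory API):

```java
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

// Build a struct ObjectInspector from the Java fields of SimpleStruct.
ObjectInspector inspector = ObjectInspectorFactory.getReflectionObjectInspector(
        SimpleStruct.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
```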
Then, in the test method, I read the file back and invoke the mapper through MRUnit. Here is the code:
// Read the ORC file back
Reader reader = OrcFile.createReader(fs, testFilePath);
RecordReader recordRdr = reader.rows();
OrcStruct row = null;
List<OrcStruct> mapData = new ArrayList<>();
while (recordRdr.hasNext()) {
    row = (OrcStruct) recordRdr.next(row);
    mapData.add(row);
}

// Test the mapper
initializeSerde(mapDriver.getConfiguration());
Writable writable = getWritable(mapData.get(0)); // test the mapper on the 1st record
mapDriver.withCacheFile(strCachePath).withInput(NullWritable.get(), writable);
mapDriver.runTest();
But when I run the test case, I get the following error:
java.lang.UnsupportedOperationException: can't write the bundle
at org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow.write(OrcSerde.java:61)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:80)
at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:97)
at org.apache.hadoop.mrunit.internal.io.Serialization.copyWithConf(Serialization.java:110)
at org.apache.hadoop.mrunit.TestDriver.copy(TestDriver.java:675)
at org.apache.hadoop.mrunit.TestDriver.copyPair(TestDriver.java:679)
at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:120)
at org.apache.hadoop.mrunit.MapDriverBase.withInput(MapDriverBase.java:210)
Looking at the OrcSerde source, the write method that MRUnit ends up calling is not supported, which is why the test case fails.
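From the stack trace, MRUnit copies each input pair through Hadoop's WritableSerialization, which calls write() on the value; OrcSerde's inner OrcSerdeRow class does not support being serialized that way. Paraphrasing the relevant method (based on the stack trace and error message, not a verbatim copy of the Hive source):

```java
// Inside org.apache.hadoop.hive.ql.io.orc.OrcSerde.OrcSerdeRow
// (paraphrased; OrcSerdeRow is only meant to be consumed by the ORC writer):
@Override
public void write(DataOutput dataOutput) throws IOException {
    throw new UnsupportedOperationException("can't write the bundle");
}
```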
How can I unit-test a mapper that processes ORC files? Is there another approach, or something I need to change in my code?
Thanks in advance for your help.