Pig ERROR 1066, Backend error: -1; NegativeArraySizeException; UDF, Joda-Time, HBase

xt0899hw posted on 2021-06-09 in HBase
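I'm getting an exception from a Pig script and haven't been able to pin down the cause. I'm fairly new to Pig. I've searched various topics based on the exceptions I get, but haven't found anything meaningful. Across the grunt shell and the log I've seen several variants: "unable to read pigs manifest file", "java.lang.NegativeArraySizeException: -1" and "ERROR 1066: Unable to open iterator for alias F. Backend error : -1".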
I'm using Hadoop version 2.0.0-cdh4.6.0 and Pig version 0.11.0, running from the grunt shell.
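My Pig script reads a file, does some processing on the data (including a call to a Java UDF), joins it to an HBase table, and then dumps the output. Pretty simple. I can dump the intermediate result (alias B) and the data looks fine.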
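I've tested the Java function from Pig with the same input file and seen it return the values I expect. I've also tested it locally, outside the Pig script. The Java function takes a number of days since 01-01-1900 and returns a datetime using Joda-Time v2.7. Originally the UDF accepted a tuple as input. I've tried changing the UDF input format to bytes and, most recently, to a string, converting the result to a datetime in Pig on return. I still get the same error.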
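For reference, the date arithmetic described above can be sketched as follows. This is a minimal sketch, not the poster's UDF, under these assumptions:
- It uses java.time (Java 8+) instead of Joda-Time v2.7, so it runs on the JDK alone.
- The class and method names are hypothetical.
- An offset of 0 days means 1900-01-01.

It returns an ISO-8601 string of the kind Pig's single-argument ToDate(chararray) parses.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for the date logic inside GridIDtoDatetime.
public class DaysSince1900 {

    // Assumed epoch: an offset of 0 days maps to 1900-01-01.
    private static final LocalDate EPOCH = LocalDate.of(1900, 1, 1);

    // ISO-8601 local date-time with no zone and no fractional seconds.
    private static final DateTimeFormatter ISO =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

    // Converts a day offset from the epoch to an ISO-8601 string at midnight.
    static String toIsoDateTime(long days) {
        return EPOCH.plusDays(days).atStartOfDay().format(ISO);
    }

    // Same conversion when the day count arrives as text (a string input).
    static String toIsoDateTime(String days) {
        return toIsoDateTime(Long.parseLong(days.trim()));
    }

    public static void main(String[] args) {
        System.out.println(toIsoDateTime(0));
        System.out.println(toIsoDateTime("42168"));
    }
}
```

Returning a plain string and applying ToDate in the script matches the most recent variant the question describes.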
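If I change my Pig script to simply not call the UDF, it works fine. The NegativeArraySizeException sounds like something is going wrong with the data at dump time, perhaps a formatting problem, but I can't see what.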
Pig script:

A = LOAD 'tst2_SplitGroupMax.txt' using PigStorage(',')  
as (id:bytearray, year:int, doy:int, month:int, dayOfMonth:int,  
 awh_minTemp:double, awh_maxTemp:double,  
 nws_minTemp:double, nws_maxTemp:double,  
 wxs_minTemp:double, wxs_maxTemp:double,  
 tcc_minTemp:double, tcc_maxTemp:double  
 ) ;  

register /import/pool2/home/NA1000APP-TPSDM/ejbles/Test-0.0.1-SNAPSHOT-jar-with-dependencies.jar;  

B = FOREACH A GENERATE id as msmtid, SUBSTRING(id,0,8) as gridid, SUBSTRING(id,9,20) as msmt_days,  
 year, doy, month, dayOfMonth,  
 CONCAT(CONCAT(CONCAT((chararray)year,'-'),CONCAT((chararray)month,'-')),(chararray)dayOfMonth) as msmt_dt,  
 ToDate(monutil.geoloc.GridIDtoDatetime(id)) as func_msmt_dt,  
 awh_minTemp, awh_maxTemp,  
 nws_minTemp, nws_maxTemp,  
 wxs_minTemp, wxs_maxTemp,  
 tcc_minTemp, tcc_maxTemp  
 ;  

E = LOAD 'hbase://wxgrid_detail' using org.apache.pig.backend.hadoop.hbase.HBaseStorage  
 ('loc:country, loc:fips, loc:l1 ,loc:l2, loc:latitude, loc:longitude',  
 '-loadKey=true -caster=HBaseBinaryConverter')  
 as (wxgrid:bytearray, country:chararray, fips:chararray, l1:chararray, l2:chararray,  
   latitude:double, longitude:double);  

F = join B by gridid, E by wxgrid;  

DUMP F;  --- This is where I get the exception
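To sanity-check alias B outside Pig, its positional parsing can be mirrored in plain Java. Pig's SUBSTRING(string, startIndex, stopIndex) treats the stop index as exclusive, like String.substring. So gridid is characters 0-7 and msmt_days is characters 9-19. The id used below is a hypothetical example, not data from the question: an 8-character grid id, one separator, then an 11-digit day count. The class name is also hypothetical.

```java
// Hypothetical mirror of the SUBSTRING/CONCAT expressions in alias B.
public class AliasBMirror {

    // SUBSTRING(id,0,8): stop index is exclusive, as in String.substring.
    static String gridId(String id) {
        return id.substring(0, 8);
    }

    // SUBSTRING(id,9,20). Java throws StringIndexOutOfBoundsException for ids
    // shorter than 20 characters; Pig's handling of that case is not modelled.
    static String msmtDays(String id) {
        return id.substring(9, 20);
    }

    // Nested CONCAT of (chararray) casts: the int fields are not zero-padded.
    static String msmtDt(int year, int month, int dayOfMonth) {
        return year + "-" + month + "-" + dayOfMonth;
    }

    public static void main(String[] args) {
        String id = "ABCDEFGH_00000042168"; // hypothetical 20-character id
        System.out.println(gridId(id));
        System.out.println(msmtDays(id));
        System.out.println(msmtDt(2015, 6, 15));
    }
}
```

Since gridid is the key joined against E.wxgrid, comparing this kind of output with the HBase row keys is a quick way to check that the join can match at all.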

Here's an excerpt of what the grunt shell returned:
2015-06-15 12:23:24,204 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-06-15 12:23:24,205 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201502081759_916870 has failed! Stop running all dependent jobs
2015-06-15 12:23:24,205 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-06-15 12:23:24,221 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: -1
2015-06-15 12:23:24,221 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-06-15 12:23:24,223 [main] WARN org.apache.pig.tools.pigstats.ScriptState - unable to read pigs manifest file
2015-06-15 12:23:24,224 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion   PigVersion   UserId   StartedAt   FinishedAt   Features
2.0.0-cdh4.6.0   na1000app-tpsdm   2015-06-15 12:22:39   2015-06-15 12:23:24   HASH_JOIN

Failed!

Failed Jobs:
JobId   Alias   Feature   Message   Outputs
job_201502081759_916870   A,B,E,F   HASH_JOIN   Message: Job failed!   hdfs://nameservice1/tmp/temp-238648079/tmp-1338617620,

Input(s):
Failed to read data from "hbase://wxgrid_detail"
Failed to read data from "hdfs://nameservice1/user/na1000app-tpsdm/tst2_SplitGroupMax.txt"

Output(s):
Failed to produce result in "hdfs://nameservice1/tmp/temp-238648079/tmp-1338617620"

Counters:
Total records written :
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201502081759_916870
2015-06-15 12:23:24,224 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-06-15 12:23:24,234 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias F. Backend error : -1
Details at logfile: /import/pool2/home/NA1000APP-TPSDM/ejbles/pig_.log
Here's the log:
Backend error message

---------------------
java.lang.NegativeArraySizeException: -1
	at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
	at org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:233)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
	at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:356)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:640)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.Child$4.run(Ch
Pig Stack Trace

---------------
ERROR 1066: Unable to open iterator for alias F. Backend error : -1
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias F. Backend error : -1
	at org.apache.pig.PigServer.openIterator(PigServer.java:828)
	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
	at org.apache.pig.Main.run(Main.java:538)
	at org.apache.pig.Main.main(Main.java:157)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.NegativeArraySizeException: -1
	at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
	at org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit.readFields(PigSplit.java:233)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
	at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:356)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:640)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
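Both traces bottom out in Bytes.readByteArray, called from TableSplit.readFields while MapTask.getSplitDetails deserializes the map task's input split. That happens before the task reads any records.

The sketch below shows how a length-prefixed reader ends up reporting "NegativeArraySizeException: -1". It assumes the length it reads is -1, for example because the writer emitted a field the reader does not expect. It is a hypothetical, simplified reader, not HBase's code:
- It uses a fixed 4-byte int length prefix.
- The writer and its -1 marker are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of a length-prefixed field being misread.
public class LengthPrefixedDemo {

    // Simplified reader: a 4-byte length, then that many bytes.
    static byte[] readByteArray(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len < 0) {
            // Report the bad length itself as the message, e.g. "-1".
            throw new NegativeArraySizeException(Integer.toString(len));
        }
        byte[] result = new byte[len];
        in.readFully(result);
        return result;
    }

    // Hypothetical writer; withMarker=true prepends an extra -1 int field
    // that a reader expecting the plain layout will take as the length.
    static byte[] write(String value, boolean withMarker) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            if (withMarker) {
                out.writeInt(-1);
            }
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length);
            out.write(bytes);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the first field back using the plain (marker-free) layout.
    static String readFirstField(byte[] serialized) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            return new String(readByteArray(in), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFirstField(write("wxgrid_detail", false)));
        try {
            readFirstField(write("wxgrid_detail", true));
        } catch (NegativeArraySizeException e) {
            System.out.println("java.lang.NegativeArraySizeException: " + e.getMessage());
        }
    }
}
```

When writer and reader agree on the layout the field round-trips. When they disagree, the failure surfaces while the split is being deserialized, not while the UDF's output is being dumped.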