How to debug why map tasks fail after multiple retries

Posted by qhhrdooz on 2021-06-08 in HBase

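I wrote a MapReduce job that scans an HBase table over a specific time range to compute certain elements needed for analytics.
The job's mappers always fail, but I don't know why. A different number of mappers seems to fail each time I run the job. The YARN logs from Cloudera Manager (see below) don't help pinpoint the problem, although someone suggested I might be running out of memory.
Each failed mapper is retried several times, but every attempt fails. What do I need to do to stop it from failing, or how can I log something that would help me better determine what is happening?
Here is the YARN log from one of the failed mappers.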
    Error: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
    Thu Jun 15 16:26:57 PDT 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60301: row '152_p3401.db161139.sjc102.dbi_1496271480' on table 'dbi_based_data' at region=dbi_based_data,151_p3413.db162024.iad4.dbi_1476974340,1486675565213.d83250d0682e648d165872afe5abd60e., hostname=hslave35118.ams9.mysecretdomain.com,60020,1483570489305, seqNum=19308931

        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:207)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
        at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
        at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:236)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:216)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60301: row '152_p3401.db161139.sjc102.dbi_1496271480' on table 'dbi_based_data' at region=dbi_based_data,151_p3413.db162024.iad4.dbi_1476974340,1486675565213.d83250d0682e648d165872afe5abd60e., hostname=hslave35118.ams9.mysecretdomain.com,60020,1483570489305, seqNum=19308931
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
        at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: Call to hslave35118.ams9.mysecretdomain.com/10.216.35.118:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=12, waitTime=60001, operationTimeout=60000 expired.
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:291)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:219)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:64)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:360)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:334)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
        ... 4 more
    Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=12, waitTime=60001, operationTimeout=60000 expired.
        at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:73)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
        ... 13 more

um6iljoc #1

In my case, I needed to raise the timeout settings. I had to add the following lines to my Java program to get rid of the exception:

    conf.set("hbase.rpc.timeout", "90000");
    conf.set("hbase.client.scanner.timeout.period", "90000");
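For context, here is a minimal sketch of where those two lines might sit in a job driver. Everything other than the two property names and the 90000 ms value is an assumption made for illustration: the class names `ScanJobDriver` and `RowCountMapper`, the counter names, and the convention of passing the table name and time range as command-line arguments are all hypothetical. It assumes the HBase 1.x client and MapReduce libraries are on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ScanJobDriver {

    // Hypothetical mapper: bumps a job counter per row and emits nothing.
    // The counter shows up in the job's counters, which gives a rough idea
    // of how far each map task got before it failed.
    public static class RowCountMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            context.getCounter("scan", "rows").increment(1);
        }
    }

    // Assumed arguments: <table name> <min timestamp ms> <max timestamp ms>
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Raise both timeouts (milliseconds) before the Job is created,
        // because Job.getInstance copies the Configuration it is given.
        conf.set("hbase.rpc.timeout", "90000");
        conf.set("hbase.client.scanner.timeout.period", "90000");

        Job job = Job.getInstance(conf, "scan-with-longer-timeouts");
        job.setJarByClass(ScanJobDriver.class);

        Scan scan = new Scan();
        scan.setTimeRange(Long.parseLong(args[1]), Long.parseLong(args[2]));
        scan.setCacheBlocks(false); // full scans should not fill the block cache

        TableMapReduceUtil.initTableMapperJob(
                args[0], scan, RowCountMapper.class,
                NullWritable.class, NullWritable.class, job);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The ordering matters in this sketch: properties set on `conf` after `Job.getInstance(conf, ...)` would not reach the map tasks, since the job works from its own copy of the configuration.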

I found the answer at this link on the Cloudera website.
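To check whether a failed task hit the same kind of timeout, it can help to compare the `callTimeout` and `callDuration` values that the HBase client prints in its `SocketTimeoutException` message. The sketch below uses only the Java standard library; the class name `CallTimeoutCheck` and the method `overrunMillis` are hypothetical names chosen for illustration, and the sample line is shortened from the message format shown in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CallTimeoutCheck {

    // Matches the "callTimeout=60000, callDuration=60301" fragment of the message.
    private static final Pattern TIMEOUT =
            Pattern.compile("callTimeout=(\\d+),\\s*callDuration=(\\d+)");

    /**
     * Returns how many milliseconds the call ran past its timeout,
     * or an empty Optional if the line contains no such fragment.
     */
    public static Optional<Long> overrunMillis(String logLine) {
        Matcher m = TIMEOUT.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        long timeout = Long.parseLong(m.group(1));
        long duration = Long.parseLong(m.group(2));
        return Optional.of(duration - timeout);
    }

    public static void main(String[] args) {
        String line = "java.net.SocketTimeoutException: "
                + "callTimeout=60000, callDuration=60301: row ...";
        overrunMillis(line).ifPresent(
                ms -> System.out.println("Call overran its timeout by " + ms + " ms"));
    }
}
```

A duration only slightly above the timeout, as in the question's log, suggests the call was cut off at the limit rather than failing for some other reason, which is consistent with a longer timeout making the exception go away.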
