I get this error when partitioning a Hive table with 70+ columns by date:
Error: Status: Failed
Error: Vertex failed, vertexName=Map 1, vertexId=vertex_1612203694878_0265_4_00, diagnostics=[Task failed, taskId=task_1612203694878_0265_4_00_000058, diagnostics=[TaskAttempt 0 failed, info=[Container container_e16_1612203694878_0265_01_000167 finished with diagnostics set to [Container failed, exitCode=-104][2021-02-02 11:00:58.498]Container [pid=1577,containerID=container_e16_1612203694878_0265_01_000167] is running 3022848B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_e16_1612203694878_0265_01_000167 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 1577 1567 1577 (bash) 0 0 116011008 301 /bin/bash -c /usr/jdk64/jdk1.8.0_112/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhdp.version=3.1.4.0-315 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/usr/hadoop/yarn/log/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/usr/hadoop/yarn/local/usercache/hive/appcache/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167/tmp org.apache.tez.runtime.task.TezChild slave-06-n.fawryhq.corp 43250 container_e16_1612203694878_0265_01_000167 application_1612203694878_0265 1 1>/usr/hadoop/yarn/log/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167/stdout 2>/usr/hadoop/yarn/log/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167/stderr
|- 1658 1577 1577 (java) 1414 128 2788896768 262581 /usr/jdk64/jdk1.8.0_112/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhdp.version=3.1.4.0-315 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/usr/hadoop/yarn/log/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/usr/hadoop/yarn/local/usercache/hive/appcache/application_1612203694878_0265/container_e16_1612203694878_0265_01_000167/tmp org.apache.tez.runtime.task.TezChild slave-06-n.fawryhq.corp 43250 container_e16_1612203694878_0265_01_000167 application_1612203694878_0265 1
[2021-02-02 11:00:58.512]Container killed on request. Exit code is 143
[2021-02-02 11:00:58.521]Container exited with a non-zero exit code 143. ], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.allocateSpace(PipelinedSorter.java:256)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:205)
    at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:146)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:193)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
, errorMessage=Cannot recover from this error:java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.allocateSpace(PipelinedSorter.java:256)
    at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.<init>(PipelinedSorter.java:205)
    at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:146)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:193)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:17, Vertex vertex_1612203694878_0265_4_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Error: Vertex killed, vertexName=Reducer 2, vertexId=vertex_1612203694878_0265_4_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:2, Vertex vertex_1612203694878_0265_4_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
Error: DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
1 Answer
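Try the following, in this order:
1. Increase mapper parallelism. The goal is to get more, smaller mappers. Check how many mappers were started and tune the settings accordingly. This will not work if the input contains very large files in a non-splittable format such as gzip; in that case, continue with the two steps below.
2. Disable map-side aggregation (map-side aggregation often causes OOM on the mapper).
3. If the two steps above do not help, increase mapper memory (try to find the minimum container size that works).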
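As a rough sketch, the three steps above could be expressed as session-level settings like the ones below. The property names are standard Hive and Tez settings, but every value is an illustrative assumption, not a recommendation from the answer; tune them against your own data and cluster limits.

```sql
-- Step 1: more, smaller mappers. Lowering the Tez split-grouping bounds
-- makes each mapper read less data. Values are in bytes and are assumed
-- examples only.
set tez.grouping.min-size=33554432;   -- 32 MB (assumed value)
set tez.grouping.max-size=67108864;   -- 64 MB (assumed value)

-- Step 2: disable map-side aggregation.
set hive.map.aggr=false;

-- Step 3: only if steps 1 and 2 are not enough, raise mapper memory.
-- The container size is in MB. The heap is kept at about 80% of the
-- container, the same ratio as in the failing log (-Xmx819m in a 1 GB
-- container). Both values are assumed examples.
set hive.tez.container.size=2048;
set hive.tez.java.opts=-Xmx1638m;
```

Apply one step at a time and rerun the query after each, so that you end up with the smallest change that makes the job pass. The container size must also fit within the cluster's YARN allocation limits.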