Pig script on AWS EMR with Tez occasionally fails with OutOfMemoryException

Posted by ssm49v7z on 2021-06-24 in Pig

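I have a Pig script running on an EMR cluster (emr-5.4.0) that uses a custom UDF. The UDF looks up some dimension data, for which it imports a (somewhat) large amount of text data.
In the Pig script, the UDF is defined as follows: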

DEFINE LookupInteger com.ourcompany.LookupInteger(<some parameters>);

The UDF stores some data in a Map<Integer, Integer>. For some input data, the aggregation fails with the following exception:

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.String.split(String.java:2377)
    at java.lang.String.split(String.java:2422)
    [...]
    at com.ourcompany.LocalFileUtil.toMap(LocalFileUtil.java:71)
    at com.ourcompany.LookupInteger.exec(LookupInteger.java:46)
    at com.ourcompany.LookupInteger.exec(LookupInteger.java:19)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextInteger(POUserFunc.java:379)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:347)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POBinCond.genericGetNext(POBinCond.java:76)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POBinCond.getNextInteger(POBinCond.java:118)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:347)

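This does not happen when the Pig aggregation is run with mapreduce, so our workaround is to replace pig -x tez with pig -x mapreduce.
As I am new to Amazon EMR and also new to Tez, I would appreciate some hints on how to analyse or debug the issue.
EDIT: This looks like strange runtime behaviour when running the Pig script on the Tez stack.
Please note that the Pig script uses
replicated joins (the smaller relations to be joined need to fit into memory) and
the UDF mentioned above, which initialises a Map<Integer, Integer> and produces the OutOfMemoryError shown above.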

apache-pig emr tez

Source: https://stackoverflow.com/questions/48437768/pig-script-on-aws-emr-with-tez-occasionally-fails-with-outofmemoryexception

1 answer

7hiiyaii1#

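We found another workaround that keeps the Tez backend: use increased values for mapreduce.map.memory.mb and mapreduce.map.java.opts (0.8 times mapreduce.map.memory.mb). These values are tied to the EC2 instance type and are usually fixed (see AWS EMR task config).
By (temporarily) doubling the values, we were able to get the Pig script to succeed.
For an m3.xlarge core instance, the following default values were set:
mapreduce.map.java.opts := -Xmx1152m
mapreduce.map.memory.mb := 1440
Pig startup command: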

pig -Dmapreduce.map.java.opts=-Xmx2304m \
    -Dmapreduce.map.memory.mb=2880 -stop_on_failure -x tez ... script.pig
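
The relationship between the two settings can be checked with a little arithmetic. The following is only a sketch: the class and method names (MapMemoryOpts, heapMb, pigFlags) are hypothetical and not part of Pig, Tez or EMR. It applies the 0.8 ratio mentioned above to the m3.xlarge default and to the doubled value, and prints the corresponding -D flags.

```java
public class MapMemoryOpts {
    // Heap size in MB as 0.8x (4/5) of the container size.
    // Integer arithmetic avoids floating-point rounding.
    public static int heapMb(int containerMb) {
        return containerMb * 4 / 5;
    }

    // Builds the two -D flags in the form used on the pig command line.
    public static String pigFlags(int containerMb) {
        return "-Dmapreduce.map.java.opts=-Xmx" + heapMb(containerMb) + "m"
                + " -Dmapreduce.map.memory.mb=" + containerMb;
    }

    public static void main(String[] args) {
        int defaultMb = 1440;           // m3.xlarge default quoted in this answer
        int doubledMb = 2 * defaultMb;  // the (temporarily) doubled value
        System.out.println(pigFlags(defaultMb));
        System.out.println(pigFlags(doubledMb));
    }
}
```

With a container size of 1440 this gives a heap of 1152 MB, and with 2880 it gives 2304 MB, matching the values used in the command.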

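EDIT
A colleague came up with the following idea:
Another workaround for OutOfMemory: GC overhead limit exceeded could be to add explicit STORE and LOAD statements for the problematic relations, which would make Tez flush the data to storage. This could also help with debugging the issue, since the (temporary, intermediate) data can be inspected with other Pig scripts.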

Posted 2021-06-24
