eclipse: Getting "java.lang.OutOfMemoryError: Java heap space" when executing a Pig script from a Java application

kyxcudwk  posted on 2021-06-21  in  Pig

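I have a Java Swing application. Internally it generates Pig scripts that convert XML data into structured data. The logic that generates the Pig scripts is written in Java.
My XML files are in HDFS. When I select an XML file of size 56 MB (156636 records), the application runs fine. But as soon as I select a larger XML file (the split in the log below is 109899360 bytes, roughly 105 MB), it throws java.lang.Exception: java.lang.OutOfMemoryError: Java heap space.
Error snippet from the Eclipse console: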

16/03/04 20:52:47 INFO mapReduceLayer.PigRecordReader: Current split being processed hdfs://localhost:54310/user/hduser/hadoopqatstool/input/xml/Books_WS_MM_2.xml:0+109899360
16/03/04 20:52:47 INFO data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
16/03/04 20:52:47 INFO mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: xmldata1[1,11],xmldata2[-1,-1],xmldata3[3,11],xmldata4[4,11],xmldata5[5,11],xmldata6[6,11],xmldata8[8,11],xmldata7[7,11],null[-1,-1] C:  R: 
16/03/04 20:52:49 INFO util.SpillableMemoryManager: first memory handler call - Collection threshold init = 85983232(83968K) used = 857675192(837573K) committed = 1172832256(1145344K) max = 1380974592(1348608K)
16/03/04 20:52:50 INFO util.SpillableMemoryManager: first memory handler call- Usage threshold init = 85983232(83968K) used = 1013180544(989434K) committed = 1172832256(1145344K) max = 1380974592(1348608K)
16/03/04 20:52:57 INFO mapred.LocalJobRunner: map task executor complete.
16/03/04 20:52:57 WARN mapred.LocalJobRunner: job_local1196112634_0001
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
    at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:777)
    at org.apache.hadoop.io.Text.encode(Text.java:450)
    at org.apache.hadoop.io.Text.set(Text.java:198)
    at org.apache.hadoop.io.Text.<init>(Text.java:88)
    at org.apache.pig.piggybank.storage.XMLLoader$XMLRecordReader.nextKeyValue(XMLLoader.java:207)
    at org.apache.pig.piggybank.storage.XMLLoader.getNext(XMLLoader.java:262)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
16/03/04 20:52:58 WARN mapReduceLayer.MapReduceLauncher: Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
16/03/04 20:52:58 INFO mapReduceLayer.MapReduceLauncher: job job_local1196112634_0001 has failed! Stop running all dependent jobs
16/03/04 20:52:58 INFO mapReduceLayer.MapReduceLauncher: 100% complete
16/03/04 20:52:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/03/04 20:52:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/03/04 20:52:58 ERROR mapreduce.MRPigStatsUtil: 1 map reduce job(s) failed!
16/03/04 20:52:58 INFO mapreduce.SimplePigStats: Script Statistics: 

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
2.6.0   0.15.0  hduser  2016-03-04 20:52:46 2016-03-04 20:52:58 FILTER

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_local1196112634_0001    xmldata1,xmldata2,xmldata3,xmldata4,xmldata5,xmldata6,xmldata7,xmldata8 MAP_ONLY    Message: Job failed!    /user/hduser/hadoopqatstool/output/xml/2016-03-04T20_52_46,

Input(s):
Failed to read data from "hdfs://localhost:54310/user/hduser/hadoopqatstool/input/xml/Books_WS_MM_2.xml"

Output(s):
Failed to produce result in "/user/hduser/hadoopqatstool/output/xml/2016-03-04T20_52_46"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local1196112634_0001

16/03/04 20:52:58 INFO mapReduceLayer.MapReduceLauncher: Failed!

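To resolve this, I searched around and found that increasing HADOOP_HEAPSIZE in the hadoop-env.sh file can solve the problem, so I changed it accordingly.
Part of hadoop-env.sh (before the change)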


# The maximum amount of heap to use, in MB. Default is 1000.

# export HADOOP_HEAPSIZE=

# export HADOOP_NAMENODE_INIT_HEAPSIZE=""

export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"

I changed hadoop-env.sh to this (after the change)


# The maximum amount of heap to use, in MB. Default is 1000.

export HADOOP_HEAPSIZE=4096
export HADOOP_NAMENODE_INIT_HEAPSIZE="4096"

export HADOOP_PORTMAP_OPTS="-Xmx4096m $HADOOP_PORTMAP_OPTS"
export HADOOP_CLIENT_OPTS="-Xmx4096m $HADOOP_CLIENT_OPTS"

But even after making these changes, I get the same out-of-memory error.
One record of my XML data looks like this. Likewise, I have many such records.

<book id="bk101">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
   </book>

My system details:
Pseudo-distributed, single-node Hadoop on a physical machine.
Hadoop version: Apache Hadoop 2.6.0
Pig version: 0.15.0
RAM: 8 GB
OS: Ubuntu 14.04 LTS
64-bit

bpzcxfmw1#

Check whether the memory you assign to the mappers and reducers (mapreduce.map.memory.mb, mapreduce.reduce.memory.mb) is larger than the heap actually given to the Java process.
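The log in the question reports max = 1380974592 (about 1317 MB) from SpillableMemoryManager, and the job runs under LocalJobRunner, so one quick thing to verify is the heap ceiling of the JVM that launches the script. Below is a minimal sketch using only the JDK; the class name HeapCheck is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of Hadoop or Pig): prints the heap limits
// of the JVM it runs in, for comparison with the "max =" value in the log.
public class HeapCheck {

    private static final long MB = 1024L * 1024L;

    /** Maximum heap the current JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("Max heap:       " + maxHeapMb() + " MB");
        System.out.println("Heap used:      " + heap.getUsed() / MB + " MB");
        System.out.println("Heap committed: " + heap.getCommitted() / MB + " MB");
    }
}
```

Run it with the same launch configuration as the Swing application. If the printed maximum is still far below 4096 MB, the -Xmx in effect is probably not the one from hadoop-env.sh, and the heap would have to be raised wherever that JVM is started (for instance in the VM arguments of the Eclipse run configuration).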

2uluyalo2#

Now that you have changed the hadoop-env.sh file, also change conf/mapred-site.xml:
mapred.child.java.opts=-Xmx4096m
Then restart Hadoop.
Ref: http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
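One caveat when relying on mapred.child.java.opts: on Hadoop 2 the more specific mapreduce.map.java.opts and mapreduce.reduce.java.opts take precedence when they are set, so a value from another config source can mask this one. The sketch below is a simplified model of that lookup order built on java.util.Properties, not Hadoop's actual code; the class and method names are hypothetical.

```java
import java.util.Properties;

// Simplified model (an assumption for illustration, not Hadoop source):
// a task-specific key wins over the legacy mapred.child.java.opts.
public class ChildOptsResolver {

    static final String LEGACY_KEY = "mapred.child.java.opts";
    static final String MAP_KEY = "mapreduce.map.java.opts";
    static final String REDUCE_KEY = "mapreduce.reduce.java.opts";

    /** Returns the JVM options a task would get under this model. */
    static String effectiveOpts(Properties conf, String specificKey) {
        String specific = conf.getProperty(specificKey);
        if (specific != null && !specific.trim().isEmpty()) {
            return specific;                      // specific setting overrides
        }
        return conf.getProperty(LEGACY_KEY, "");  // fall back to legacy key
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(LEGACY_KEY, "-Xmx4096m"); // what the job asked for
        conf.setProperty(MAP_KEY, "-Xmx1024m");    // e.g. a site-wide default

        System.out.println("map:    " + effectiveOpts(conf, MAP_KEY));
        System.out.println("reduce: " + effectiveOpts(conf, REDUCE_KEY));
    }
}
```

In this example the map task ends up with -Xmx1024m even though the legacy key asks for 4096 MB, while the reduce task falls back to the legacy value.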

7cjasjjr3#

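You have most likely exceeded the amount of memory available to the Java process running the mapper/reducer. There are quite a few memory settings you can tune to get past this. Here goes:
The following properties let you specify options to pass to the JVMs that run your tasks. They can be used with -Xmx to control the available heap.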

mapreduce.map.java.opts

 mapreduce.reduce.java.opts

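Note that the older mapred.child.java.opts has no direct Hadoop 2 equivalent; the advice in the source code is to use the two properties above instead. mapred.child.java.opts is still supported (but is overridden by the two more specific settings if they are present). Complementary to these, the following options let you limit the total memory (possibly virtual) available to your tasks, including heap, stack and class definitions: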

mapreduce.map.memory.mb  

mapreduce.reduce.memory.mb

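I suggest setting -Xmx to 75% of the memory.mb values. In a YARN cluster, jobs must not use more memory than the server-side config yarn.scheduler.maximum-allocation-mb, or they will be killed. To check the defaults and precedence of these settings, see JobConf and MRJobConfig in the Hadoop source code.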
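The 75% rule of thumb above can be sketched as a small helper that derives the -Xmx value from each memory.mb value. This is only an illustration using the JDK; the class TaskMemorySizing, its methods, and the example sizes (2048 and 4096 MB) are assumptions, not values from the question.

```java
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper: pairs each *.memory.mb limit with a heap (-Xmx)
// set to 75% of it, leaving headroom for stack and class definitions.
public class TaskMemorySizing {

    static final double HEAP_FRACTION = 0.75;

    /** Builds an -Xmx option equal to 75% of the given limit in MB. */
    static String xmxFor(int limitMb) {
        if (limitMb <= 0) {
            throw new IllegalArgumentException("limitMb must be positive");
        }
        int heapMb = (int) Math.floor(limitMb * HEAP_FRACTION);
        return "-Xmx" + heapMb + "m";
    }

    /** Returns the four map/reduce memory properties for the given limits. */
    static Properties taskMemory(int mapMb, int reduceMb) {
        Properties p = new Properties();
        p.setProperty("mapreduce.map.memory.mb", Integer.toString(mapMb));
        p.setProperty("mapreduce.map.java.opts", xmxFor(mapMb));
        p.setProperty("mapreduce.reduce.memory.mb", Integer.toString(reduceMb));
        p.setProperty("mapreduce.reduce.java.opts", xmxFor(reduceMb));
        return p;
    }

    public static void main(String[] args) {
        Properties p = taskMemory(2048, 4096); // example sizes only
        for (String key : new TreeSet<>(p.stringPropertyNames())) {
            System.out.println(key + "=" + p.getProperty(key));
        }
    }
}
```

With these example sizes the map heap comes out as -Xmx1536m and the reduce heap as -Xmx3072m. A Properties object like this could then be handed to whatever launches the job, for example the PigServer constructor that accepts a Properties argument, if that is how the application runs its scripts.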
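Keep in mind that mapred-site.xml may supply default values for these settings. This can be confusing: for example, if your job sets mapred.child.java.opts programmatically, this has no effect if mapred-site.xml sets mapreduce.map.java.opts or mapreduce.reduce.java.opts. You would need to set those properties in your job instead in order to override mapred-site.xml. Check your job's configuration page (search for "xmx") to see which values were applied and where they came from.

Application Master memory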
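In a YARN cluster, you can use the following two properties to control the amount of memory available to the ApplicationMaster (which holds details of input splits, task status, and so on):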

yarn.app.mapreduce.am.command-opts
 yarn.app.mapreduce.am.resource.mb

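Again, you could set -Xmx (in the former) to 75% of the resource.mb value.

Other configuration
If you hit an OutOfMemoryError in MapOutputCopier.shuffleInMemory, set this value to a low value (10) to force the shuffle to happen on disk.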
