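I have a Java Swing application. Internally it generates Pig scripts that convert XML data into structured data. I have written the logic that generates the Pig scripts in Java.
My XML files are in HDFS. When I select an XML file of 56 MB (156,636 records), my application works fine. But as soon as I select a larger XML file (the log below shows a split of 109,899,360 bytes, roughly 105 MB), it throws java.lang.Exception: java.lang.OutOfMemoryError: Java heap space.
Eclipse console error snippet: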
16/03/04 20:52:47 INFO mapReduceLayer.PigRecordReader: Current split being processed hdfs://localhost:54310/user/hduser/hadoopqatstool/input/xml/Books_WS_MM_2.xml:0+109899360
16/03/04 20:52:47 INFO data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
16/03/04 20:52:47 INFO mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: xmldata1[1,11],xmldata2[-1,-1],xmldata3[3,11],xmldata4[4,11],xmldata5[5,11],xmldata6[6,11],xmldata8[8,11],xmldata7[7,11],null[-1,-1] C: R:
16/03/04 20:52:49 INFO util.SpillableMemoryManager: first memory handler call - Collection threshold init = 85983232(83968K) used = 857675192(837573K) committed = 1172832256(1145344K) max = 1380974592(1348608K)
16/03/04 20:52:50 INFO util.SpillableMemoryManager: first memory handler call- Usage threshold init = 85983232(83968K) used = 1013180544(989434K) committed = 1172832256(1145344K) max = 1380974592(1348608K)
16/03/04 20:52:57 INFO mapred.LocalJobRunner: map task executor complete.
16/03/04 20:52:57 WARN mapred.LocalJobRunner: job_local1196112634_0001
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:777)
at org.apache.hadoop.io.Text.encode(Text.java:450)
at org.apache.hadoop.io.Text.set(Text.java:198)
at org.apache.hadoop.io.Text.<init>(Text.java:88)
at org.apache.pig.piggybank.storage.XMLLoader$XMLRecordReader.nextKeyValue(XMLLoader.java:207)
at org.apache.pig.piggybank.storage.XMLLoader.getNext(XMLLoader.java:262)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/03/04 20:52:58 WARN mapReduceLayer.MapReduceLauncher: Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
16/03/04 20:52:58 INFO mapReduceLayer.MapReduceLauncher: job job_local1196112634_0001 has failed! Stop running all dependent jobs
16/03/04 20:52:58 INFO mapReduceLayer.MapReduceLauncher: 100% complete
16/03/04 20:52:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/03/04 20:52:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/03/04 20:52:58 ERROR mapreduce.MRPigStatsUtil: 1 map reduce job(s) failed!
16/03/04 20:52:58 INFO mapreduce.SimplePigStats: Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0 0.15.0 hduser 2016-03-04 20:52:46 2016-03-04 20:52:58 FILTER
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local1196112634_0001 xmldata1,xmldata2,xmldata3,xmldata4,xmldata5,xmldata6,xmldata7,xmldata8 MAP_ONLY Message: Job failed! /user/hduser/hadoopqatstool/output/xml/2016-03-04T20_52_46,
Input(s):
Failed to read data from "hdfs://localhost:54310/user/hduser/hadoopqatstool/input/xml/Books_WS_MM_2.xml"
Output(s):
Failed to produce result in "/user/hduser/hadoopqatstool/output/xml/2016-03-04T20_52_46"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local1196112634_0001
16/03/04 20:52:58 INFO mapReduceLayer.MapReduceLauncher: Failed!
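To solve this problem I searched around and found that increasing HADOOP_HEAPSIZE in hadoop-env.sh can resolve it, so I made the corresponding changes.
Part of hadoop-env.sh (before the change):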
# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=
# export HADOOP_NAMENODE_INIT_HEAPSIZE=""
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
I changed hadoop-env.sh to this (after the change):
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=4096
export HADOOP_NAMENODE_INIT_HEAPSIZE="4096"
export HADOOP_PORTMAP_OPTS="-Xmx4096m $HADOOP_PORTMAP_OPTS"
export HADOOP_CLIENT_OPTS="-Xmx4096m $HADOOP_CLIENT_OPTS"
But even after making these changes I get the same out-of-memory error.
One record of my XML data looks like this. Likewise, I have many such records:
<book id="bk101">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
</book>
My system details:
Pseudo-distributed, single-node Hadoop on a physical machine.
Hadoop version: Apache Hadoop 2.6.0
Pig version: 0.15.0
RAM: 8 GB
OS: Ubuntu 14.04 LTS
64-bit
3 Answers
bpzcxfmw1#
Check that the memory allocated to the mappers and reducers (mapreduce.map.memory.mb, mapreduce.reduce.memory.mb) is larger than the Java heap (-Xmx) given to the task JVMs.
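The comparison suggested here can be sketched in a few lines of Java. This is a minimal illustration, not part of Hadoop: the class name, method names, and sample values are hypothetical, and it only inspects the first -Xmx flag in an options string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeapVsContainerCheck {
    // Matches -Xmx<number><optional unit>, e.g. -Xmx3072m or -Xmx4g.
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    /** Returns the first -Xmx value in MB, or -1 if no -Xmx flag is present. */
    static long xmxMb(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        if (!m.find()) {
            return -1;
        }
        long n = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase()) {
            case "g": return n * 1024;
            case "m": return n;
            case "k": return n / 1024;
            default:  return n / (1024 * 1024); // no unit means bytes
        }
    }

    /** True when the memory.mb limit is strictly larger than the heap. */
    static boolean containerFitsHeap(int memoryMb, String javaOpts) {
        long heapMb = xmxMb(javaOpts);
        return heapMb >= 0 && memoryMb > heapMb;
    }

    public static void main(String[] args) {
        // Hypothetical values for mapreduce.map.memory.mb and the map JVM options.
        System.out.println(containerFitsHeap(4096, "-Xmx3072m")); // true
        System.out.println(containerFitsHeap(1024, "-Xmx4096m")); // false
    }
}
```

A false result means the heap alone is already as large as, or larger than, the memory allowed for the whole task.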
2uluyalo2#
You have already changed hadoop-env.sh. Now also change conf/mapred-site.xml:
mapred.child.java.opts=-Xmx4096m
Then restart Hadoop.
Ref: http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
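After restarting, it may help to confirm which heap limit is actually in effect. The sketch below prints the maximum heap of the JVM it runs in, using the standard Runtime.maxMemory() call. It assumes the job runs inside the application's own JVM, as the LocalJobRunner frames in the question's stack trace suggest. The class and method names are hypothetical.

```java
final class HeapReport {
    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long bytesToMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Maximum heap the current JVM will attempt to use, in MB. */
    static long maxHeapMb() {
        return bytesToMb(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        System.out.println("Max heap of this JVM: " + maxHeapMb() + " MB");
        // The "max" figure from the SpillableMemoryManager lines in the question.
        System.out.println("Logged max: " + bytesToMb(1380974592L) + " MB"); // 1317 MB
    }
}
```

The logged maximum of about 1317 MB is well below 4096 MB, which suggests, though does not prove, that the -Xmx4096m setting is not reaching the JVM that runs the map task.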
7cjasjjr3#
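Most likely you are exceeding the memory available to the Java process in which the mapper/reducer runs. There are many memory settings you can tune to get past this.
The following properties (mapred.child.java.opts, mapreduce.map.java.opts and mapreduce.reduce.java.opts) let you specify options to be passed to the JVMs running your tasks. They can be used with -Xmx to control the available heap.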
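Note that the first of these has no direct Hadoop 2 equivalent; the advice in the source code is to use the other two. mapred.child.java.opts is still supported (but is overridden by the other two, more specific settings if they are present). Complementary to these, the memory.mb options (mapreduce.map.memory.mb and mapreduce.reduce.memory.mb) let you limit the total memory (possibly virtual) available to your tasks, including heap, stack and class definitions.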
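The override rule for the java.opts properties can be sketched as a small lookup. This is a simplified model of the behaviour described in this answer, not Hadoop's actual code. The class name and the -Xmx values are hypothetical, and the property keys are the ones named in this answer.

```java
import java.util.Properties;

final class ChildOptsResolver {
    /**
     * Simplified model: the map-specific key wins when it is present,
     * otherwise the older mapred.child.java.opts value is used.
     */
    static String effectiveMapOpts(Properties conf) {
        String specific = conf.getProperty("mapreduce.map.java.opts");
        return specific != null ? specific : conf.getProperty("mapred.child.java.opts");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapred.child.java.opts", "-Xmx4096m");
        System.out.println(effectiveMapOpts(conf)); // -Xmx4096m

        // Once the specific key is set, the generic one no longer applies to map tasks.
        conf.setProperty("mapreduce.map.java.opts", "-Xmx1024m");
        System.out.println(effectiveMapOpts(conf)); // -Xmx1024m
    }
}
```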
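I suggest setting -Xmx to 75% of the memory.mb values. In a YARN cluster, jobs must not use more memory than the server-side setting yarn.scheduler.maximum-allocation-mb, or they will be killed. To check the defaults and precedence of these properties, see JobConf and MRJobConfig in the Hadoop source code.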
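The 75% rule of thumb is simple arithmetic; a minimal sketch follows. The class and method names are hypothetical, and the numbers passed in are example values, not defaults taken from any cluster.

```java
final class HeapSizing {
    /** Builds a -Xmx option sized at 75% of the given memory.mb value. */
    static String xmxFor(int memoryMb) {
        int heapMb = memoryMb * 3 / 4; // integer arithmetic, rounds down
        return "-Xmx" + heapMb + "m";
    }

    /** True when the requested memory.mb does not exceed the scheduler maximum. */
    static boolean withinSchedulerLimit(int memoryMb, int maxAllocationMb) {
        return memoryMb <= maxAllocationMb;
    }

    public static void main(String[] args) {
        int mapMemoryMb = 4096;     // example value for mapreduce.map.memory.mb
        int maxAllocationMb = 8192; // example value for yarn.scheduler.maximum-allocation-mb
        System.out.println(xmxFor(mapMemoryMb));                                // -Xmx3072m
        System.out.println(withinSchedulerLimit(mapMemoryMb, maxAllocationMb)); // true
    }
}
```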
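Remember that your mapred-site.xml may provide default values for these settings. This can be confusing: for example, if your job sets mapred.child.java.opts programmatically, it has no effect when mapred-site.xml sets mapreduce.map.java.opts or mapreduce.reduce.java.opts. You need to set those properties in your job instead in order to override mapred-site.xml. Check your job's configuration page (search for "xmx") to see which values were applied and where they came from.
Application master memory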
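In a YARN cluster, you can use the following two properties to control the amount of memory available to the ApplicationMaster (which holds details of input splits, task status and so on):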
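Again, you could set -Xmx (in the former) to 75% of the resource.mb value.
Other configurations
If you encounter an OutOfMemoryError in MapOutputCopier.shuffleInMemory, set this value to something low (10) to force the shuffle to happen on disk.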