Eclipse — getting "java.lang.OutOfMemoryError: Java heap space" when executing a Pig script from a Java application

kyxcudwk · asked 2021-06-21 · in Pig

I have a Java Swing application. Internally it generates Pig scripts that convert XML data into structured data; I wrote the script-generation logic in Java.
My XML files are in HDFS. When I select an XML file of 56 MB (156,636 records), my application works fine. But as soon as I select a larger XML file, it throws java.lang.Exception: java.lang.OutOfMemoryError: Java heap space.
Eclipse console error snippet:

    16/03/04 20:52:47 INFO mapReduceLayer.PigRecordReader: Current split being processed hdfs://localhost:54310/user/hduser/hadoopqatstool/input/xml/Books_WS_MM_2.xml:0+109899360
    16/03/04 20:52:47 INFO data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
    16/03/04 20:52:47 INFO mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: xmldata1[1,11],xmldata2[-1,-1],xmldata3[3,11],xmldata4[4,11],xmldata5[5,11],xmldata6[6,11],xmldata8[8,11],xmldata7[7,11],null[-1,-1] C: R:
    16/03/04 20:52:49 INFO util.SpillableMemoryManager: first memory handler call - Collection threshold init = 85983232(83968K) used = 857675192(837573K) committed = 1172832256(1145344K) max = 1380974592(1348608K)
    16/03/04 20:52:50 INFO util.SpillableMemoryManager: first memory handler call- Usage threshold init = 85983232(83968K) used = 1013180544(989434K) committed = 1172832256(1145344K) max = 1380974592(1348608K)
    16/03/04 20:52:57 INFO mapred.LocalJobRunner: map task executor complete.
    16/03/04 20:52:57 WARN mapred.LocalJobRunner: job_local1196112634_0001
    java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
    Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
        at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:777)
        at org.apache.hadoop.io.Text.encode(Text.java:450)
        at org.apache.hadoop.io.Text.set(Text.java:198)
        at org.apache.hadoop.io.Text.<init>(Text.java:88)
        at org.apache.pig.piggybank.storage.XMLLoader$XMLRecordReader.nextKeyValue(XMLLoader.java:207)
        at org.apache.pig.piggybank.storage.XMLLoader.getNext(XMLLoader.java:262)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    16/03/04 20:52:58 WARN mapReduceLayer.MapReduceLauncher: Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
    16/03/04 20:52:58 INFO mapReduceLayer.MapReduceLauncher: job job_local1196112634_0001 has failed! Stop running all dependent jobs
    16/03/04 20:52:58 INFO mapReduceLayer.MapReduceLauncher: 100% complete
    16/03/04 20:52:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    16/03/04 20:52:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    16/03/04 20:52:58 ERROR mapreduce.MRPigStatsUtil: 1 map reduce job(s) failed!
    16/03/04 20:52:58 INFO mapreduce.SimplePigStats: Script Statistics:
    HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
    2.6.0          0.15.0      hduser  2016-03-04 20:52:46  2016-03-04 20:52:58  FILTER
    Failed!
    Failed Jobs:
    JobId  Alias  Feature  Message  Outputs
    job_local1196112634_0001  xmldata1,xmldata2,xmldata3,xmldata4,xmldata5,xmldata6,xmldata7,xmldata8  MAP_ONLY  Message: Job failed!  /user/hduser/hadoopqatstool/output/xml/2016-03-04T20_52_46,
    Input(s):
    Failed to read data from "hdfs://localhost:54310/user/hduser/hadoopqatstool/input/xml/Books_WS_MM_2.xml"
    Output(s):
    Failed to produce result in "/user/hduser/hadoopqatstool/output/xml/2016-03-04T20_52_46"
    Counters:
    Total records written : 0
    Total bytes written : 0
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0
    Job DAG:
    job_local1196112634_0001
    16/03/04 20:52:58 INFO mapReduceLayer.MapReduceLauncher: Failed!

To solve this, I searched and found that increasing the Hadoop heap size in the hadoop-env.sh file should fix it, so I made the change accordingly.
Part of hadoop-env.sh (before the change):

    # The maximum amount of heap to use, in MB. Default is 1000.
    # export HADOOP_HEAPSIZE=
    # export HADOOP_NAMENODE_INIT_HEAPSIZE=""
    export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"
    export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"

I changed hadoop-env.sh to this (after the change):

    # The maximum amount of heap to use, in MB. Default is 1000.
    export HADOOP_HEAPSIZE=4096
    export HADOOP_NAMENODE_INIT_HEAPSIZE="4096"
    export HADOOP_PORTMAP_OPTS="-Xmx4096m $HADOOP_PORTMAP_OPTS"
    export HADOOP_CLIENT_OPTS="-Xmx4096m $HADOOP_CLIENT_OPTS"

But even after making these changes, I still get the same out-of-memory error.
One record of my XML data looks like this; likewise, I have many such records:

    <book id="bk101">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
    </book>

My system details:
Pseudo-distributed single-node Hadoop on a physical machine.
Hadoop version: Apache Hadoop 2.6.0
Pig version: 0.15.0
RAM: 8 GB
OS: Ubuntu 14.04 LTS, 64-bit


bpzcxfmw1#

Check whether you have allocated more memory to the mappers and reducers (mapreduce.map.memory.mb, mapreduce.reduce.memory.mb) than to the Java heap.
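A minimal sketch of that check as a mapred-site.xml fragment. The property names are the ones from the answer; the values are illustrative assumptions, with the point being that each task container should be at least as large as the task JVM's -Xmx heap:

```xml
<!-- mapred-site.xml (illustrative values only) -->
<!-- Each task container must hold the whole task JVM, so its size
     in MB should exceed the heap requested via -Xmx. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
```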


2uluyalo2#

You have already changed the hadoop-env.sh file. Now also change conf/mapred-site.xml:
mapred.child.java.opts=-Xmx4096m
Then restart Hadoop.
Ref: http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
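As an XML fragment in mapred-site.xml, that suggestion would look roughly like the following (a sketch; the 4096 MB heap mirrors the value in this answer and is not a verified requirement for this job):

```xml
<!-- mapred-site.xml: heap for child task JVMs (older-style property) -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>
```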


7cjasjjr3#

Most likely you are exceeding the amount of memory available to the Java process running your mapper/reducer. There are a number of memory settings you can tune to get past this. Here goes:
The following properties let you specify options to pass to the JVMs running your tasks. These can be used with -Xmx to control the available heap.

    mapreduce.map.java.opts
    mapreduce.reduce.java.opts

Note that there is no direct Hadoop 2 equivalent for the first of these; the advice in the source code is to use the other two. mapred.child.java.opts is still supported (but is overridden by the other two, more specific settings if present). Beyond these, the following options let you limit the total memory (possibly virtual memory) available to your tasks, including heap, stack, and class definitions:

    mapreduce.map.memory.mb
    mapreduce.reduce.memory.mb

I suggest setting -Xmx to 75% of the memory.mb values. In a YARN cluster, jobs must not use more memory than the server-side config yarn.scheduler.maximum-allocation-mb or they will be killed. To check the defaults and precedence of these settings, see JobConf and MRJobConfig in the Hadoop source code.
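For example, that 75% rule of thumb might look like this in mapred-site.xml (the values are illustrative assumptions, not tuned recommendations for this particular job):

```xml
<!-- mapred-site.xml: container size 4096 MB, heap at ~75% of it -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3072m</value>
</property>
```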
Remember that your mapred-site.xml may provide defaults for these settings. This can be confusing; for example, if your job sets mapred.child.java.opts programmatically, this will have no effect if mapred-site.xml sets mapreduce.map.java.opts or mapreduce.reduce.java.opts. You would need to set those properties in your job instead, to override the mapred-site.xml. Check your job's configuration page (search for "xmx") to see which values have been applied and where they came from.

ApplicationMaster memory
In a YARN cluster, you can use the following two properties to control the amount of memory available to your ApplicationMaster (to hold the details of input splits, status of tasks, etc.):

    yarn.app.mapreduce.am.command-opts
    yarn.app.mapreduce.am.resource.mb
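As a sketch, these could be set like so (the values are illustrative assumptions, with the heap at roughly 75% of the container size):

```xml
<!-- mapred-site.xml: ApplicationMaster container size and heap -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx1536m</value>
</property>
```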

Again, you could set -Xmx (in the former) to 75% of the resource.mb value.

Other configurations
If you run into an OutOfMemoryError in MapOutputCopier.shuffleInMemory, set the corresponding shuffle buffer value to something low (10) to force the shuffle to be performed on disk.

