Exception in task 0.0 in stage 13.0 (TID 13): java.lang.OutOfMemoryError: Java heap space

sauutmhj · posted 2021-05-29 in Hadoop

We are running into a problem with a Mahout job. Our input matrix has 100k rows and 100 items, and the process dies with "Exception in task 0.0 in stage 13.0 (TID 13): java.lang.OutOfMemoryError: Java heap space". We have already tried increasing the Java heap size, the Mahout heap size, and spark.driver.memory.
Environment versions: Mahout 0.11.1, Spark 1.6.0.
Mahout command line:

  /opt/mahout/bin/mahout spark-rowsimilarity -i 50k_rows__50items.dat -o test_output.tmp --maxObservations 500 --maxSimilaritiesPerRow 100 --omitStrength --master local --sparkExecutorMem 8g
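
One thing worth noting: the machine only has 8 GB of RAM (see the specs below), so an 8g Spark executor on top of an 8 GB Mahout heap overcommits physical memory. Below is a minimal sketch of a tighter memory budget, under the assumption that in Spark 1.6 `--master local` runs the executors inside the driver JVM, so the driver heap set via MAHOUT_HEAPSIZE is the limit that actually matters; the 6144 value is illustrative, not tuned:

  # Hedged sketch, not a verified fix: cap the heap below physical RAM.
  export MAHOUT_HEAPSIZE=6144   # MB; leaves headroom for the OS and off-heap buffers
  /opt/mahout/bin/mahout spark-rowsimilarity \
    -i 50k_rows__50items.dat -o test_output.tmp \
    --maxObservations 500 --maxSimilaritiesPerRow 100 --omitStrength \
    --master local --sparkExecutorMem 4g   # largely moot in local mode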

The process runs on a machine with the following specs:

  RAM: 8 GB
  CPU: 8 cores

.profile file:

  export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  export HADOOP_HOME=/opt/hadoop-2.6.0
  export SPARK_HOME=/opt/spark
  export MAHOUT_HOME=/opt/mahout
  export MAHOUT_HEAPSIZE=8192
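
With MAHOUT_HEAPSIZE=8192 the JVM heap alone can grow to the machine's entire physical RAM. A quick, hedged way to check what -Xmx the driver JVM actually ended up with (jps ships with the JDK; the grep pattern is a guess and may need adjusting to whatever main class `jps -l` reports):

  # While the job is running, list JVMs with their full argument lists
  # and look for the effective -Xmx value.
  jps -lvm | grep -i -e mahout -e drivers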

The exception thrown:

  16/01/22 11:45:06 ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 13)
  java.lang.OutOfMemoryError: Java heap space
    at org.apache.mahout.math.DenseMatrix.<init>(DenseMatrix.java:66)
    at org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:70)
    at org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:59)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
  16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver, localhost, 42107))] in 1 attempts
  org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:448)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:468)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
  16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver, localhost, 42107))] in 1 attempts
  org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:448)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:468)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
  Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    ...
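
Reading the trace: the OOM is thrown while Mahout's sparkbindings blockify step (package.scala:59-70) assembles the rows of a partition into a single in-memory DenseMatrix, so the dense block for at least one partition does not fit in the heap; the NettyRpcEndpointRef heartbeat timeouts afterwards are most likely secondary fallout from the dying JVM. A small sketch for watching heap pressure while the job runs (jstat ships with the JDK; the pid lookup pattern is a guess and may need adjusting to the driver class shown by `jps -l`):

  # Sample GC activity and heap occupancy of the driver JVM every 2 seconds.
  pid=$(jps -l | awk '/mahout|Driver/ {print $1; exit}')
  jstat -gcutil "$pid" 2000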

Can you suggest anything?
Thanks in advance. Cheers.

djmepvbi:

I ran into a similar problem and solved it by reverting this commit:
https://github.com/apache/mahout/pull/10/commits/162c5ca36e00af91a9599075332c577d9b1a13c4
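
In case it helps, a hedged sketch of applying the same workaround in a local checkout and rebuilding (the tag name and the plain Maven build are assumptions based on the usual Apache Mahout release layout; the revert may also need manual conflict resolution):

  # Build a patched Mahout 0.11.1 with the blockify change reverted.
  git clone https://github.com/apache/mahout.git
  cd mahout
  git checkout -b revert-blockify mahout-0.11.1   # tag name assumed
  git revert 162c5ca36e00af91a9599075332c577d9b1a13c4
  mvn -DskipTests clean package                   # then point MAHOUT_HOME at this tree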
