Spark memory/worker issue & what is the correct Spark configuration?

guykilcj · posted 2021-05-27 in Spark

I have 6 nodes in total in my Spark cluster: 5 nodes with 4 cores and 32 GB RAM each, and one node (node 4) with 8 cores and 32 GB RAM.
So altogether I have 6 nodes, 28 cores, and 192 GB RAM (I want to use half of the memory, but all of the cores).
I plan to run 5 Spark applications on the cluster.
My spark-defaults.conf is as follows:

  spark.master                        spark://***:7077
  spark.eventLog.enabled              false
  spark.driver.memory                 2g
  worker_max_heapsize                 2g
  spark.kryoserializer.buffer.max.mb  128
  spark.shuffle.file.buffer.kb        1024
  spark.cores.max                     4
  spark.dynamicAllocation.enabled     true

With the configuration below, I expect to run 4 worker instances on each machine while using at most 16 GB per node. That should give me (4 instances × 6 nodes = 24) workers on the cluster, together using up to 28 cores (all of them) and 96 GB RAM.
My spark-env.sh is as follows:

  export SPARK_WORKER_MEMORY=16g
  export SPARK_WORKER_INSTANCES=4
  SPARK_LOCAL_DIRS=/app/spark/spark-1.6.1-bin-hadoop2.6/local
  SPARK_WORKER_DIR=/app/spark/spark-1.6.1-bin-hadoop2.6/work
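
To spell out the totals I expect from this setup:

  workers: 4 instances × 6 nodes    = 24 workers
  cores:   all 28 cores in the cluster
  memory:  16 GB per node × 6 nodes = 96 GB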

But my Spark cluster has started, and the Spark UI is showing the following running workers:

  Worker Id / Address    State  Cores       Memory
  worker-node4-address   ALIVE  8 (1 Used)  16.0 GB (0.0 GB Used)
  worker-node4-address   ALIVE  8 (1 Used)  16.0 GB (0.0 GB Used)
  worker-node4-address   ALIVE  8 (1 Used)  16.0 GB (0.0 GB Used)
  worker-node4-address   ALIVE  8 (0 Used)  16.0 GB (0.0 B Used)
  worker-node4-address   ALIVE  8 (1 Used)  16.0 GB (0.0 GB Used)
  worker-node1-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node1-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node1-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node1-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node2-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node2-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node2-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node2-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node3-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node3-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node3-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node3-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node5-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node5-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node5-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node5-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node6-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node6-address   ALIVE  4 (3 Used)  16.0 GB (0.0 GB Used)
  worker-node6-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)
  worker-node6-address   ALIVE  4 (0 Used)  16.0 GB (0.0 B Used)

But the master UI shows (when no application is running): Alive Workers: 25; Cores in use: 120 Total, 0 Used; Memory in use: 400.0 GB Total, 0.0 GB Used; Status: ALIVE.
Why are there 25 workers when I expected 24 (4 per node)? There is 1 extra on node 4, the node with 8 cores.
Why does it show Memory in use: 400.0 GB Total when I assigned at most 16 GB per node?
Why does the UI show 120 cores when there are only 28 cores in my cluster?
Can you tell me what Spark configuration my system should have?
How many cores and how much executor memory should I specify when submitting a Spark job?
What is the spark.cores.max parameter? Is it per node or for the whole cluster?
I ran 3 applications with the spark-submit options --executor-memory 2g --total-executor-cores 4, and at least one of my applications failed with the following error:

  Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
      at java.lang.Thread.start0(Native Method)
      at java.lang.Thread.start(Thread.java:714)
      at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1672)
      at scala.concurrent.forkjoin.ForkJoinPool.signalWork(ForkJoinPool.java:1966)
      at scala.concurrent.forkjoin.ForkJoinPool.fullExternalPush(ForkJoinPool.java:1905)
      at scala.concurrent.forkjoin.ForkJoinPool.externalPush(ForkJoinPool.java:1834)
      at scala.concurrent.forkjoin.ForkJoinPool.execute(ForkJoinPool.java:2955)
      at scala.concurrent.impl.ExecutionContextImpl.execute(ExecutionContextImpl.scala:120)
      at scala.concurrent.impl.Future$.apply(Future.scala:31)
      at scala.concurrent.Future$.apply(Future.scala:485)
      at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:232)
      at org.apache.spark.deploy.rest.RestSubmissionClient.org$apache$spark$deploy$rest$RestSubmissionClient$$postJson(RestSubmissionClient.scala:222)
      at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$createSubmission$3.apply(RestSubmissionClient.scala:87)
      at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$createSubmission$3.apply(RestSubmissionClient.scala:83)
      at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
      at org.apache.spark.deploy.rest.RestSubmissionClient.createSubmission(RestSubmissionClient.scala:83)
      at org.apache.spark.deploy.rest.RestSubmissionClient$.run(RestSubmissionClient.scala:411)
      at org.apache.spark.deploy.rest.RestSubmissionClient$.main(RestSubmissionClient.scala:424)
      at org.apache.spark.deploy.rest.RestSubmissionClient.main(RestSubmissionClient.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:497)
      at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
      at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
      at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:195)
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

lp0sw83n 1#

As far as I know, you should start only one worker process per node:
http://spark.apache.org/docs/latest/hardware-provisioning.html
Running multiple workers per node only makes sense when a node has more than 200 GB of RAM, and your nodes don't have that much. Could you set the following in spark-env.sh on the nodes that have only 4 cores:

  export SPARK_EXECUTOR_CORES=4
  export SPARK_EXECUTOR_MEMORY=16GB
  export SPARK_MASTER_HOST=<Your Master-Ip here>

And on the node that has 8 cores:

  export SPARK_EXECUTOR_CORES=8
  export SPARK_EXECUTOR_MEMORY=16GB
  export SPARK_MASTER_HOST=<Your Master-Ip here>

And on the master node, in spark-defaults.conf:

  spark.driver.memory 2g
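
After changing spark-env.sh on every node, the standalone cluster needs a restart for the new worker settings to take effect. A minimal sketch, assuming your install path from SPARK_LOCAL_DIRS and that the master can SSH to the workers (which these scripts rely on):

  # run on the master node
  /app/spark/spark-1.6.1-bin-hadoop2.6/sbin/stop-all.sh
  /app/spark/spark-1.6.1-bin-hadoop2.6/sbin/start-all.sh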

I think you should try this and comment out the other settings for testing. Is that what you want? Your cluster should then use 96 GB and 28 cores in total. You can then launch your applications without --executor-memory 2G --total-executor-cores 4 (see the sketch below). But a java.lang.OutOfMemoryError can occur even without a misconfiguration; it can also happen when you collect too much data to the driver.
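
For example, a launch without those flags could look like this (the class name and jar path are placeholders for your own application):

  # submit against the standalone master; executors use the worker defaults
  /app/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \
    --master spark://<Your Master-Ip here>:7077 \
    --class com.example.MyApp \
    /path/to/my-app.jar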
And yes, with the current configuration each worker has 16 GB RAM, so 25 workers × 16 GB = 400 GB in total.
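
The core count can be explained the same way: SPARK_WORKER_CORES is not set in your spark-env.sh, so each worker instance presumably advertises all the cores of its machine, and the master UI totals follow directly:

  memory:  25 workers × 16 GB                  = 400 GB total
  cores:    5 workers × 8 cores (node 4)
          + 20 workers × 4 cores (other nodes) = 120 cores total

Note this also means each 32 GB node advertises at least 64 GB (4 workers × 16 GB), far beyond its physical RAM.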

