EDIT 1: Tried network_mode: host on the worker nodes, same result.
I am setting up a multi-node, multi-Docker Spark cluster in standalone mode:
1 node with 1 Spark master and X workers
docker-compose for the master node + workers:
version: '2'
services:
  spark:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=master
    ports:
      - '8080:8080'
      - '4040:4040'
      - '7077:7077'
  spark-worker:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
    deploy:
      mode: replicated
      replicas: 4
N nodes with 1...M workers each
docker-compose for the worker nodes:
version: '2'
services:
  spark-worker:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://1.1.1.1:7077
    network_mode: host
    deploy:
      mode: replicated
      replicas: 4
I can see the correct number of registered workers in the Spark Master web UI. But when I submit a job on the MASTER, the master log fills up with:
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/499 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/530 on worker worker-20220701130135-172.18.0.4-35337
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/501 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/531 on worker worker-20220701132457-172.18.0.5-39517
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/502 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/532 on worker worker-20220701132457-172.18.0.2-43527
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/505 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/533 on worker worker-20220701130134-172.18.0.3-35961
spark_1 | 22/07/01 13:32:27 INFO Master: Removing executor app-20220701133058-0002/504 because it is EXITED
spark_1 | 22/07/01 13:32:27 INFO Master: Launching executor app-20220701133058-0002/534 on worker worker-20220701132453-172.18.0.5-40345
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/506 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/535 on worker worker-20220701132454-172.18.0.2-42907
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/514 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/536 on worker worker-20220701132442-172.18.0.2-41669
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/503 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/537 on worker worker-20220701132454-172.18.0.3-37011
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/509 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/538 on worker worker-20220701132455-172.18.0.4-42013
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/507 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/539 on worker worker-20220701132510-172.18.0.3-39097
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/508 because it is EXITED
spark_1 | 22/07/01 13:32:28 INFO Master: Launching executor app-20220701133058-0002/540 on worker worker-20220701132510-172.18.0.2-40827
spark_1 | 22/07/01 13:32:28 INFO Master: Removing executor app-20220701133058-0002/513 because it is EXITED
Sample log from a remote worker:
spark-worker_1 | 22/07/01 13:32:32 INFO ExecutorRunner: Launch command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=38385" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@63ab9653f1c0:38385" "--executor-id" "561" "--hostname" "172.18.0.4" "--cores" "1" "--app-id" "app-20220701133058-0002" "--worker-url" "spark://Worker@172.18.0.4:35337"
spark-worker_1 | 22/07/01 13:32:38 INFO Worker: Executor app-20220701133058-0002/561 finished with state EXITED message Command exited with code 1 exitStatus 1
spark-worker_1 | 22/07/01 13:32:38 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 561
spark-worker_1 | 22/07/01 13:32:38 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20220701133058-0002, execId=561)
spark-worker_1 | 22/07/01 13:32:38 INFO Worker: Asked to launch executor app-20220701133058-0002/595 for API Bruteforce
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing view acls to: spark
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing modify acls to: spark
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing view acls groups to:
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: Changing modify acls groups to:
spark-worker_1 | 22/07/01 13:32:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
spark-worker_1 | 22/07/01 13:32:38 INFO ExecutorRunner: Launch command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=38385" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@63ab9653f1c0:38385" "--executor-id" "595" "--hostname" "172.18.0.4" "--cores" "1" "--app-id" "app-20220701133058-0002" "--worker-url" "spark://Worker@172.18.0.4:35337"
spark-worker_1 | 22/07/01 13:32:43 INFO Worker: Executor app-20220701133058-0002/595 finished with state EXITED message Command exited with code 1 exitStatus 1
spark-worker_1 | 22/07/01 13:32:43 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 595
spark-worker_1 | 22/07/01 13:32:43 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20220701133058-0002, execId=595)
spark-worker_1 | 22/07/01 13:32:43 INFO Worker: Asked to launch executor app-20220701133058-0002/629 for API Bruteforce
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing view acls to: spark
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing modify acls to: spark
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing view acls groups to:
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: Changing modify acls groups to:
spark-worker_1 | 22/07/01 13:32:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
spark-worker_1 | 22/07/01 13:32:43 INFO ExecutorRunner: Launch command: "/opt/bitnami/java/bin/java" "-cp" "/opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=38385" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@63ab9653f1c0:38385" "--executor-id" "629" "--hostname" "172.18.0.4" "--cores" "1" "--app-id" "app-20220701133058-0002" "--worker-url" "spark://Worker@172.18.0.4:35337"
Throughput is very low and CPU usage on the worker nodes is at 100%.
I believe this has to do with the Docker port mapping on the worker nodes, but I don't know which ports need to be exposed on the worker containers. And if they are the same ports, how do I configure them for multiple containers on the same machine?
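To make the second half of the question concrete, here is a guess at what the worker compose might have to look like with host networking, one service per worker so that each container gets its own fixed port (SPARK_WORKER_PORT and SPARK_WORKER_WEBUI_PORT are standard Spark standalone settings, but I have not verified that the bitnami image passes them through, and the port numbers are placeholders):

# Hypothetical worker-node compose: one service per worker instead of replicas,
# so every worker on the same machine can bind a distinct fixed port.
version: '2'
services:
  spark-worker-1:
    image: bitnami/spark:latest
    network_mode: host
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://1.1.1.1:7077
      - SPARK_WORKER_PORT=35001        # assumed to be honoured by the Worker process
      - SPARK_WORKER_WEBUI_PORT=8081
  spark-worker-2:
    image: bitnami/spark:latest
    network_mode: host
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://1.1.1.1:7077
      - SPARK_WORKER_PORT=35002
      - SPARK_WORKER_WEBUI_PORT=8082

One service per worker replaces deploy/replicas here because replicated containers sharing the host network could not all bind the same fixed port.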
1 Answer
I think you should add the docker-compose of the worker nodes.
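Independent of that, the worker log above shows the executors being told to connect back to the driver at spark://CoarseGrainedScheduler@63ab9653f1c0:38385, a container hostname that remote worker nodes cannot resolve; that alone can make every executor exit with code 1. Below is a hypothetical, unverified sketch of pinning and publishing the driver ports on the master node (spark.driver.port, spark.driver.host, spark.driver.bindAddress and spark.blockManager.port are standard Spark properties; the port values are placeholders):

# Hypothetical master-node compose: publish fixed driver ports so that
# executors launched on remote worker nodes can reach the driver.
version: '2'
services:
  spark:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=master
    ports:
      - '8080:8080'
      - '7077:7077'
      - '38385:38385'   # spark.driver.port (set via --conf when submitting)
      - '38386:38386'   # spark.blockManager.port (set via --conf when submitting)

The job would then be submitted with spark.driver.host pointing at the master host's routable IP (1.1.1.1 in the question) and spark.driver.bindAddress set to 0.0.0.0, rather than letting the driver advertise its container hostname.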