Error: "Tried using port 54321 for Flow proxy, but port was already occupied"

Posted by w51jfk4q on 2021-05-26 in Spark

I have a Spark application that depends on the H2O Sparkling Water jar:

// H2O Sparkling Water version (spark 2.4)
val h2oVersion = "3.32.0.1-2-2.4"
lazy val h2oLibs = Seq(
  "org.apache.spark" %% "spark-repl" % sparkVersion,
  "ai.h2o" % "sparkling-water-package_2.11" % h2oVersion //exclude("ai.h2o", "sparkling-water-api-generation")
)

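When I try to launch it on k8s, the Spark executor starts correctly (see the logs from the executor container below), and I can open a port-forward connection to it and access the Flow UI.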

12-03 17:04:55.129 10.1.0.22:54321       33     ent-loop-0  INFO water.default: ----- H2O started  -----
12-03 17:04:55.129 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Build git branch: rel-zermelo
12-03 17:04:55.129 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Build git hash: 8a289cc0c15718842afc54dca7c8ed104aec9bd4
12-03 17:04:55.129 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Build git describe: jenkins-master-5215-21-g8a289cc
12-03 17:04:55.130 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Build project version: 3.32.0.1
12-03 17:04:55.130 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Build age: 1 month and 24 days
12-03 17:04:55.130 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Built by: 'jenkins'
12-03 17:04:55.130 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Built on: '2020-10-08 18:16:09'
12-03 17:04:55.130 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Found H2O Core extensions: [HiveTableImporter, StackTraceCollector, HiveFrameSaver, XGBoost]
12-03 17:04:55.131 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Processed H2O arguments: [-internal_security_conf_rel_paths, -name, sparkling-water-4444_spark-application-1607015075714, -port_offset, 1, -nthreads, 1, -hdfs_config, /opt/c3/./hdfs_conf2654621877693077305.xml, -log_level, INFO, -embedded, -baseport, 54321, -log_dir, h2ologs, -ip, 10.1.0.22, -flatfile, /var/data/spark-40ac5b73-d4ec-40ff-9785-1827934ec0ee/spark-b85b822f-e7b3-4ba4-aeb8-b7ddfaa99589/sparkling-water-019f1019-ebca-4ba0-bcf2-92078b2dcbec/flatfile.txt]
12-03 17:04:55.131 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Java availableProcessors: 1
12-03 17:04:55.131 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Java heap totalMemory: 2.00 GB
12-03 17:04:55.131 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Java heap maxMemory: 2.00 GB
12-03 17:04:55.131 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Java version: Java 1.8.0_265 (from Oracle Corporation)
12-03 17:04:55.132 10.1.0.22:54321       33     ent-loop-0  INFO water.default: JVM launch parameters: [-XX:+UseG1GC, -XX:+PrintFlagsFinal, -XX:+UseContainerSupport, -XX:+UnlockExperimentalVMOptions, -XX:+UseCGroupMemoryLimitForHeap, -Xms2G, -Xmx2G]
12-03 17:04:55.132 10.1.0.22:54321       33     ent-loop-0  INFO water.default: JVM process id: 33@vad-1607015075951-exec-1
12-03 17:04:55.132 10.1.0.22:54321       33     ent-loop-0  INFO water.default: OS version: Linux 4.19.76-linuxkit (amd64)
12-03 17:04:55.132 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Machine physical memory: 7.78 GB
12-03 17:04:55.132 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Machine locale: en
12-03 17:04:55.132 10.1.0.22:54321       33     ent-loop-0  INFO water.default: X-h2o-cluster-id: 1607015093048
12-03 17:04:55.132 10.1.0.22:54321       33     ent-loop-0  INFO water.default: User name: '4444'
12-03 17:04:55.132 10.1.0.22:54321       33     ent-loop-0  INFO water.default: IPv6 stack selected: false
12-03 17:04:55.133 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Possible IP Address: eth0 (eth0), 10.1.0.22
12-03 17:04:55.133 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Possible IP Address: lo (lo), 127.0.0.1
12-03 17:04:55.133 10.1.0.22:54321       33     ent-loop-0  INFO water.default: H2O node running in unencrypted mode.
12-03 17:04:55.137 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Internal communication uses port: 54322
12-03 17:04:55.138 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Listening for HTTP and REST traffic on http://10.1.0.22:54321/
12-03 17:04:55.146 10.1.0.22:54321       33     ent-loop-0  WARN water.default: Flatfile configuration does not include self: /10.1.0.22:54321, but contains []
12-03 17:04:55.147 10.1.0.22:54321       33     ent-loop-0  INFO water.default: H2O cloud name: 'sparkling-water-4444_spark-application-1607015075714' on /10.1.0.22:54321, discovery address /235.92.149.120:60252
12-03 17:04:55.147 10.1.0.22:54321       33     ent-loop-0  INFO water.default: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
12-03 17:04:55.147 10.1.0.22:54321       33     ent-loop-0  INFO water.default:   1. Open a terminal and run 'ssh -L 55555:localhost:54321 4444@10.1.0.22'
12-03 17:04:55.147 10.1.0.22:54321       33     ent-loop-0  INFO water.default:   2. Point your browser to http://localhost:55555
12-03 17:04:57.038 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Log dir: 'h2ologs'
12-03 17:04:57.039 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Cur dir: '/opt/c3'
12-03 17:04:57.051 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Distributed HTTP import not available (import from HTTP/HTTPS will be eager)
12-03 17:04:57.055 10.1.0.22:54321       33     ent-loop-0  INFO water.default: HDFS subsystem successfully initialized
12-03 17:04:57.061 10.1.0.22:54321       33     ent-loop-0  INFO water.default: S3 subsystem successfully initialized
12-03 17:04:57.115 10.1.0.22:54321       33     ent-loop-0  INFO water.default: GCS subsystem successfully initialized
12-03 17:04:57.115 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Flow dir: '/opt/spark/h2oflows'
12-03 17:04:57.131 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Cloud of size 1 formed [vad-1607015075951-exec-1/10.1.0.22:54321]
12-03 17:04:57.214 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Registered parsers: [GUESS, ARFF, XLS, SVMLight, AVRO, PARQUET, ORC, CSV]
12-03 17:04:57.215 10.1.0.22:54321       33     ent-loop-0  INFO water.default: HiveTableImporter extension initialized
12-03 17:04:57.215 10.1.0.22:54321       33     ent-loop-0  INFO water.default: StackTraceCollector extension initialized
12-03 17:04:57.216 10.1.0.22:54321       33     ent-loop-0  INFO water.default: HiveFrameSaver extension initialized
12-03 17:04:57.216 10.1.0.22:54321       33     ent-loop-0  INFO water.default: XGBoost extension initialized
12-03 17:04:57.216 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Registered 4 core extensions in: 1447ms
12-03 17:04:57.217 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Registered H2O core extensions: [HiveTableImporter, StackTraceCollector, HiveFrameSaver, XGBoost]
12-03 17:04:58.139 10.1.0.22:54321       33     ent-loop-0  INFO hex.tree.xgboost.XGBoostExtension: Found XGBoost backend with library: xgboost4j_minimal
12-03 17:04:58.140 10.1.0.22:54321       33     ent-loop-0  WARN hex.tree.xgboost.XGBoostExtension: Your system supports only minimal version of XGBoost (no GPUs, no multithreading)!
12-03 17:04:58.438 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Registered: 225 REST APIs in: 1221ms
12-03 17:04:58.439 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Registered REST API extensions: [XGBoost, Algos, Amazon S3, Sparkling Water REST API Extensions, AutoML, Core V3, TargetEncoder, Core V4]
12-03 17:04:58.751 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Registered: 301 schemas in 311ms
12-03 17:04:58.751 10.1.0.22:54321       33     ent-loop-0  INFO water.default: H2O started in 5736ms
12-03 17:04:58.751 10.1.0.22:54321       33     ent-loop-0  INFO water.default: 
12-03 17:04:58.813 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Open H2O Flow in your web browser: http://10.1.0.22:54321
12-03 17:04:58.814 10.1.0.22:54321       33     ent-loop-0  INFO water.default: 
12-03 17:04:58.834 10.1.0.22:54321       33     ent-loop-1  INFO water.default: Adding vad-1607015075951-exec-1/10.1.0.22:54321 to vad-1607015075951-exec-1/10.1.0.22:54321's flatfile
12-03 17:04:58.843 10.1.0.22:54321       33     ent-loop-1  INFO water.default: Full flatfile: vad-1607015075951-exec-1/10.1.0.22:54321
12-03 17:04:58.849 10.1.0.22:54321       33     ent-loop-0  INFO water.default: Locking cloud to new members, because Locking the cloud from Sparkling Water as we have reached the expected cluster size.
12-03 17:04:59.114 10.1.0.22:54321       33     8925448-40  INFO water.default: POST /3/CloudLock, parms: {reason=Locked from Sparkling Water.}
12-03 17:04:59.434 10.1.0.22:54321       33     8925448-36  INFO water.default: GET /3/verifyWebOpen, parms: {}
12-03 17:04:59.535 10.1.0.22:54321       33     8925448-39  INFO water.default: GET /3/verifyVersion, parms: {referenced_version=3.32.0.1}
12-03 17:04:59.645 10.1.0.22:54321       33     8925448-42  INFO water.default: GET /3/LogLevel, parms: {}
12-03 17:04:59.719 10.1.0.22:54321       33     8925448-38  INFO water.default: POST /99/Rapids, parms: {ast=(setTimeZone "UTC")}

However, the driver node gets stuck in an infinite loop, and I see the following in its logs:

2020-12-03T17:04:59.532+0000 level=INFO thread=main logger=ai.h2o.sparkling.backend.utils.RestApiUtils
H2O node http://10.1.0.22:54321/3/verifyWebOpen successfully responded for the GET.
2020-12-03T17:04:59.553+0000 level=INFO thread=main logger=ai.h2o.sparkling.backend.utils.RestApiUtils
H2O node http://10.1.0.22:54321/3/verifyVersion?referenced_version=3.32.0.1 successfully responded for the GET.
2020-12-03T17:04:59.636+0000 level=INFO thread=main logger=ai.h2o.sparkling.backend.utils.RestApiUtils
H2O node http://10.1.0.22:54321/3/Cloud successfully responded for the GET.
2020-12-03T17:04:59.713+0000 level=INFO thread=main logger=ai.h2o.sparkling.backend.utils.RestApiUtils
H2O node http://10.1.0.22:54321/3/LogLevel successfully responded for the GET.
2020-12-03T17:04:59.951+0000 level=INFO thread=main logger=ai.h2o.sparkling.backend.utils.RestApiUtils
H2O node http://10.1.0.22:54321/99/Rapids successfully responded for the POST.
2020-12-03T17:05:00.011+0000 level=WARN thread=main logger=ai.h2o.sparkling.backend.utils.ProxyStarter
Tried using port 54321 for Flow proxy, but port was already occupied!
2020-12-03T17:05:00.011+0000 level=WARN thread=main logger=ai.h2o.sparkling.backend.utils.ProxyStarter
Tried using port 54322 for Flow proxy, but port was already occupied!
2020-12-03T17:05:00.011+0000 level=WARN thread=main logger=ai.h2o.sparkling.backend.utils.ProxyStarter
Tried using port 54323 for Flow proxy, but port was already occupied!
2020-12-03T17:05:00.012+0000 level=WARN thread=main logger=ai.h2o.sparkling.backend.utils.ProxyStarter
Tried using port 54324 for Flow proxy, but port was already occupied!
...

It just keeps trying! These are the ports exposed by the driver container (54321 and 54322 among them):

Containers:
  exmac-container:
    Container ID:  docker://5889a4bba9576e4c4b6a10d4f5e570bf15ed9afdc7b4202c8423a1243d09177b
    Image:         myimage
    Image ID:      docker-pullable://amyimage@sha256:xyz
    Ports:         10011/TCP, 10013/TCP, 9091/TCP, 54321/TCP, 54322/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
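
One way to narrow this down might be to check whether those ports can actually be bound from inside the driver container. The sketch below is not part of Sparkling Water: `PortProbe` and `probe` are hypothetical names, and it assumes Scala code can be run in the driver pod (for example from `spark-shell`). It only uses `java.net.ServerSocket` to attempt a bind on each port and report the failure reason. On Linux, a message like "Address already in use" would point to a real port conflict, while "Cannot assign requested address" would suggest the bind address itself is the problem.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, ServerSocket}

// Hypothetical diagnostic helper, not a Sparkling Water API.
object PortProbe {
  // Tries to bind a server socket to host:port and releases it right away.
  // Returns None if the bind succeeded, or Some(reason) if it failed.
  def probe(host: String, port: Int): Option[String] = {
    val socket = new ServerSocket()
    try {
      socket.bind(new InetSocketAddress(host, port))
      None
    } catch {
      // BindException and other socket errors are all IOExceptions.
      case e: IOException => Some(s"${e.getClass.getSimpleName}: ${e.getMessage}")
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: pass the address the driver is expected to bind to;
    // defaults to the wildcard address.
    val host = args.headOption.getOrElse("0.0.0.0")
    // The ports reported in the ProxyStarter warnings above.
    (54321 to 54324).foreach { port =>
      probe(host, port) match {
        case None         => println(s"$host:$port is free")
        case Some(reason) => println(s"$host:$port cannot be bound ($reason)")
      }
    }
  }
}
```

Running it once with the wildcard address and once with the pod IP may show whether the failure depends on the port or on the address.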

What should I do in this case?
