We created a GCP Dataproc cluster with image 1.5-debian10, and cluster creation succeeded. The Spark version is 2.4.7, running on the Dataproc cluster with the image above.
Dataproc cluster software configuration (cluster properties):
```
softwareConfig:
  imageVersion: 1.5.23-debian10
  properties:
    capacity-scheduler:yarn.scheduler.capacity.maximum-am-resource-percent: '0.9'
    capacity-scheduler:yarn.scheduler.capacity.resource-calculator: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
    capacity-scheduler:yarn.scheduler.capacity.root.default.ordering-policy: fair
    core:fs.gs.block.size: '134217728'
    core:fs.gs.metadata.cache.enable: 'false'
    core:hadoop.ssl.enabled.protocols: TLSv1,TLSv1.1,TLSv1.2
    dataproc:dataproc.conscrypt.provider.enable: 'false'
    dataproc:job.history.to-gcs.enabled: 'true'
    mapred:mapreduce.jobhistory.done-dir: gs://<bucketName>/mapreduce/mapreduce-job-history/done
    mapred:mapreduce.jobhistory.intermediate-done-dir: gs://<bucketName>/mapreduce-job-history/intermediate-done
    spark:spark.eventLog.dir: gs://<bucketname>/spark-job-history/events
    spark:spark.executor.cores: '8'
    spark:spark.executor.instances: '2'
    spark:spark.executor.memory: 8379m
    spark:spark.executorEnv.OPENBLAS_NUM_THREADS: '1'
    spark:spark.history.fs.logDirectory: gs://<bucketname>/spark-job-history
    spark:spark.scheduler.mode: FAIR
    spark:spark.sql.cbo.enabled: 'true'
```
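For context, properties like these are normally supplied at cluster-creation time. A minimal sketch using the gcloud CLI, assuming the cluster name, region, and `<bucketname>` are placeholders (only a few of the properties above are shown):

```shell
# Hypothetical cluster creation; my-cluster, us-central1 and <bucketname>
# are placeholders, not values from the original post.
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --image-version=1.5.23-debian10 \
  --properties='spark:spark.eventLog.dir=gs://<bucketname>/spark-job-history/events,spark:spark.history.fs.logDirectory=gs://<bucketname>/spark-job-history,dataproc:dataproc.conscrypt.provider.enable=false'
```

The `--properties` flag uses the same `prefix:key=value` form (e.g. `spark:`, `dataproc:`) that appears in the software configuration dump above.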
As a workaround, setting `spark.eventLog.enabled` to `false` in the spark-submit command lets the job run normally. But then we will not be able to access the Spark history server later to analyze the logs.
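The workaround above can be sketched as a spark-submit invocation; the class and jar names are placeholders:

```shell
# com.example.MyJob and my-job.jar are placeholders.
# Disabling the event log skips the gs://<bucketname>/spark-job-history/events
# write that fails at SparkContext startup, at the cost of losing Spark
# history server data for this job.
spark-submit \
  --conf spark.eventLog.enabled=false \
  --class com.example.MyJob \
  my-job.jar
```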
Error:
```
java.io.IOException: Error accessing gs://<bucketname>/spark-job-history/events
at com.google.cloud.hadoop.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1959)
at com.google.cloud.hadoop.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1083)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1079)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:97)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:930)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
... 25 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:456)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:323)
at sun.security.validator.Validator.validate(Validator.java:271)
at sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:638)
... 45 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:451)
```