I am using emr-6.0.0 with Kerberos, Spark 2.4.4 and Amazon Hadoop 3.2.1, with a single master node.
I am trying to submit a remote job to Spark with spark-submit, and no matter what I try I keep getting the following exception:
Exception in thread "main" org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1614359247442_0018 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: 3.92.137.244:8020, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN owner=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY, renewer=yarn, realUser=, issueDate=1614442935775, maxDate=1615047735775, sequenceNumber=28, masterKeyId=2)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:327)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:183)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1134)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
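The failure happens when YARN (renewer=yarn) tries to renew the HDFS delegation token issued for Service 3.92.137.244:8020. A way to check whether that renewal path works at all, independently of Spark, is to fetch and renew a token by hand on the master node. This is only a sketch using the standard hdfs fetchdt CLI; the token file path and the keytab/principal are taken from the commands below and are assumptions on my part:

# Run on the EMR master node as the hadoop principal (same keytab/principal as in my spark-submit commands).
kinit -kt /etc/hadoop.keytab hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY
# Fetch an HDFS delegation token with yarn as the renewer, then inspect and try to renew it.
hdfs fetchdt --renewer yarn /tmp/hdfs.token   # token file path is arbitrary
hdfs fetchdt --print /tmp/hdfs.token          # shows Kind, Service (IP vs hostname) and renewer
hdfs fetchdt --renew /tmp/hdfs.token          # should reproduce the renewal failure if the path is broken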
I can see the job reach my ResourceManager, but once it gets there it just sits in the NEW state for the next 2 minutes:
21/02/27 09:24:18 INFO YarnClientImpl: Application submission is not finished, submitted application application_1614359247442_0018 is still in NEW
and then it fails:
Exception in thread "main" org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1614359247442_0018 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: 3.92.137.244:8020, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN owner=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY, renewer=yarn, realUser=, issueDate=1614442935775, maxDate=1615047735775, sequenceNumber=28, masterKeyId=2)
Some of the spark-submit commands I have used are:
./bin/spark-submit -v --master yarn --deploy-mode cluster --executor-memory 512MB --total-executor-cores 10 --conf spark.hadoop.fs.hdfs.impl.disable.cache=true --conf spark.ego.uname=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY --conf spark.ego.keytab=/etc/hadoop.keytab --principal hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY --keytab /etc/hadoop.keytab $SPARK_HOME/examples/src/main/python/pi.py
I have also used this variant (with --conf spark.ego.keytab):
./bin/spark-submit -v --master yarn --deploy-mode cluster --executor-memory 512MB --total-executor-cores 10 --conf spark.hadoop.fs.hdfs.impl.disable.cache=true --conf spark.ego.uname=hadoop/ip-172-31-89-107.ec2.internal@MODELOP --conf spark.ego.keytab=/etc/hadoop.keytab --principal hadoop/ip-172-31-89-107.ec2.internal@MODELOP --keytab /etc/hadoop.keytab $SPARK_HOME/examples/src/main/python/pi.py
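For reference, the keytab/principal pair can be sanity-checked from the submitting machine with a plain Kerberos login before running spark-submit (a minimal sketch, using the same keytab and principal as in the first command above):

# On the remote client, obtain a TGT from the keytab and confirm it is cached and valid.
kinit -kt /etc/hadoop.keytab hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY
klist   # lists the cached ticket, its realm and expiry time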
Full execution log:
miguelr@Miguels-MacBook-Pro-2 spark-2.4.4_emr6_kerberos % ./bin/spark-submit -v --master yarn --deploy-mode cluster --executor-memory 512MB --total-executor-cores 10 --conf spark.hadoop.fs.hdfs.impl.disable.cache=true --conf spark.ego.uname=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY --conf spark.ego.keytab=/etc/hadoop.keytab --principal hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY --keytab /etc/hadoop.keytab $SPARK_HOME/examples/src/main/python/pi.py
Using properties file: /opt/sw/spark-2.4.4_emr6_kerberos/conf/spark-defaults.conf
21/02/27 09:21:57 WARN Utils: Your hostname, Miguels-MacBook-Pro-2.local resolves to a loopback address: 127.0.0.1; using 192.168.0.14 instead (on interface en0)
21/02/27 09:21:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Adding default property: spark.sql.warehouse.dir=hdfs:///user/spark/warehouse
Adding default property: spark.yarn.dist.files=/etc/spark/conf/hive-site.xml
Adding default property: spark.history.kerberos.keytab=/etc/spark.keytab
Adding default property: spark.sql.parquet.fs.optimized.committer.optimization-enabled=true
Adding default property: spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError
Adding default property: spark.history.fs.logDirectory=hdfs:///var/log/spark/apps
Adding default property: spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem=2
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
Adding default property: spark.yarn.historyServer.address=ip-172-31-22-222.ec2.internal:18080
Adding default property: spark.stage.attempt.ignoreOnDecommissionFetchFailure=true
Adding default property: spark.driver.memory=2048M
Adding default property: spark.files.fetchFailure.unRegisterOutputOnHost=true
Adding default property: spark.history.kerberos.principal=spark/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY
Adding default property: spark.resourceManager.cleanupExpiredHost=true
Adding default property: spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro,/usr/lib:/docker/usr/lib:ro,/usr/share:/docker/usr/share:ro,/mnt/s3:/mnt/s3:rw,/mnt1/s3:/mnt1/s3:rw
Adding default property: spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS=$(hostname -f)
Adding default property: spark.sql.emr.internal.extensions=com.amazonaws.emr.spark.EmrSparkSessionExtensions
Adding default property: spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError
Adding default property: spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds=50000
Adding default property: spark.master=yarn
Adding default property: spark.sql.parquet.output.committer.class=com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter
Adding default property: spark.blacklist.decommissioning.timeout=1h
Adding default property: spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
Adding default property: spark.sql.hive.metastore.sharedPrefixes=com.amazonaws.services.dynamodbv2
Adding default property: spark.executor.memory=4743M
Adding default property: spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
Adding default property: spark.eventLog.dir=hdfs:///var/log/spark/apps
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
Adding default property: spark.executor.cores=2
Adding default property: spark.history.ui.port=18080
Adding default property: spark.blacklist.decommissioning.enabled=true
Adding default property: spark.history.kerberos.enabled=true
Adding default property: spark.decommissioning.timeout.threshold=20
Adding default property: spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro,/usr/lib:/docker/usr/lib:ro,/usr/share:/docker/usr/share:ro,/mnt/s3:/mnt/s3:rw,/mnt1/s3:/mnt1/s3:rw
Adding default property: spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem=true
Adding default property: spark.hadoop.yarn.timeline-service.enabled=false
Adding default property: spark.yarn.executor.memoryOverheadFactor=0.1875
Parsed arguments:
master yarn
deployMode cluster
executorMemory 512MB
executorCores 2
totalExecutorCores 10
propertiesFile /opt/sw/spark-2.4.4_emr6_kerberos/conf/spark-defaults.conf
driverMemory 2048M
driverCores null
driverExtraClassPath /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar
driverExtraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
driverExtraJavaOptions -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass null
primaryResource file:/opt/sw/spark-2.4.4_emr6_kerberos/examples/src/main/python/pi.py
name pi.py
childArgs []
jars null
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file /opt/sw/spark-2.4.4_emr6_kerberos/conf/spark-defaults.conf:
(spark.sql.emr.internal.extensions,com.amazonaws.emr.spark.EmrSparkSessionExtensions)
(spark.history.kerberos.enabled,true)
(spark.blacklist.decommissioning.timeout,1h)
(spark.yarn.executor.memoryOverheadFactor,0.1875)
(spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
(spark.history.kerberos.principal,spark/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY)
(spark.blacklist.decommissioning.enabled,true)
(spark.hadoop.fs.hdfs.impl.disable.cache,true)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.driver.memory,2048M)
(spark.executor.memory,4743M)
(spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
(spark.sql.parquet.fs.optimized.committer.optimization-enabled,true)
(spark.sql.warehouse.dir,hdfs:///user/spark/warehouse)
(spark.yarn.historyServer.address,ip-172-31-22-222.ec2.internal:18080)
(spark.ego.uname,hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY)
(spark.eventLog.enabled,true)
(spark.yarn.dist.files,/etc/spark/conf/hive-site.xml)
(spark.files.fetchFailure.unRegisterOutputOnHost,true)
(spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS,/etc/passwd:/etc/passwd:ro,/usr/lib:/docker/usr/lib:ro,/usr/share:/docker/usr/share:ro,/mnt/s3:/mnt/s3:rw,/mnt1/s3:/mnt1/s3:rw)
(spark.history.ui.port,18080)
(spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
(spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds,50000)
(spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
(spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS,/etc/passwd:/etc/passwd:ro,/usr/lib:/docker/usr/lib:ro,/usr/share:/docker/usr/share:ro,/mnt/s3:/mnt/s3:rw,/mnt1/s3:/mnt1/s3:rw)
(spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError)
(spark.resourceManager.cleanupExpiredHost,true)
(spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
(spark.shuffle.service.enabled,true)
(spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError)
(spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem,2)
(spark.history.kerberos.keytab,/etc/spark.keytab)
(spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
(spark.eventLog.dir,hdfs:///var/log/spark/apps)
(spark.master,yarn)
(spark.dynamicAllocation.enabled,true)
(spark.sql.parquet.output.committer.class,com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter)
(spark.ego.keytab,/etc/hadoop.keytab)
(spark.executor.cores,2)
(spark.decommissioning.timeout.threshold,20)
(spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar)
(spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem,true)
21/02/27 09:21:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Main class:
org.apache.spark.deploy.yarn.YarnClusterApplication
Arguments:
--primary-py-file
file:/opt/sw/spark-2.4.4_emr6_kerberos/examples/src/main/python/pi.py
--class
org.apache.spark.deploy.PythonRunner
Spark config:
(spark.yarn.keytab,/etc/hadoop.keytab)
(spark.sql.warehouse.dir,hdfs:///user/spark/warehouse)
(spark.yarn.dist.files,file:/etc/spark/conf/hive-site.xml)
(spark.history.kerberos.keytab,/etc/spark.keytab)
(spark.sql.parquet.fs.optimized.committer.optimization-enabled,true)
(spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError)
(spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
(spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem,2)
(spark.eventLog.enabled,true)
(spark.shuffle.service.enabled,true)
(spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
(spark.hadoop.fs.hdfs.impl.disable.cache,true)
(spark.yarn.historyServer.address,ip-172-31-22-222.ec2.internal:18080)
(spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
(spark.app.name,pi.py)
(spark.yarn.principal,hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY)
(spark.driver.memory,2048M)
(spark.files.fetchFailure.unRegisterOutputOnHost,true)
(spark.history.kerberos.principal,spark/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY)
(spark.ego.keytab,/etc/hadoop.keytab)
(spark.resourceManager.cleanupExpiredHost,true)
(spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS,/etc/passwd:/etc/passwd:ro,/usr/lib:/docker/usr/lib:ro,/usr/share:/docker/usr/share:ro,/mnt/s3:/mnt/s3:rw,/mnt1/s3:/mnt1/s3:rw)
(spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
(spark.sql.emr.internal.extensions,com.amazonaws.emr.spark.EmrSparkSessionExtensions)
(spark.ego.uname,hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY)
(spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:+ExitOnOutOfMemoryError)
(spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds,50000)
(spark.submit.deployMode,cluster)
(spark.master,yarn)
(spark.sql.parquet.output.committer.class,com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter)
(spark.blacklist.decommissioning.timeout,1h)
(spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native)
(spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
(spark.executor.memory,512MB)
(spark.eventLog.dir,hdfs:///var/log/spark/apps)
(spark.dynamicAllocation.enabled,true)
(spark.executor.cores,2)
(spark.history.ui.port,18080)
(spark.yarn.isPython,true)
(spark.blacklist.decommissioning.enabled,true)
(spark.history.kerberos.enabled,true)
(spark.decommissioning.timeout.threshold,20)
(spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS,/etc/passwd:/etc/passwd:ro,/usr/lib:/docker/usr/lib:ro,/usr/share:/docker/usr/share:ro,/mnt/s3:/mnt/s3:rw,/mnt1/s3:/mnt1/s3:rw)
(spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem,true)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.yarn.executor.memoryOverheadFactor,0.1875)
Classpath elements:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/02/27 09:21:58 INFO Client: Kerberos credentials: principal = hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY, keytab = /etc/hadoop.keytab
21/02/27 09:21:58 INFO RMProxy: Connecting to ResourceManager at ip-172-31-22-222.ec2.internal/3.92.137.244:8032
21/02/27 09:21:59 INFO Client: Requesting a new application from cluster with 1 NodeManagers
21/02/27 09:21:59 INFO Configuration: resource-types.xml not found
21/02/27 09:21:59 INFO ResourceUtils: Unable to find 'resource-types.xml'.
21/02/27 09:21:59 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
21/02/27 09:21:59 INFO Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
21/02/27 09:21:59 INFO Client: Setting up container launch context for our AM
21/02/27 09:21:59 INFO Client: Setting up the launch environment for our AM container
21/02/27 09:21:59 INFO Client: Preparing resources for our AM container
21/02/27 09:22:00 INFO Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
21/02/27 09:22:00 INFO Client: Uploading resource file:/etc/hadoop.keytab -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/hadoop.keytab
21/02/27 09:22:01 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
21/02/27 09:22:03 INFO Client: Uploading resource file:/private/var/folders/38/ml5dcrkd6tdbfm8tk2kqb0880000gn/T/spark-7fdb5051-41b3-4d5d-b168-a90b09682f58/__spark_libs__7065880233350683192.zip -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/__spark_libs__7065880233350683192.zip
21/02/27 09:22:10 INFO Client: Uploading resource file:/etc/spark/conf/hive-site.xml -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/hive-site.xml
21/02/27 09:22:11 INFO Client: Uploading resource file:/opt/sw/spark-2.4.4_emr6_kerberos/examples/src/main/python/pi.py -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/pi.py
21/02/27 09:22:12 INFO Client: Uploading resource file:/opt/sw/spark-2.4.4_emr6_kerberos/python/lib/pyspark.zip -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/pyspark.zip
21/02/27 09:22:13 INFO Client: Uploading resource file:/opt/sw/spark-2.4.4_emr6_kerberos/python/lib/py4j-0.10.7-src.zip -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/py4j-0.10.7-src.zip
21/02/27 09:22:14 INFO Client: Uploading resource file:/private/var/folders/38/ml5dcrkd6tdbfm8tk2kqb0880000gn/T/spark-7fdb5051-41b3-4d5d-b168-a90b09682f58/__spark_conf__3455622681190730836.zip -> hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018/__spark_conf__.zip
21/02/27 09:22:15 INFO SecurityManager: Changing view acls to: miguelr,hadoop
21/02/27 09:22:15 INFO SecurityManager: Changing modify acls to: miguelr,hadoop
21/02/27 09:22:15 INFO SecurityManager: Changing view acls groups to:
21/02/27 09:22:15 INFO SecurityManager: Changing modify acls groups to:
21/02/27 09:22:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(miguelr, hadoop); groups with view permissions: Set(); users with modify permissions: Set(miguelr, hadoop); groups with modify permissions: Set()
21/02/27 09:22:15 INFO HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-160461870_1, ugi=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY (auth:KERBEROS)]]
21/02/27 09:22:15 INFO DFSClient: Created token for hadoop: HDFS_DELEGATION_TOKEN owner=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY, renewer=yarn, realUser=, issueDate=1614442935775, maxDate=1615047735775, sequenceNumber=28, masterKeyId=2 on 3.92.137.244:8020
21/02/27 09:22:16 INFO KMSClientProvider: New token created: (Kind: kms-dt, Service: kms://http@ip-172-31-22-222.ec2.internal:9600/kms, Ident: (kms-dt owner=hadoop, renewer=yarn, realUser=, issueDate=1614442936371, maxDate=1615047736371, sequenceNumber=28, masterKeyId=2))
21/02/27 09:22:16 INFO HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-160461870_1, ugi=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY (auth:KERBEROS)]]
21/02/27 09:22:16 INFO DFSClient: Created token for hadoop: HDFS_DELEGATION_TOKEN owner=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY, renewer=hadoop, realUser=, issueDate=1614442936617, maxDate=1615047736617, sequenceNumber=29, masterKeyId=2 on 3.92.137.244:8020
21/02/27 09:22:16 INFO KMSClientProvider: New token created: (Kind: kms-dt, Service: kms://http@ip-172-31-22-222.ec2.internal:9600/kms, Ident: (kms-dt owner=hadoop, renewer=hadoop, realUser=, issueDate=1614442937045, maxDate=1615047737045, sequenceNumber=29, masterKeyId=2))
21/02/27 09:22:16 INFO HadoopFSDelegationTokenProvider: Renewal interval is 86400495 for token HDFS_DELEGATION_TOKEN
21/02/27 09:22:17 INFO HadoopFSDelegationTokenProvider: Renewal interval is 86400511 for token kms-dt
21/02/27 09:22:17 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
21/02/27 09:22:18 INFO Client: Submitting application application_1614359247442_0018 to ResourceManager
21/02/27 09:22:21 INFO YarnClientImpl: Application submission is not finished, submitted application application_1614359247442_0018 is still in NEW
21/02/27 09:24:18 INFO YarnClientImpl: Application submission is not finished, submitted application application_1614359247442_0018 is still in NEW
21/02/27 09:24:19 INFO Client: Deleted staging directory hdfs://ip-172-31-22-222.ec2.internal:8020/user/hadoop/.sparkStaging/application_1614359247442_0018
Exception in thread "main" org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1614359247442_0018 to YARN : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: 3.92.137.244:8020, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN owner=hadoop/ip-172-31-22-222.ec2.internal@OPENDATACOMPANY, renewer=yarn, realUser=, issueDate=1614442935775, maxDate=1615047735775, sequenceNumber=28, masterKeyId=2)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:327)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:183)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1134)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/02/27 09:24:19 INFO ShutdownHookManager: Shutdown hook called
21/02/27 09:24:19 INFO ShutdownHookManager: Deleting directory /private/var/folders/38/ml5dcrkd6tdbfm8tk2kqb0880000gn/T/spark-7fdb5051-41b3-4d5d-b168-a90b09682f58
21/02/27 09:24:19 INFO ShutdownHookManager: Deleting directory /private/var/folders/38/ml5dcrkd6tdbfm8tk2kqb0880000gn/T/spark-a45d84f6-f1c4-4b06-897a-70cb77aab91e