Missing resourceProfileId

fsi0uk1n · posted 2021-07-12 in Spark
Follow (0) | Answers (2) | Views (428)

I'm trying to upgrade from Spark 3.0.1 to 3.1.1.
I'm running PySpark 3.1.1 in client mode from a Jupyter notebook.
The following worked on 3.0.1 but fails after upgrading Spark:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

def get_spark_session(app_name: str, conf: SparkConf):
    conf.setMaster("k8s://https://kubernetes.default.svc.cluster.local")
    conf \
      .set("spark.kubernetes.namespace", "spark-ml") \
      .set("spark.kubernetes.container.image", "itayb/spark:3.1.1-hadoop-3.2.0-python-3.8.6-aws") \
      .set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark-executor") \
      .set("spark.executor.instances", "2") \
      .set("spark.executor.memory", "2g") \
      .set("spark.executor.cores", "2") \
      .set("spark.driver.memory", "1G") \
      .set("spark.driver.port", "2222") \
      .set("spark.driver.blockManager.port", "7777") \
      .set("spark.driver.host", "jupyter.recs.svc.cluster.local") \
      .set("spark.driver.bindAddress", "0.0.0.0") \
      .set("spark.ui.port", "4040") \
      .set("spark.network.timeout", "240") \
      .set("spark.hadoop.fs.s3a.endpoint", "localstack.kube-system.svc.cluster.local:4566") \
      .set("spark.hadoop.fs.s3a.connection.ssl.enabled", "false") \
      .set("spark.hadoop.fs.s3a.path.style.access", "true") \
      .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
      .set("spark.hadoop.com.amazonaws.services.s3.enableV4", "true") \
      .set("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")
    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()

spark = get_spark_session("aws_localstack", SparkConf())
try:
    df = spark.read.csv('s3a://my-bucket/stocks.csv',header=True)
    df.printSchema()
    print(df.count())
except Exception as exp:
    print(exp)

spark.stop()

Executor logs:

+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=(${JAVA_HOME}/bin/java "${SPARK_EXECUTOR_JAVA_OPTS[@]}" -Xms$SPARK_EXECUTOR_MEMORY -Xmx$SPARK_EXECUTOR_MEMORY -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH" org.apac
+ exec /usr/bin/tini -s -- /usr/local/openjdk-11/bin/java -Dspark.network.timeout=240 -Dspark.driver.port=2222 -Dspark.ui.port=4040 -Dspark.driver.blockManager.port=7777
Unrecognized options: --resourceProfileId
Usage: org.apache.spark.executor.CoarseGrainedExecutorBackend [options]
 Options are:
   --driver-url <driverUrl>
   --executor-id <executorId>
   --bind-address <bindAddress>
   --hostname <hostname>
   --cores <cores>
   --resourcesFile <fileWithJSONResourceInformation>
   --app-id <appid>
   --worker-url <workerUrl>
   --user-class-path <url>
   --resourceProfileId <id>
stream closed

It looks like --resourceProfileId doesn't get a value, but I don't know why.

ejk8hzay1#

It was my mistake :(
I was using different Spark versions between the driver (3.1.1) and the executors (3.0.1).
The error happened because I tried to build the executor image from the official repo by simply checking out the v3.1.1 tag and using

./bin/docker-image-tool.sh

I guess I was supposed to compile something first, because the version it produced was older. I found that out by exec-ing into a worker container and running:

spark-submit --version
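
The driver-side version can also be checked straight from the notebook; a minimal sketch, assuming an active session named spark as in the snippet above:

import pyspark

print(pyspark.__version__)   # PySpark package installed in the Jupyter kernel (driver side)
print(spark.version)         # Spark version the driver session reports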

The quick fix was to download the correct precompiled release (instead of using the source code), extract it, and build the image from there:

wget https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
tar xvzf spark-3.1.1-bin-hadoop3.2.tgz
cd spark-3.1.1-bin-hadoop3.2
./bin/docker-image-tool.sh -u root -r itayb -t 3.1.1-hadoop-3.2.0 -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
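
After rebuilding and pushing the image, a quick smoke test from the notebook confirms that the driver and the executors finally agree on the version. This is only a sketch: it reuses get_spark_session from the question and assumes spark.kubernetes.container.image in that function has been updated to point at the newly built tag.

from pyspark import SparkConf

spark = get_spark_session("version_smoke_test", SparkConf())
print("driver:", spark.version)  # expect 3.1.1

# Force a tiny job so executors actually launch; with mismatched images this is
# exactly where they used to die with "Unrecognized options: --resourceProfileId".
def report_version(_):
    import pyspark
    yield pyspark.__version__

print("executors:", sorted(set(
    spark.sparkContext.parallelize(range(2), 2).mapPartitions(report_version).collect()
)))

spark.stop()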

vfwfrxfs2#

See the discussion on the Spark PR page: https://issues.apache.org/jira/browse/spark-33288?focusedcommentid=17296060&page=com.atlassian.jira.plugin.system.issuetabpanels%3acomment-tabpanel#comment-17296060
