NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException

nhaq1z21 · asked 2021-05-18 · in Spark
Follow (0) | Answers (2) | Views (800)

I am running JupyterHub on EKS and want to use the EKS IRSA feature to run Spark workloads on Kubernetes. I have prior experience with kube2iam, but I am now planning to move to IRSA.
The error is not caused by IRSA itself: the service account attaches fine to both the driver and executor pods, and I can access S3 from both via the CLI and the SDK. The problem is specific to accessing S3 from Spark on Spark 3.0 / Hadoop 3.2:
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext: java.lang.NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException
I am using the following versions:
APACHE_SPARK_VERSION=3.0.1
HADOOP_VERSION=3.2
aws-java-sdk-1.11.890
hadoop-aws-3.2.0
Python 3.7.3
I have also tested a different version:
aws-java-sdk-1.11.563.jar
If anyone has run into this problem, please help with a solution.
PS: This is not an IAM policy error either; the IAM policy is perfectly fine.
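(A quick way to narrow an error like this down, before swapping jars, is to check whether the SDK jar actually on the classpath bundles the missing class. The slim aws-java-sdk artifact does not package the S3 model classes the way aws-java-sdk-bundle does, and hadoop-aws expects the bundle. Since a jar is just a zip archive, a short helper can confirm; the jar path in the comment is a placeholder, not from this post.)

```python
# NoClassDefFoundError usually means the class is absent from the jar that is
# actually on the classpath. A jar is a zip archive, so we can list its entries.
import zipfile


def jar_contains(jar_path, class_name):
    """Return True if the jar at jar_path contains the given class file."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example usage (path is a placeholder -- point it at the jar in your image):
# jar_contains("/opt/spark/jars/aws-java-sdk-bundle-1.11.874.jar",
#              "com.amazonaws.services.s3.model.MultiObjectDeleteException")
```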

ttcibm8c · #1

Finally solved everything with the following jars:
hadoop-aws-3.2.0.jar
aws-java-sdk-bundle-1.11.874.jar (https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-bundle/1.11.874)
For anyone trying to run Spark on EKS with IRSA, this is the working Spark configuration:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .appName("pyspark-data-analysis-1") \
        .config("spark.kubernetes.driver.master","k8s://https://xxxxxx.gr7.ap-southeast-1.eks.amazonaws.com:443") \
        .config("spark.kubernetes.namespace", "jupyter") \
        .config("spark.kubernetes.container.image", "xxxxxx.dkr.ecr.ap-southeast-1.amazonaws.com/spark-ubuntu-3.0.1") \
        .config("spark.kubernetes.container.image.pullPolicy" ,"Always") \
        .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark") \
        .config("spark.kubernetes.authenticate.executor.serviceAccountName", "spark") \
        .config("spark.kubernetes.executor.annotation.eks.amazonaws.com/role-arn","arn:aws:iam::xxxxxx:role/spark-irsa") \
        .config("spark.hadoop.fs.s3a.aws.credentials.provider", "com.amazonaws.auth.WebIdentityTokenCredentialsProvider") \
        .config("spark.kubernetes.authenticate.submission.caCertFile", "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt") \
        .config("spark.kubernetes.authenticate.submission.oauthTokenFile", "/var/run/secrets/kubernetes.io/serviceaccount/token") \
        .config("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false") \
        .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
        .config("spark.hadoop.fs.s3a.fast.upload","true") \
        .config("spark.executor.instances", "1") \
        .config("spark.executor.cores", "3") \
        .config("spark.executor.memory", "10g") \
        .getOrCreate()
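(With that session in place, a small write/read round trip is a simple way to confirm the s3a:// connector and the IRSA credentials are working together. The bucket name below is a placeholder, not from the original post; this needs the live EKS cluster and the session created above, so it is a sketch rather than something runnable standalone.)

```python
# Smoke test for the S3A setup (bucket name is a placeholder -- substitute
# one the IRSA role can access). Uses the `spark` session built above.
df = spark.createDataFrame([(1, "ok"), (2, "also ok")], ["id", "status"])
df.write.mode("overwrite").parquet("s3a://your-bucket/spark-irsa-smoke-test")
readback = spark.read.parquet("s3a://your-bucket/spark-irsa-smoke-test")
readback.show()
```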
jogvjijk · #2

Could you take a look at this blog post (https://medium.com/swlh/how-to-perform-a-spark-submit-to-amazon-eks-cluster-with-irsa-50af9b26cae)? It uses:
Spark 2.4.4
Hadoop 2.7.3
AWS SDK 1.11.834
Example spark-submit:

/opt/spark/bin/spark-submit \
    --master=k8s://https://4A5<i_am_tu>545E6.sk1.ap-southeast-1.eks.amazonaws.com \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
    --conf spark.kubernetes.container.image=vitamingaugau/spark:spark-2.4.4-irsa \
    --conf spark.kubernetes.namespace=spark-pi \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-pi \
    --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-pi \
    --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider \
    --conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    --conf spark.kubernetes.authenticate.submission.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
    local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.4.jar 20000
