NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException

nhaq1z21 · posted 2021-05-18 in Spark

I am running JupyterHub on EKS and want to use the EKS IRSA feature to run Spark workloads on Kubernetes. I have prior experience with kube2iam, but I now plan to move to IRSA.
The error is not caused by IRSA itself: the service account attaches fine to both the driver and executor pods, and I can access S3 from both via the CLI and the SDK. The problem is with accessing S3 through Spark on Spark 3.0 / Hadoop 3.2:
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext: java.lang.NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException
I am using the following versions:
APACHE_SPARK_VERSION=3.0.1
HADOOP_VERSION=3.2
aws-java-sdk-1.11.890
hadoop-aws-3.2.0
Python 3.7.3
I also tested a different SDK version:
aws-java-sdk-1.11.563.jar
If anyone has run into this, please help with a solution.
PS: This is not an IAM policy error either; the IAM policy is fine.
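Since a jar file is just a zip archive, one quick way to check whether the SDK jar on the Spark classpath actually ships the missing class is to list its entries. A minimal sketch (the jar path in the commented call is a placeholder; point it at the jar in your image):

```python
import zipfile


def jar_contains(jar_path: str, class_entry: str) -> bool:
    """Return True if the given .class entry exists inside the jar (a zip)."""
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry in jar.namelist()


# Hypothetical path: point this at the SDK jar on your Spark classpath.
# jar_contains("/opt/spark/jars/aws-java-sdk-bundle-1.11.874.jar",
#              "com/amazonaws/services/s3/model/MultiObjectDeleteException.class")
```

If this returns False for your SDK jar, the NoClassDefFoundError is a classpath problem rather than an IRSA or IAM one.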


ttcibm8c1#

Finally solved everything with the following jars:
hadoop-aws-3.2.0.jar
aws-java-sdk-bundle-1.11.874.jar (https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-bundle/1.11.874)
For anyone trying to run Spark on EKS with IRSA, this is the working Spark configuration:
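If you would rather not bake these jars into the container image, Spark can also resolve them from Maven Central at startup. A sketch of the equivalent setting (assuming the driver has network access to Maven Central), usable in spark-defaults.conf, via --conf, or via .config() in the builder:

```
# spark-defaults.conf fragment: same coordinates as the jars above
spark.jars.packages  org.apache.hadoop:hadoop-aws:3.2.0,com.amazonaws:aws-java-sdk-bundle:1.11.874
```

Whichever route you take, hadoop-aws and the aws-java-sdk-bundle versions must be a matched pair; mixing hadoop-aws with a standalone aws-java-sdk jar is what typically produces the NoClassDefFoundError above.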

    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName("pyspark-data-analysis-1") \
        .config("spark.kubernetes.driver.master", "k8s://https://xxxxxx.gr7.ap-southeast-1.eks.amazonaws.com:443") \
        .config("spark.kubernetes.namespace", "jupyter") \
        .config("spark.kubernetes.container.image", "xxxxxx.dkr.ecr.ap-southeast-1.amazonaws.com/spark-ubuntu-3.0.1") \
        .config("spark.kubernetes.container.image.pullPolicy", "Always") \
        .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark") \
        .config("spark.kubernetes.authenticate.executor.serviceAccountName", "spark") \
        .config("spark.kubernetes.executor.annotation.eks.amazonaws.com/role-arn", "arn:aws:iam::xxxxxx:role/spark-irsa") \
        .config("spark.hadoop.fs.s3a.aws.credentials.provider", "com.amazonaws.auth.WebIdentityTokenCredentialsProvider") \
        .config("spark.kubernetes.authenticate.submission.caCertFile", "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt") \
        .config("spark.kubernetes.authenticate.submission.oauthTokenFile", "/var/run/secrets/kubernetes.io/serviceaccount/token") \
        .config("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false") \
        .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
        .config("spark.hadoop.fs.s3a.fast.upload", "true") \
        .config("spark.executor.instances", "1") \
        .config("spark.executor.cores", "3") \
        .config("spark.executor.memory", "10g") \
        .getOrCreate()

jogvjijk2#

Could you have a look at this blog (https://medium.com/swlh/how-to-perform-a-spark-submit-to-amazon-eks-cluster-with-irsa-50af9b26cae)? It uses:
Spark 2.4.4
Hadoop 2.7.3
AWS SDK 1.11.834
Example spark-submit:

    /opt/spark/bin/spark-submit \
      --master=k8s://https://4A5<i_am_tu>545E6.sk1.ap-southeast-1.eks.amazonaws.com \
      --deploy-mode cluster \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
      --conf spark.kubernetes.container.image=vitamingaugau/spark:spark-2.4.4-irsa \
      --conf spark.kubernetes.namespace=spark-pi \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-pi \
      --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-pi \
      --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider \
      --conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
      --conf spark.kubernetes.authenticate.submission.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
      local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.4.jar 20000
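Both answers assume a Kubernetes service account annotated with the IRSA role ARN, which is what lets WebIdentityTokenCredentialsProvider pick up credentials in the pods. A minimal sketch of such a manifest (the name and namespace match the first answer; the account ID is a placeholder):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: jupyter
  annotations:
    # Placeholder account ID; the role name follows the question's "spark-irsa".
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/spark-irsa
```

The EKS pod identity webhook then injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into pods using this service account.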
