Spark can't read from / write to S3 (ResponseCode=400, ResponseMessage=Bad Request)

vddsk6oq · posted 2021-05-29 · in Hadoop

I have implemented a Spark application. I create the Spark context like this:

    private JavaSparkContext createJavaSparkContext() {
        SparkConf conf = new SparkConf();
        conf.setAppName("test");
        if (conf.get("spark.master", null) == null) {
            conf.setMaster("local[4]");
        }
        conf.set("fs.s3a.awsAccessKeyId", getCredentialConfig().getS3Key());
        conf.set("fs.s3a.awsSecretAccessKey", getCredentialConfig().getS3Secret());
        conf.set("fs.s3a.endpoint", getCredentialConfig().getS3Endpoint());
        return new JavaSparkContext(conf);
    }
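As an aside, `fs.s3a.*` options are read from the Hadoop configuration, not the Spark configuration, so a common variant is to prefix them with `spark.hadoop.` (which Spark copies into the Hadoop configuration). This is only a sketch: it uses the standard s3a property names `fs.s3a.access.key` / `fs.s3a.secret.key` rather than the s3n-style names in the question, and the credential values are placeholders.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class S3aContextSketch {
    // Sketch: "spark.hadoop."-prefixed keys are propagated into the Hadoop
    // configuration that the s3a filesystem actually reads.
    static JavaSparkContext create(String key, String secret, String endpoint) {
        SparkConf conf = new SparkConf()
                .setAppName("test")
                .setMaster("local[4]")
                .set("spark.hadoop.fs.s3a.access.key", key)     // s3a property names,
                .set("spark.hadoop.fs.s3a.secret.key", secret)  // not the s3n-style ones
                .set("spark.hadoop.fs.s3a.endpoint", endpoint); // e.g. s3.eu-central-1.amazonaws.com
        return new JavaSparkContext(conf);
    }
}
```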

I try to read data from S3 via the Spark Dataset API (Spark SQL):

    String s = "s3a://" + getCredentialConfig().getS3Bucket();
    Dataset<Row> csv = getSparkSession()
            .read()
            .option("header", "true")
            .csv(s + "/dataset.csv");
    System.out.println("Read size :" + csv.count());

This fails with:

    Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 1A3E8CBD4959289D, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: Q1Fv8sNvcSOWGbhJSu2d3Nfgow00388IpXiiHNKHz8vI/zysC8V8/YyQ1ILVsM2gWQIyTy1miJc=

Hadoop version: 2.7
AWS endpoint: s3.eu-central-1.amazonaws.com
(On Hadoop 2.8 everything works fine.)


ruoxqz4g (answer 1)

The problem is that Frankfurt does not support s3n; you have to use s3a. That region supports only version 4 of the AWS signature protocol: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
"EU (Frankfurt) eu-central-1 — Version 4 only"
This means V4 signing has to be enabled on the AWS client by setting the system property
com.amazonaws.services.s3.enableV4 -> true

    conf.set("com.amazonaws.services.s3.enableV4", "true"); // doesn't work for me

On my local machine I use:

    System.setProperty("com.amazonaws.services.s3.enableV4", "true");

To run on AWS EMR, the parameters have to be added to spark-submit:

    spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
    spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
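For example, the two options can be passed as `--conf` flags on the spark-submit command line. This is only a sketch; the main class and jar name are placeholders:

```shell
# Sketch: enable V4 signing on both the driver and the executor JVMs.
# com.example.MyApp and my-app.jar are placeholders.
spark-submit \
  --class com.example.MyApp \
  --conf spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
  --conf spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true \
  my-app.jar
```

Both flags matter: the driver flag covers code run on the master, while the executor flag covers the tasks that actually open the S3 connections.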
