Apache Spark error when offloading data to AWS S3 using Hadoop

wtzytmuj posted on 2021-05-29 in Hadoop

I am using Apache Spark and, after processing, trying to offload the data to AWS S3 with something like `data.write().parquet("s3a://" + bucketName + "/" + location);`. The configuration seems fine:

```
// Read the region and credentials from the environment.
String region = System.getenv("AWS_REGION");
String accessKeyId = System.getenv("AWS_ACCESS_KEY_ID");
String secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY");

// Set the S3A options on the Hadoop configuration used by the Spark session.
spark.sparkContext().hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsRegion", region);
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsAccessKeyId", accessKeyId);
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsSecretAccessKey", secretAccessKey);
```

`%HADOOP_HOME%` points to exactly the same Hadoop version that Spark uses (v2.6.5), and it has been added to the path:

```
C:\>hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar                  run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp               copy file or directories recursively
  archive -archiveName NAME -p *  create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  key                  manage keys via the KeyProvider
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
```
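As an aside on the configuration shown earlier: the Hadoop S3A documentation names the credential properties `fs.s3a.access.key` and `fs.s3a.secret.key`. The sketch below only builds those name/value pairs so they can be compared with the keys used in the question; the class name `S3aSettings` and the method `fromCredentials` are hypothetical, and nothing here is meant as a statement about what caused the error in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the question): collects S3A settings as plain strings.
class S3aSettings {

    // Builds property-name -> value pairs using the credential keys
    // listed in the Hadoop S3A documentation.
    static Map<String, String> fromCredentials(String accessKeyId, String secretAccessKey) {
        if (accessKeyId == null || secretAccessKey == null) {
            throw new IllegalArgumentException("AWS credentials are not set");
        }
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        settings.put("fs.s3a.access.key", accessKeyId);
        settings.put("fs.s3a.secret.key", secretAccessKey);
        return settings;
    }

    public static void main(String[] args) {
        // Same environment variables as in the question.
        Map<String, String> settings = fromCredentials(
                System.getenv("AWS_ACCESS_KEY_ID"),
                System.getenv("AWS_SECRET_ACCESS_KEY"));
        // Print the property names only, never the secret values.
        settings.keySet().forEach(System.out::println);
    }
}
```

With a `SparkSession` named `spark`, as in the question, each pair would be applied with `spark.sparkContext().hadoopConfiguration().set(name, value)`.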

I also have this in `Maven`:

f2uvfpb9 1#

Yes, I had missed a step. Copy the files from https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.4/bin into `%HADOOP_HOME%\bin`. This seems to work even though the versions do not match exactly (v2.6.5 vs. v2.6.4).
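A quick way to confirm that this step has been done is to check for `winutils.exe` before starting the Spark job. The following is a minimal sketch, assuming the `HADOOP_HOME` environment variable is the one Hadoop reads on this machine; the class name `WinutilsCheck` and its methods are hypothetical and not part of Spark or Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check (not part of Spark or Hadoop).
class WinutilsCheck {

    // Expected location of winutils.exe under a given Hadoop home directory.
    static Path winutilsPath(String hadoopHome) {
        return Paths.get(hadoopHome, "bin", "winutils.exe");
    }

    // True only if hadoopHome is set and bin/winutils.exe exists as a regular file.
    static boolean hasWinutils(String hadoopHome) {
        return hadoopHome != null
                && !hadoopHome.isEmpty()
                && Files.isRegularFile(winutilsPath(hadoopHome));
    }

    public static void main(String[] args) {
        String hadoopHome = System.getenv("HADOOP_HOME");
        if (hasWinutils(hadoopHome)) {
            System.out.println("Found " + winutilsPath(hadoopHome));
        } else {
            System.err.println("winutils.exe not found under HADOOP_HOME\\bin (HADOOP_HOME="
                    + hadoopHome + ")");
        }
    }
}
```

The check only verifies that the file is present; it does not verify that the binary matches the Hadoop version in use.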
