I am using Apache Spark and, after processing, I am trying to unload the data to AWS S3 with something like: `data.write().parquet("s3a://" + bucketName + "/" + location);`
The configuration seems fine:

```java
// Credentials and region are taken from environment variables
String region = System.getenv("AWS_REGION");
String accessKeyId = System.getenv("AWS_ACCESS_KEY_ID");
String secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY");

// Route s3a:// URIs to the S3A filesystem and hand it the credentials
spark.sparkContext().hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsRegion", region);
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsAccessKeyId", accessKeyId);
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsSecretAccessKey", secretAccessKey);
```
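For reference, the same Hadoop options can also be supplied while the `SparkSession` is being built, via Spark's `spark.hadoop.` prefix, which Spark copies into the Hadoop configuration. This is only a minimal sketch reusing the property names from the snippet above; the class name and app name are placeholders, not part of the original question.

```java
import org.apache.spark.sql.SparkSession;

public class S3ASessionSketch {
    public static void main(String[] args) {
        // Every "spark.hadoop.*" entry is copied into the Hadoop configuration,
        // so these are equivalent to the hadoopConfiguration().set(...) calls above.
        SparkSession spark = SparkSession.builder()
                .appName("s3a-write") // placeholder name
                .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
                .config("spark.hadoop.fs.s3a.awsRegion", System.getenv("AWS_REGION"))
                .config("spark.hadoop.fs.s3a.awsAccessKeyId", System.getenv("AWS_ACCESS_KEY_ID"))
                .config("spark.hadoop.fs.s3a.awsSecretAccessKey", System.getenv("AWS_SECRET_ACCESS_KEY"))
                .getOrCreate();

        spark.stop();
    }
}
```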
`%HADOOP_HOME%` points to exactly the same Hadoop version that Spark uses (v2.6.5) and has been added to the `PATH`:

```
C:\>hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  key                  manage keys via the KeyProvider
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
```
I am also using `Maven`.
1 Answer
Yes, I had missed a step: copy the binaries from https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.4/bin into `%HADOOP_HOME%\bin`. Even though the versions do not match exactly (v2.6.5 vs. v2.6.4), this still seems to work.
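As a related note: Hadoop on Windows locates `winutils.exe` under `%HADOOP_HOME%\bin`, and the `hadoop.home.dir` JVM system property can be used instead of (or in addition to) the environment variable. Below is a minimal sketch; the `C:\hadoop-2.6.5` path and the class/app names are placeholders and not part of the original answer.

```java
import java.io.File;

import org.apache.spark.sql.SparkSession;

public class WinutilsSetupSketch {
    public static void main(String[] args) {
        // Placeholder path: must contain bin\winutils.exe (e.g. copied from the
        // steveloughran/winutils repository linked above).
        String hadoopHome = "C:\\hadoop-2.6.5";

        // Hadoop's Shell class reads this property (falling back to HADOOP_HOME),
        // so it has to be set before any Spark/Hadoop filesystem code runs.
        System.setProperty("hadoop.home.dir", hadoopHome);

        // Quick sanity check that the native helper is actually in place.
        File winutils = new File(hadoopHome, "bin\\winutils.exe");
        System.out.println("winutils.exe present: " + winutils.isFile());

        SparkSession spark = SparkSession.builder()
                .appName("winutils-check") // placeholder name
                .master("local[*]")
                .getOrCreate();
        spark.stop();
    }
}
```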