Amazon EMR: running a custom JAR with input and output from S3

5f0d552i, posted 2021-05-29 in Hadoop

I'm trying to run an EMR cluster with a custom JAR step. The program takes its input from S3 and writes its output back to S3 (or at least that's what I'm trying to accomplish). In the step configuration, the Arguments field contains the following:

    v3.MaxTemperatureDriver
    s3n://hadoopbook/ncdc/all
    s3n://hadoop-szhu/max-temp
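
For reference, adding the same step from the AWS CLI should look roughly like the sketch below; the cluster ID and the JAR's S3 location are placeholders, and the arguments are the ones listed above:

    # cluster ID and Jar location below are placeholders
    aws emr add-steps \
      --cluster-id j-XXXXXXXXXXXXX \
      --steps Type=CUSTOM_JAR,Name=MaxTemperature,ActionOnFailure=CONTINUE,Jar=s3://hadoop-szhu/max-temp.jar,Args=[v3.MaxTemperatureDriver,s3n://hadoopbook/ncdc/all,s3n://hadoop-szhu/max-temp]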

In those arguments, hadoopbook/ncdc/all is the path to the bucket containing the input data (incidentally, the example I'm running comes from the book), and hadoop-szhu is my own bucket where I want to store the output. Following this post, my MapReduce driver looks like this:

    package v3;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    import v1.MaxTemperatureReducer;

    public class MaxTemperatureDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n",
                        getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            Job job = new Job(getConf(), "Max temperature");
            job.setJarByClass(getClass());

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
            System.exit(exitCode);
        }
    }
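
For reference, when EMR executes this custom JAR step it should be roughly equivalent to running the driver on the cluster like this (the JAR file name below is a placeholder for the uploaded JAR):

    # max-temp.jar is a placeholder name for the uploaded JAR
    hadoop jar max-temp.jar v3.MaxTemperatureDriver s3n://hadoopbook/ncdc/all s3n://hadoop-szhu/max-temp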

However, when I try to run this, I get the following error:

    Exception in thread "main" java.io.IOException: No FileSystem for scheme: s3n

I also tried copying the data from S3 onto the cluster with the following (run after SSHing into the master node):

    hadoop distcp \
      -Dfs.s3n.awsAccessKeyId='...' \
      -Dfs.s3n.awsSecretAccessKey='...' \
      s3n://hadoopbook/ncdc/all input/ncdc/all

But I got a lot of errors; an excerpt is below:

    2016-09-03 07:07:11,858 FATAL [IPC Server handler 6 on 43495] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1472884232220_0001_m_000000_0 - exited : java.io.IOException: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.FileNotFoundException: No such file or directory 's3n://hadoopbook/ncdc/all/1901.gz'
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:224)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.FileNotFoundException: No such file or directory 's3n://hadoopbook/ncdc/all/1901.gz'
        ... 10 more
    Caused by: java.io.FileNotFoundException: No such file or directory 's3n://hadoopbook/ncdc/all/1901.gz'
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:818)
        at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:511)
        at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:219)
        ... 9 more

I'm not sure where the problem is, but I'm happy to provide more details (please comment below). Thanks!

9fkzdhlc (answer 1)

s3n:// is the legacy scheme; on EMR you should use s3:// instead. Reference: http://docs.aws.amazon.com//elasticmapreduce/latest/managementguide/emr-plan-file-systems.html
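
Concretely, that means switching the step arguments (and any distcp source paths) from the s3n:// scheme to s3://, i.e.:

    v3.MaxTemperatureDriver
    s3://hadoopbook/ncdc/all
    s3://hadoop-szhu/max-temp

If you still want to pre-copy the data onto the cluster, a sketch using EMR's s3-dist-cp utility (available on the master node) is below; it goes through EMRFS with the cluster's instance-profile credentials, so the access-key properties should not be needed. The HDFS destination path here is only an example:

    # source path taken from the question; the HDFS destination is an example
    s3-dist-cp --src s3://hadoopbook/ncdc/all --dest hdfs:///input/ncdc/all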
