MapReduce job: strange output?

sauutmhj · posted 2021-05-30 in Hadoop

I am writing my first MapReduce job. A simple one: counting the alphanumeric characters in a file. I have built the jar file and run it, but apart from the debug output I cannot find the output of the MR job anywhere. Could you help me?
My application class:

    import CharacterCountMapper;
    import CharacterCountReducer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class CharacterCountDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // Create a JobConf using the processed configuration processed by ToolRunner
            Job job = Job.getInstance(getConf());

            // Process custom command-line options
            Path in = new Path("/tmp/filein");
            Path out = new Path("/tmp/fileout");

            // Specify various job-specific parameters
            job.setJobName("Character-Count");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapperClass(CharacterCountMapper.class);
            job.setReducerClass(CharacterCountReducer.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.setInputPaths(job, in);
            FileOutputFormat.setOutputPath(job, out);
            job.setJarByClass(CharacterCountDriver.class);

            job.submit();
            return 0;
        }

        public static void main(String[] args) throws Exception {
            // Let ToolRunner handle generic command-line options
            int res = ToolRunner.run(new Configuration(), new CharacterCountDriver(), args);
            System.exit(res);
        }
    }

Then my mapper class:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CharacterCountMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String strValue = value.toString();
            StringTokenizer chars = new StringTokenizer(strValue.replaceAll("[^a-zA-Z0-9]", ""));
            while (chars.hasMoreTokens()) {
                context.write(new Text(chars.nextToken()), one);
            }
        }
    }

And the reducer:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CharacterCountReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int charCount = 0;
            for (IntWritable val : values) {
                charCount += val.get();
            }
            context.write(key, new IntWritable(charCount));
        }
    }

It looks fine to me. I generated the runnable jar file from my IDE and executed it as follows:

    $ ./hadoop jar ~/Desktop/example_MapReduce.jar no.hib.mod250.hadoop.CharacterCountDriver
    14/11/27 19:36:42 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    14/11/27 19:36:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    14/11/27 19:36:42 INFO input.FileInputFormat: Total input paths to process : 1
    14/11/27 19:36:42 INFO mapreduce.JobSubmitter: number of splits:1
    14/11/27 19:36:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local316715466_0001
    14/11/27 19:36:43 WARN conf.Configuration: file:/tmp/hadoop-roberto/mapred/staging/roberto316715466/.staging/job_local316715466_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
    14/11/27 19:36:43 WARN conf.Configuration: file:/tmp/hadoop-roberto/mapred/staging/roberto316715466/.staging/job_local316715466_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
    14/11/27 19:36:43 WARN conf.Configuration: file:/tmp/hadoop-roberto/mapred/local/localRunner/roberto/job_local316715466_0001/job_local316715466_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
    14/11/27 19:36:43 WARN conf.Configuration: file:/tmp/hadoop-roberto/mapred/local/localRunner/roberto/job_local316715466_0001/job_local316715466_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
    14/11/27 19:36:43 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    14/11/27 19:36:43 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    14/11/27 19:36:43 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
    14/11/27 19:36:43 INFO mapred.LocalJobRunner: Waiting for map tasks
    14/11/27 19:36:43 INFO mapred.LocalJobRunner: Starting task: attempt_local316715466_0001_m_000000_0
    14/11/27 19:36:43 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
    14/11/27 19:36:43 INFO mapred.MapTask: Processing split: file:/tmp/filein:0+434
    14/11/27 19:36:43 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer

So I would guess my output file should be in /tmp/fileout. But in fact it seems to be empty:

    $ tree /tmp/fileout/
    /tmp/fileout/
    └── _temporary
        └── 0

    2 directories, 0 files

Is there something I am missing? Can anybody help me?
Regards :-)

Edit:

I almost found the solution in another post.
In CharacterCountDriver I replaced job.submit() with job.waitForCompletion(true) (sketched after the listing below) and got more verbose output:

    /tmp/fileout/
    ├── part-r-00000
    └── _SUCCESS

    0 directories, 2 files
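
For reference, the change in the driver's run() method amounts to swapping the non-blocking submit call for a blocking one (a minimal sketch; returning 1 on failure is a common convention, not part of the original code):

    // job.submit() returns immediately, so the JVM can exit before the output
    // is committed; that is why only _temporary existed on disk earlier.
    // waitForCompletion(true) blocks until the job finishes and prints progress.
    return job.waitForCompletion(true) ? 0 : 1;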

But I still don't know how to read it. _SUCCESS is empty, and part-r-00000 is not what I expected:

    Absorbantandyellowandporousishe 1
    AreyoureadykidsAyeAyeCaptain 1
    ICanthearyouAYEAYECAPTAIN 1
    Ifnauticalnonsensebesomethingyouwish 1
    Ohh 1
    READY 1
    SPONGEBOBSQUAREPANTS 1
    SpongebobSquarepants 3
    Spongebobsquarepants 4
    Thendroponthedeckandfloplikeafish 1
    Wholivesinapineappleunderthesea 1

Any suggestions? Is there a mistake in my code? Thanks.

bgibtngc

If I understand you correctly, you want your program to count the alphanumeric characters in the input file. However, that is not what your code does. You can change your mapper so that it counts the alphanumeric characters in each line:

    String strValue = value.toString();
    // replaceAll returns a new string; the result must be assigned back
    strValue = strValue.replaceAll("[^a-zA-Z0-9]", "");
    context.write(new Text("alphanumeric"), new IntWritable(strValue.length()));

This will fix your program. Basically, the mapper now emits the number of alphanumeric characters in each line as the value, and the reducer sums up the counts for each key. With my change you only ever use a single key, "alphanumeric"; the key could be something else and it would still work.
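
Put together, a mapper along those lines might look as follows (a sketch that keeps your original class name and signature; the constant name ALPHANUMERIC is my own choice):

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CharacterCountMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        // One shared key, so the reducer sums all the per-line counts together
        private static final Text ALPHANUMERIC = new Text("alphanumeric");

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Drop everything that is not a letter or a digit, then emit the
            // length of what remains as this line's alphanumeric character count
            String alnum = value.toString().replaceAll("[^a-zA-Z0-9]", "");
            context.write(ALPHANUMERIC, new IntWritable(alnum.length()));
        }
    }

Your existing CharacterCountReducer needs no changes: it already sums the IntWritable values per key, so the job's output becomes a single line with the total count.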

2ic8powd

part-r-00000 is the name of the reducer's output file. If you had more reducers, they would be numbered part-r-00001, part-r-00002, and so on.
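
For instance, the number of reduce tasks, and with it the number of part-r-* files, can be set in the driver. A minimal sketch:

    // With four reduce tasks the job writes part-r-00000 through part-r-00003
    job.setNumReduceTasks(4);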
