MapReduce to list all words starting with each letter from a to z

Posted by pkmbmrz7 on 2021-05-29 in Hadoop

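I am trying to group all words by their starting letter, from a to z. The output of the reduce function should be: key="alphabet", value="list of words against alphabet + their count". I am using the code below, but it only shows the word frequency, not the list of words.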

    import java.io.IOException;
    import java.util.*;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class WordCountFrequency {

        public static class WordCountFrequencyMap extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    String token = tokenizer.nextToken();
                    if (token.startsWith("A")) {
                        word.set("A_Count");
                        output.collect(word, one);
                    } else if (token.startsWith("B")) {
                        word.set("B_Count");
                        output.collect(word, one);
                    }
                } // end of while
            }
        }

        public static class WordCountFrequencyReduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                output.collect(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCountFrequency.class);
            conf.setJobName("WordCountFrequency");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(WordCountFrequencyMap.class);
            conf.setCombinerClass(WordCountFrequencyReduce.class);
            conf.setReducerClass(WordCountFrequencyReduce.class);
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }

I want the output to look like this: "Alphabet, 'list of words', word count"

```
A: Apple, Ant, And, Add, Axis, 5[wordcount]
B: Ball, Bat, Boy, Bus, 4
....
Z: Zebra, Zinc, Zeal, 3
```

How can I produce the output shown above?

u3r8eeie #1

Here is a sketch of the solution:

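    // Mapper<LongWritable, Text, Text, Text>: emit (first letter, word) for every token.
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            // Upper-case the key here if "apple" and "Apple" should land under the same letter.
            output.collect(new Text(token.substring(0, 1)), new Text(token));
        }
    }

    // Reducer<Text, Text, Text, Text>: join the words for one letter and append their count.
    public void reduce(Text letter, Iterator<Text> words, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        StringBuilder result = new StringBuilder();
        int count = 0;
        while (words.hasNext()) {
            result.append(words.next().toString()).append(", ");
            count++;
        }
        output.collect(letter, new Text(result.toString() + count));
    }

    // In the driver, use conf.setOutputValueClass(Text.class) and drop setCombinerClass(...):
    // a combiner would turn partial word lists into single values and break the final count.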
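
To check the grouping logic without a cluster, the same map and reduce steps can be simulated in plain Java. This is only a minimal sketch, not Hadoop code: the class `LetterIndexSketch` and its methods `mapLine`, `reduceLetter` and `run` are hypothetical names, a `TreeMap` stands in for Hadoop's sort-and-shuffle phase, and it assumes that keys are upper-cased and that tokens not starting with A to Z are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LetterIndexSketch {

    // "Map" step: add each token to the bucket of its upper-cased first letter.
    // Assumption: tokens that do not start with A to Z are ignored.
    static void mapLine(String line, Map<Character, List<String>> groups) {
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            char first = Character.toUpperCase(token.charAt(0));
            if (first >= 'A' && first <= 'Z') {
                groups.computeIfAbsent(first, k -> new ArrayList<>()).add(token);
            }
        }
    }

    // "Reduce" step: join the words for one letter and append their count.
    static String reduceLetter(char letter, List<String> words) {
        return letter + ": " + String.join(", ", words) + ", " + words.size();
    }

    // Runs both steps; the TreeMap keeps the letters sorted, as the shuffle would.
    static List<String> run(List<String> lines) {
        Map<Character, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            mapLine(line, groups);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<Character, List<String>> entry : groups.entrySet()) {
            output.add(reduceLetter(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Apple Ball Ant", "Bat And Zebra");
        run(lines).forEach(System.out::println);
    }
}
```

Each output line has the shape asked for in the question: the letter, the words that start with it, then the number of words. In a real job the reducer sees the words in whatever order the shuffle delivers them, so the order within a line is not guaranteed to match the input order.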
