MapReduce program to count words with a specific number of letters

r8uurelv · posted 2021-05-27 in Hadoop

I am new to Hadoop. I need to count the number of words with a specific number of letters in some files, for example how many 4-letter words, how many 5-letter words, and so on. If a word is repeated 20 times in the text, it should be counted 20 times. I tried the following:
Mapper class

    public static class Map extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
        final static IntWritable one = new IntWritable(1);
        IntWritable wordLength = new IntWritable();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Emit (word length, 1) for every token in the input line.
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                wordLength.set(tokenizer.nextToken().length());
                context.write(wordLength, one);
            }
        }
    }
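As a quick sanity check of what the mapper emits, the tokenize-and-measure step can be simulated in plain Java without Hadoop. This is just an illustrative sketch (the class and method names here are made up, not from the post): it tallies (word length → count) pairs the same way the mapper's output would be grouped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class MapperSim {
    // Simulate the mapper: for each token, emit its length, then tally
    // the lengths the way the shuffle/reduce phase would group them.
    static Map<Integer, Integer> lengthCounts(String line) {
        Map<Integer, Integer> counts = new HashMap<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            int len = tokenizer.nextToken().length();
            counts.merge(len, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> c = lengthCounts("hadoop map reduce word count demo");
        // "word" and "demo" have 4 letters; "hadoop" and "reduce" have 6.
        System.out.println(c.get(4) + " " + c.get(6)); // prints "2 2"
    }
}
```

Note that repeated words contribute one pair each, matching the requirement that a word appearing 20 times is counted 20 times.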

Reducer class

    public class Reduce extends Reducer<IntWritable, IntWritable, Text, IntWritable> {
        IntWritable tin = new IntWritable();
        IntWritable smal = new IntWritable();
        IntWritable bi = new IntWritable();
        int t, s, b;
        Text tiny = new Text("tiny");
        Text small = new Text("small");
        Text big = new Text("big");

        @Override
        public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Accumulate counts into three buckets by word length.
            for (IntWritable val : values) {
                if (key.get() == 4) {
                    t += val.get();
                } else if (key.get() == 10) {
                    s += val.get();
                } else if (key.get() > 10) {
                    b += val.get();
                }
            }
            tin.set(t);
            smal.set(s);
            bi.set(b);
            context.write(tiny, tin);
            context.write(small, smal);
            context.write(big, bi);
        }
    }
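The reducer's bucketing logic can likewise be checked outside Hadoop. The sketch below (class and variable names are illustrative) applies the same three-way split — length 4 into "tiny", length 10 into "small", length greater than 10 into "big" — to a map of already-summed (length → count) pairs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReduceSim {
    // Bucket (wordLength -> totalCount) entries the same way the reducer does:
    // length 4 -> "tiny", length 10 -> "small", length > 10 -> "big".
    static Map<String, Integer> bucket(Map<Integer, Integer> lengthCounts) {
        int t = 0, s = 0, b = 0;
        for (Map.Entry<Integer, Integer> e : lengthCounts.entrySet()) {
            if (e.getKey() == 4) {
                t += e.getValue();
            } else if (e.getKey() == 10) {
                s += e.getValue();
            } else if (e.getKey() > 10) {
                b += e.getValue();
            }
        }
        Map<String, Integer> out = new LinkedHashMap<>();
        out.put("tiny", t);
        out.put("small", s);
        out.put("big", b);
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> in = new LinkedHashMap<>();
        in.put(4, 3);   // three 4-letter words
        in.put(10, 2);  // two 10-letter words
        in.put(12, 1);  // one 12-letter word
        System.out.println(bucket(in)); // prints "{tiny=3, small=2, big=1}"
    }
}
```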

When I run it from the terminal I get the following error: `Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable`
What I want to do is map the input into (word length, count) key-value pairs and reduce them, so that the output gives me the counts of words of length 10, of length 4, and of the longest words.
I am not sure whether my approach is correct; any help is appreciated.
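For context on where that error is typically configured: in Hadoop, when the mapper's output key/value classes differ from the job's final output classes (here, `IntWritable` keys from the mapper vs. `Text` keys from the reducer), they must be declared explicitly via `setMapOutputKeyClass`/`setMapOutputValueClass`. A hedged sketch of a driver wired this way (the class name `WordLengthDriver` and the use of `args` for paths are illustrative assumptions, since the original driver is not shown):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative driver; not from the original post.
public class WordLengthDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word length count");
        job.setJarByClass(WordLengthDriver.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        // The mapper emits (IntWritable, IntWritable); declare that explicitly
        // because it differs from the reducer's (Text, IntWritable) output.
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Without the two `setMapOutput*Class` calls, Hadoop assumes the mapper's output classes match the job's output classes, which would produce exactly a "type mismatch in key from map" error.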
