I am new to Hadoop. I need to count how many words of each specific length appear in some files - for example, the number of 4-letter words, the number of 5-letter words, and so on. If a word is repeated 20 times in the text, it should be counted all 20 times. I tried the following:
Mapper class
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class Map extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private IntWritable wordLength = new IntWritable();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        // Emit (word length, 1) for every token in the line.
        while (tokenizer.hasMoreTokens()) {
            wordLength.set(tokenizer.nextToken().length());
            context.write(wordLength, one);
        }
    }
}
Reducer class
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<IntWritable, IntWritable, Text, IntWritable> {
    private IntWritable tin = new IntWritable();
    private IntWritable smal = new IntWritable();
    private IntWritable bi = new IntWritable();
    private int t, s, b;
    private Text tiny = new Text("tiny");
    private Text small = new Text("small");
    private Text big = new Text("big");

    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Accumulate counts per length bucket: length == 4 into t,
        // length == 10 into s, anything else >= 10 into b.
        for (IntWritable val : values) {
            if (key.get() == 4) {
                t += val.get();
            } else if (key.get() == 10) {
                s += val.get();
            } else if (10 <= key.get()) {
                b += val.get();
            }
        }
        tin.set(t);
        smal.set(s);
        bi.set(b);
        context.write(tiny, tin);
        context.write(small, smal);
        context.write(big, bi);
    }
}
When I run this from the terminal, I get the following error: Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable
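From what I have read, this error usually comes from the driver rather than from the Mapper or Reducer: job.setOutputKeyClass() declares the reducer's output key type, and unless setMapOutputKeyClass() is also called, Hadoop assumes the map output key has that same type (Text here), while my Mapper actually emits IntWritable. A minimal driver sketch along those lines (the class name WordLengthCount and the args-based paths are placeholders I made up, not code from my project):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordLengthCount {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word length count");
        job.setJarByClass(WordLengthCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // The map output types differ from the final output types, so
        // they must be declared explicitly; omitting these two lines
        // reproduces the type-mismatch IOException above.
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Final (reducer) output types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}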
What I am trying to do is map each word to a (word length, 1) key-value pair and then reduce over those pairs, so that I get output counts for words of length 4, words of length 10, and the longest words.
I am not sure whether my approach is correct. Any help would be appreciated.
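For the reducer logic itself, here is a rough sketch of the bucketing I think I want (the cutoffs tiny <= 4 letters, small 5-10, big > 10 are my own guess at the intended ranges, the class name LengthBucketReducer is a placeholder, and the totals are written once from cleanup() so each bucket line appears only once; this assumes the job runs with a single reduce task):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LengthBucketReducer extends Reducer<IntWritable, IntWritable, Text, IntWritable> {
    private int tiny, small, big;

    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Each call receives one distinct word length; total its occurrences first.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        int len = key.get();
        if (len <= 4) {            // assumed cutoff for "tiny"
            tiny += sum;
        } else if (len <= 10) {    // assumed cutoff for "small"
            small += sum;
        } else {
            big += sum;
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Emit each bucket total exactly once, after all keys have been seen.
        context.write(new Text("tiny"), new IntWritable(tiny));
        context.write(new Text("small"), new IntWritable(small));
        context.write(new Text("big"), new IntWritable(big));
    }
}

By contrast, my original Reduce re-writes all three totals on every reduce() call, so the same bucket lines would show up once per distinct word length.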