Hadoop Java word count tweak not working: trying to sum all

Posted by xoefb8l8 on 2021-06-04 in Hadoop


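I am trying to adapt the WordCount example here: http://wiki.apache.org/hadoop/wordcount so that, instead of counting the occurrences of each word, it sums up and returns the total number of words in the input file.
I tried changing the mapper class so that, instead of writing the word from the current iteration, it writes "Sum: " for every word.
i.e. replacing

word.set(tokenizer.nextToken());

in the class "Map" with

word.set("Sum: ");

The rest of the file stays the same.
This way, I thought, the output of all mappers would go to the same reducer, which would add up the number of "Sum: " entries, and that total would be the number of words in the file.
Meaning that instead of: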

word  1
other 1
other 1

which yields:

word  1
other 2

I expected to have:

Sum:  1
Sum:  1
Sum:  1

which would yield:

Sum: 3
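
The grouping described here can be sketched in plain Java, without Hadoop, to check the expectation. This is only an illustration under stated assumptions: the class `ShuffleSketch` and its method `run` are hypothetical names, and a `TreeMap` stands in for the shuffle and the summing reducer.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-in for map + shuffle + reduce; not Hadoop code.
class ShuffleSketch {
    // Emits (key, 1) for every token and sums the values per key.
    // With constantKey == true, every token is emitted under "Sum: ".
    static Map<String, Integer> run(String input, boolean constantKey) {
        Map<String, Integer> totals = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(input);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken(); // consume one word
            String key = constantKey ? "Sum: " : token;
            totals.merge(key, 1, Integer::sum);   // the "reducer": add up the ones
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(run("word other other", false)); // {other=2, word=1}
        System.out.println(run("word other other", true));  // {Sum: =3}
    }
}
```

So the idea of routing everything to one key is sound in itself: a single key does collapse to a single total.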

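Instead, when I try to run the code, I get a very long map operation that eventually throws an exception:
RuntimeException: java.io.IOException: Spill failed
no matter how small the input file is.
Looking forward to your help. Thank you.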

Java hadoop word-count

Source: https://stackoverflow.com/questions/25227879/hadoop-java-word-count-tweak-not-working-try-to-sum-all

1 Answer


gywdnpxw1#

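You have an endless loop. In your code, you still need to call

tokenizer.nextToken()

to advance the StringTokenizer by one word in the line. Otherwise hasMoreTokens() keeps returning true and the map operation never makes progress.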
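
The effect can be reproduced outside Hadoop with a bounded loop. The following is a minimal sketch, not the asker's code: `TokenizerLoopDemo`, `iterations`, and the `maxIterations` cap are hypothetical and exist only so the demo terminates. In the real mapper there is no such cap, so records are emitted without end, which is a plausible (though unconfirmed) reason the job eventually fails with "Spill failed".

```java
import java.util.StringTokenizer;

// Hypothetical demo class; shows why skipping nextToken() never ends the loop.
class TokenizerLoopDemo {
    // Counts loop iterations, stopping at maxIterations so the demo terminates.
    static int iterations(String line, boolean consumeToken, int maxIterations) {
        StringTokenizer tokenizer = new StringTokenizer(line);
        int count = 0;
        while (tokenizer.hasMoreTokens() && count < maxIterations) {
            if (consumeToken) {
                tokenizer.nextToken(); // advances past one word
            }
            count++; // stands in for context.write(...)
        }
        return count;
    }

    public static void main(String[] args) {
        // Consuming tokens: the loop ends after one pass per word.
        System.out.println(iterations("word other other", true, 1000));  // 3
        // Not consuming tokens: only the artificial cap stops the loop.
        System.out.println(iterations("word other other", false, 1000)); // 1000
    }
}
```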
So you need something like this:

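// Imports belong at the top of the file that contains the outer job class.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the outer job class, as in the WordCount example.
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text sumText = new Text("Sum: "); // single key shared by all words

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            tokenizer.nextToken(); // go to next word, otherwise the loop never ends
            context.write(sumText, one);
        }
    }
}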

However, there is a better solution without a loop. You can use the countTokens() method of StringTokenizer:

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        // Emit one record per line: the number of words on that line.
        context.write(new Text("Sum: "), new IntWritable(tokenizer.countTokens()));
    }
}
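
As a quick sanity check outside Hadoop, the sketch below compares the loop-based count with countTokens() and then adds up per-line counts the way a summing reducer would. The class `CountTokensSketch` and its methods are hypothetical helper names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class; plain Java, no Hadoop dependency.
class CountTokensSketch {
    // Counts words by advancing the tokenizer one token at a time.
    static int countByLoop(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line);
        int count = 0;
        while (tokenizer.hasMoreTokens()) {
            tokenizer.nextToken();
            count++;
        }
        return count;
    }

    // Counts words in one call, without consuming the tokens.
    static int countDirect(String line) {
        return new StringTokenizer(line).countTokens();
    }

    // Sums the per-line counts, standing in for the reducer on the "Sum: " key.
    static int total(List<String> lines) {
        int sum = 0;
        for (String line : lines) {
            sum += countDirect(line);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("word other other", "", "one  two");
        System.out.println(countByLoop("word other other")); // 3
        System.out.println(countDirect("word other other")); // 3
        System.out.println(total(lines));                    // 5
    }
}
```

With this variant the mapper writes one record per input line instead of one per word, so far fewer records reach the reducer; a line without any words simply contributes 0.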

Answered 2021-06-04
