How does OutputCollector work?

4jb9z9bj posted on 2021-06-03 in Hadoop


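I am trying to analyze the default MapReduce job, which does not define a mapper or a reducer, i.e. one that uses IdentityMapper and IdentityReducer. To make this clear to myself, I wrote my own identity reducer: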

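import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Nested in the job's driver class; identity reducer using the old org.apache.hadoop.mapred API.
public static class MyIdentityReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Called once per distinct key; values iterates over every value for that key.
        while (values.hasNext()) {
            Text value = values.next();
            output.collect(key, value); // one output record per value
        }
    }
}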
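To see why nothing gets merged, it may help to model the reducer and its collector in plain Java, without Hadoop. In this sketch, `SimpleCollector`, `ListCollector` and `IdentityReduceDemo` are hypothetical stand-ins, not Hadoop classes: like the collector a reduce task hands to `reduce()` (which, roughly, forwards each pair to the job's record writer), `ListCollector` records every `collect()` call and never looks at keys it has already seen. The tab separator mimics the default of `TextOutputFormat`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.mapred.OutputCollector:
// only the collect(key, value) method is modelled.
interface SimpleCollector<K, V> {
    void collect(K key, V value);
}

// Appends every pair it receives; it never merges or overwrites earlier keys.
class ListCollector implements SimpleCollector<String, String> {
    final List<String> records = new ArrayList<>();

    @Override
    public void collect(String key, String value) {
        records.add(key + "\t" + value); // tab, like TextOutputFormat's default separator
    }
}

public class IdentityReduceDemo {
    // Same loop as MyIdentityReducer.reduce: one collect() call per value.
    static void identityReduce(String key, Iterator<String> values,
                               SimpleCollector<String, String> output) {
        while (values.hasNext()) {
            output.collect(key, values.next());
        }
    }

    public static void main(String[] args) {
        ListCollector out = new ListCollector();
        identityReduce("Dravid", Arrays.asList("Banglore", "Jaipur").iterator(), out);
        // Two records with the same key: the collector did not merge them.
        System.out.println(out.records);
    }
}
```

The point of the model is that de-duplication would have to happen inside `reduce()` itself; the collector just passes records through.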

My input file is:

$ hadoop fs -cat NameAddress.txt
Dravid Banglore
Sachin Mumbai
Dhoni Ranchi
Dravid Jaipur
Dhoni Chennai
Sehwag Delhi
Gambhir Delhi
Gambhir Calcutta

I was expecting:
Dravid Jaipur
Dhoni Chennai
Gambhir Calcutta
Sachin Mumbai
Sehwag Delhi

I got:
$ hadoop fs -cat NameAddress/part-00000
Dhoni   Ranchi
Dhoni   Chennai
Dravid  Banglore
Dravid  Jaipur
Gambhir Delhi
Gambhir Calcutta
Sachin  Mumbai
Sehwag  Delhi
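The output above matches what the shuffle/sort phase does: map output is sorted and grouped by key, `reduce()` runs once per distinct key, and an identity reducer then emits every value in the group. Below is a stdlib-only sketch of that grouping, assuming the first space-separated token of each line is the key (as the sorted output suggests). `ShuffleSortDemo` and `shuffleAndIdentityReduce` are hypothetical names, and a `TreeMap` stands in for Hadoop's sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSortDemo {
    // Models shuffle/sort: records are grouped by key, and keys are visited in
    // sorted order (TreeMap). Values keep arrival order here, which Hadoop
    // itself does not guarantee without a secondary sort.
    static List<String> shuffleAndIdentityReduce(List<String> lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(" ", 2); // "Name City" -> key, value
            if (parts.length < 2) {
                continue; // skip malformed lines
            }
            groups.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
        }
        List<String> output = new ArrayList<>();
        // One "reduce call" per distinct key; the identity reducer emits every
        // value, so keys repeat in the output.
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            for (String value : group.getValue()) {
                output.add(group.getKey() + "\t" + value);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "Dravid Banglore", "Sachin Mumbai", "Dhoni Ranchi", "Dravid Jaipur",
            "Dhoni Chennai", "Sehwag Delhi", "Gambhir Delhi", "Gambhir Calcutta");
        shuffleAndIdentityReduce(input).forEach(System.out::println);
    }
}
```

Run on the input file, this prints the same eight lines in the same order as part-00000, which suggests the "unexpected" result is simply sort-then-identity, with no overwriting anywhere.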

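I assumed that, because aggregation is done by the programmer inside the reducer's while loop before writing to the OutputCollector, the keys the reducer passes to the OutputCollector would always be unique, and that if I did not aggregate, the value for a repeated key would overwrite the previous one. Clearly that is not the case. Could someone give me a better explanation of how the OutputCollector works and how it handles all the keys? I see many implementations of OutputCollector in the Hadoop source code. Can I write my own OutputCollector that does what I expect?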
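For the "one line per name" result from the expected output, one option, sketched here rather than a custom OutputCollector, is to aggregate inside the reducer and call collect() once per key. In this plain-Java sketch a `java.util.function.BiConsumer` stands in for OutputCollector, and `LastValueReduceDemo` / `lastValueReduce` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

public class LastValueReduceDemo {
    // Hypothetical reducer body that aggregates before collecting: it walks all
    // values for the key and emits exactly one record, holding the last value.
    // "Last" means last in iteration order, which Hadoop does not guarantee
    // unless a secondary sort is configured.
    static void lastValueReduce(String key, Iterator<String> values,
                                BiConsumer<String, String> output) {
        String last = null;
        while (values.hasNext()) {
            last = values.next();
        }
        if (last != null) {
            output.accept(key, last);
        }
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        BiConsumer<String, String> collector = (k, v) -> records.add(k + "\t" + v);
        lastValueReduce("Dhoni", Arrays.asList("Ranchi", "Chennai").iterator(), collector);
        lastValueReduce("Dravid", Arrays.asList("Banglore", "Jaipur").iterator(), collector);
        System.out.println(records); // one record per key
    }
}
```

If this is adapted to a real `Reducer<Text, Text, Text, Text>`, note that Hadoop reuses the `Text` object returned by `values.next()`, so copying it (`new Text(value)`) is the safe way to keep a value across iterations.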

hadoop mapreduce reduce partitioner

Source: https://stackoverflow.com/questions/12763478/how-outputcollector-works