Hadoop MapReduce: how to reduce a custom object?

disho6za posted on 2021-05-29 in Hadoop

I'm new to Hadoop and I'm trying to work with the Reducer class. I found a tutorial online whose reducer looks like this:

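import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class mapReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException{
        int sum = 0; // reset for every key
        for (IntWritable value: values){
             sum += value.get();
        }
        total.set(sum); // reuse one output object instead of allocating per key
        context.write(key, total);
    }
}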

So I want to change total into a myCustomObj. Following the example above:

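//..
@Override
protected void reduce(Text key, Iterable<myCustomObj> values,
        Reducer<Text, myCustomObj, Text, IntWritable>.Context context) throws IOException, InterruptedException{
    myCustomObj total = new myCustomObj(); // fresh accumulator for each key
    for (myCustomObj value: values){
         // add() should copy field values: Hadoop reuses the same value instance while iterating
         total.add(value);
    }
    // getPrimaryAttribute() must return an IntWritable to match the output value type
    context.write(key, total.getPrimaryAttribute());
}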

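Goal: after Hadoop has finished reducing, I want key -> total, i.e. the whole object. As far as I can tell, the code above only outputs key -> primaryAttribute.
Alternative: if this is too tedious, I had the idea of storing the details I need on disk in XML format. However, I don't understand how MapReduce works underneath: does the reducer run on the server or on the clients (where the map happens)? If it runs on the clients, I would end up with a small XML file on every client, whereas I want all the information collected on one server.
I hope I've made the question clear. Thank you.
Edit: I tried looking for sources online, but Hadoop offers so many customization points that I don't know where to look.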

Java hadoop

Source: https://stackoverflow.com/questions/43161354/hadoop-mapreduce-how-to-reduce-a-custom-object

1 answer

xmakbtuz1#

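To reduce a custom object, the mapper first has to emit that object as its output value. Assuming the object is called CustomObject, the mapper definition should look like this: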

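import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, CustomObject> {
    @Override
    protected void map(LongWritable key, Text value,
            Mapper<LongWritable, Text, Text, CustomObject>.Context context) throws IOException, InterruptedException {
        // do your stuff here: parse the input line and emit (Text, CustomObject) pairs with context.write(...)
    }
}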

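Next, CustomObject itself should implement the WritableComparable interface with all three required methods (needed mainly for the shuffle phase):
- write: defines how the object is written to disk
- readFields: defines how the stored object is read back from disk
- compareTo: defines how objects are sorted (you can leave this empty, because only keys are sorted during the shuffle)

Strictly speaking, a class used only as a value needs to implement just Writable (write and readFields); only key classes must be WritableComparable. Hadoop also creates value objects through reflection, so the class needs a no-arg constructor.
The reducer signature should look like this: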
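A minimal sketch of such a value class, assuming two hypothetical fields (count and sum). So that it compiles without Hadoop on the classpath, it declares a local Writable interface with the same two methods as org.apache.hadoop.io.Writable; in a real job you would delete that interface and implement the Hadoop one. The roundTrip helper is also hypothetical and only imitates the serialize/deserialize step the shuffle performs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable.
// In a real job, remove it and implement org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type: a count and a running sum.
class CustomObject implements Writable {
    private int count;
    private long sum;

    // No-arg constructor: Hadoop instantiates value objects via reflection.
    CustomObject() { }

    CustomObject(int count, long sum) {
        this.count = count;
        this.sum = sum;
    }

    int getCount() { return count; }
    long getSum() { return sum; }

    @Override
    public void write(DataOutput out) throws IOException {
        // The field order here must match the order in readFields().
        out.writeInt(count);
        out.writeLong(sum);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        count = in.readInt();
        sum = in.readLong();
    }

    @Override
    public String toString() {
        return count + "," + sum;
    }

    // Hypothetical helper: serialize to bytes and read back into a new object.
    static CustomObject roundTrip(CustomObject original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            CustomObject copy = new CustomObject();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If a field is written but not read back (or read in a different order), the values that reach the reducer are silently corrupted, which is why the two methods are best kept side by side.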

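import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, CustomObject, Text, IntWritable>{
    @Override
    protected void reduce(Text key, Iterable<CustomObject> values,
            Reducer<Text, CustomObject, Text, IntWritable>.Context context) throws IOException, InterruptedException{
        // reducer code: combine the values for this key and emit (Text, IntWritable) pairs
    }
}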
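The reducer body is left open above. Below is a plain-Java sketch of what it typically does, assuming a stripped-down, hypothetical CustomObject (serialization omitted so it runs without Hadoop). It merges every value for one key into a fresh accumulator and copies field values instead of keeping references, because Hadoop's value iterator usually refills one shared instance; the hypothetical reusingValues helper imitates that. For the asker's goal of key -> whole object, one option is to make CustomObject the output value class too (Reducer<Text, CustomObject, Text, CustomObject> with job.setOutputValueClass(CustomObject.class)) and override toString(), since the default TextOutputFormat writes each record as the key, a tab, and value.toString(). The sketch returns that line instead of calling context.write.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Stripped-down, hypothetical CustomObject: only the merge logic matters here.
class CustomObject {
    int count;
    long sum;

    CustomObject() { }

    // Copies field values out of 'other'; never stores a reference to it.
    void add(CustomObject other) {
        count += other.count;
        sum += other.sum;
    }

    @Override
    public String toString() {
        return count + "," + sum;
    }
}

class ReduceSketch {
    // Mirrors reduce(key, values, context) followed by context.write(key, total):
    // returns the line TextOutputFormat would write (key, tab, total.toString()).
    static String reduce(String key, Iterable<CustomObject> values) {
        CustomObject total = new CustomObject(); // new accumulator for every key
        for (CustomObject value : values) {
            total.add(value);
        }
        return key + "\t" + total;
    }

    // Hypothetical helper imitating Hadoop's value iterator, which refills
    // one shared instance rather than allocating a new object per value.
    static Iterable<CustomObject> reusingValues(int[] counts, long[] sums) {
        CustomObject shared = new CustomObject();
        return () -> new Iterator<CustomObject>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < counts.length;
            }

            @Override
            public CustomObject next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.count = counts[i];
                shared.sum = sums[i];
                i++;
                return shared;
            }
        };
    }
}
```

Had add() stored the incoming objects in a list instead of copying their fields, every entry would point at the same shared instance and end up holding only the last value.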

Finally, when configuring the job, declare the matching map-output and final-output key/value classes:

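Job job = Job.getInstance(new Configuration(), "reduce custom object");
// The map output value type (CustomObject) differs from the final output
// value type (IntWritable), so both pairs are declared explicitly.
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(CustomObject.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);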

This should do the trick.

Answered 2021-05-29