java - How to make a Hadoop reducer output multiple values for a single key

Posted by t98cgbkg on 2021-06-02 in Hadoop

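I have a data set and want to compute the minimum, maximum, and average for each record (for example: userid_1 -- minimum_1 -- maximum_1 -- avg).
This is my code; I need to know what to change so that I can write all of these values for a single key: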

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int visitsCounter = 0;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        float avg;
        for (IntWritable val : values) {
            int currentValue = val.get();
            sum += currentValue;
            visitsCounter++;
            min = Math.min(min, currentValue);
            max = Math.max(max, currentValue);
        }
        avg = sum / visitsCounter;

        //here can be the supposed edit to let me output (user - min - max - avg )
        context.write(key, new IntWritable(sum));
    }
}

Java hadoop mapreduce

Source: https://stackoverflow.com/questions/38120952/how-to-make-hadoop-reducer-output-multiple-values-for-a-single-key

1 Answer

e3bfsja2 #1

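In MapReduce, data flows as key-value pairs through two phases: the map phase and the reduce phase.
So we need to design the key-value pairs at both the map level and the reduce level.
Here, the key and value data types are Writables.
A key can be composed of multiple values, and so can a value.
For atomic values we use IntWritable, DoubleWritable, LongWritable, FloatWritable, and so on.
For complex keys and values we use the Text data type or a user-defined data type.
A simple solution for this case is the Text data type: concatenate all of the columns into one String and wrap that String in a Text object. This is inefficient on large data sets, though, because of the large number of string concatenations.
A custom (user-defined) data type handles this scenario better. Write one by implementing the Writable or WritableComparable interface from the Hadoop API.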
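As a rough illustration of the custom data type route, here is a minimal sketch of a value type that carries min, max, and average together. The class name `StatsWritable` and its helper methods are hypothetical and not part of the original answer. So that the sketch compiles without Hadoop on the classpath, the `implements org.apache.hadoop.io.Writable` clause is only shown as a comment; the `write` and `readFields` methods have the signatures that interface requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type holding min, max and average for one key.
// In a real job, declare it as:
//   class StatsWritable implements org.apache.hadoop.io.Writable
// The clause is omitted here only so the sketch compiles without Hadoop.
class StatsWritable {
    private int min;
    private int max;
    private float avg;

    // Hadoop creates Writable instances reflectively, so keep a no-arg constructor.
    public StatsWritable() {}

    public void set(int min, int max, float avg) {
        this.min = min;
        this.max = max;
        this.avg = avg;
    }

    // Same signature as Writable.write: serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(min);
        out.writeInt(max);
        out.writeFloat(avg);
    }

    // Same signature as Writable.readFields: read the fields back in the same order.
    public void readFields(DataInput in) throws IOException {
        min = in.readInt();
        max = in.readInt();
        avg = in.readFloat();
    }

    // TextOutputFormat writes the value via toString(), so this controls the output line.
    @Override
    public String toString() {
        return min + "\t" + max + "\t" + avg;
    }

    // Demo helper (an assumption for this sketch): serialize to bytes and read back.
    public static StatsWritable roundTrip(StatsWritable original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            StatsWritable copy = new StatsWritable();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StatsWritable stats = new StatsWritable();
        stats.set(3, 9, 6.0f);
        System.out.println(roundTrip(stats));
    }
}
```

With a type like this, the reducer would be declared as `Reducer<Text, IntWritable, Text, StatsWritable>` and the driver would need `job.setOutputValueClass(StatsWritable.class)`. The reducer that follows takes the simpler Text route instead.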

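// Imports needed at the top of the enclosing file:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public static class Reduce extends Reducer<Text, IntWritable, Text, Text> {
    // Reused across calls to avoid allocating a new Text for every key.
    private final Text emitValue = new Text();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int visitsCounter = 0;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (IntWritable val : values) {
            int currentValue = val.get();
            sum += currentValue;
            visitsCounter++;
            min = Math.min(min, currentValue);
            max = Math.max(max, currentValue);
        }
        // Cast to float first; sum / visitsCounter alone is integer division.
        float avg = (float) sum / visitsCounter;
        // Pack min, max and avg into one tab-separated value for this key.
        String myValue = min + "\t" + max + "\t" + avg;
        emitValue.set(myValue);
        context.write(key, emitValue);
    }
}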
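One detail in the aggregation is easy to miss: dividing two `int` values in Java truncates before the result is stored in a `float`, so the average should be computed with a cast. The following is a minimal standalone sketch of the same aggregation outside Hadoop, which makes it easy to check locally. The class name `VisitStats` and the method `summarize` are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper mirroring the reducer's loop, without any Hadoop types.
class VisitStats {

    // Returns "min<TAB>max<TAB>avg" for a non-empty list of values.
    public static String summarize(List<Integer> values) {
        int sum = 0;
        int count = 0;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int value : values) {
            sum += value;
            count++;
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        // Cast before dividing; sum / count alone would truncate 7 / 2 to 3.
        float avg = (float) sum / count;
        return min + "\t" + max + "\t" + avg;
    }

    public static void main(String[] args) {
        // Two visits with values 3 and 4: the average is 3.5, not 3.0.
        System.out.println(summarize(Arrays.asList(3, 4)));
    }
}
```

Keeping the arithmetic in a plain method like this is one possible design choice; the reducer can then call it and only deal with wrapping the result in a Text or custom Writable.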

Answered 2021-06-02
