java - How to implement a group comparator for Hadoop?

Posted by gev0vcfq on 2021-06-03 in Hadoop


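Given a class called KeyLabelDistance, which I pass in Hadoop as both key and value, I want to perform a secondary sort on it: first sort by increasing key, then by decreasing distance.
To do this, I need to write my own GroupingComparator. My question is: since the setGroupingComparator() method only accepts a class that extends RawComparator, how do I perform this comparison in terms of bytes in the grouping comparator? Do I need to serialize and deserialize the objects explicitly? Also, does making the KeyLabelDistance class implement WritableComparable, as shown below, make a SortComparator redundant?
I got the usage of SortComparator and GroupComparator from this answer: What are the differences between Sort Comparator and Group Comparator in Hadoop?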
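To make the intended behaviour concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It only simulates the idea: records are sorted with a full comparison (key ascending, distance descending) and then split into groups using a key-only comparison, which is roughly the division of labour between a sort comparator and a grouping comparator. The names `SecondarySortSketch`, `Rec`, `sortCompare`, `groupCompare` and `sortAndGroup` are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SecondarySortSketch {
    // Hypothetical stand-in for the composite key; plain Java, not a Hadoop Writable.
    static final class Rec {
        final int key;
        final int label;
        final double distance;

        Rec(int key, int label, double distance) {
            this.key = key;
            this.label = label;
            this.distance = distance;
        }
    }

    // Sort order: key ascending, then distance descending (operands swapped).
    static int sortCompare(Rec a, Rec b) {
        int byKey = Integer.compare(a.key, b.key);
        return byKey != 0 ? byKey : Double.compare(b.distance, a.distance);
    }

    // Grouping order: only the key decides whether two records share a group.
    static int groupCompare(Rec a, Rec b) {
        return Integer.compare(a.key, b.key);
    }

    // Sorts a copy of the input, then starts a new group whenever the
    // key-only comparison says the next record differs from the current group.
    static List<List<Rec>> sortAndGroup(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        sorted.sort(SecondarySortSketch::sortCompare);
        List<List<Rec>> groups = new ArrayList<>();
        for (Rec r : sorted) {
            boolean startsNewGroup = groups.isEmpty()
                    || groupCompare(groups.get(groups.size() - 1).get(0), r) != 0;
            if (startsNewGroup) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(r);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Rec> input = Arrays.asList(
                new Rec(2, 0, 0.5), new Rec(1, 1, 0.2),
                new Rec(1, 2, 0.9), new Rec(2, 3, 0.7));
        for (List<Rec> group : sortAndGroup(input)) {
            for (Rec r : group) {
                System.out.println(r.key + " " + r.label + " " + r.distance);
            }
            System.out.println("--");
        }
    }
}
```

Within each group the records keep the order produced by the full comparison, which is the effect a secondary sort is after.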
Here is the implementation of KeyLabelDistance:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class KeyLabelDistance implements WritableComparable<KeyLabelDistance>
{
    private int key;
    private int label;
    private double distance;

    // No-argument constructor: Hadoop creates an empty instance, then calls readFields().
    public KeyLabelDistance()
    {
        this(0, 0, 0);
    }

    public KeyLabelDistance(int key, int label, double distance)
    {
        this.key = key;
        this.label = label;
        this.distance = distance;
    }

    public int getKey() {
        return key;
    }
    public void setKey(int key) {
        this.key = key;
    }
    public int getLabel() {
        return label;
    }
    public void setLabel(int label) {
        this.label = label;
    }
    public double getDistance() {
        return distance;
    }
    public void setDistance(double distance) {
        this.distance = distance;
    }

    // Serialize the fields; readFields() must read them back in the same order.
    @Override
    public void write(DataOutput out) throws IOException
    {
        out.writeInt(key);
        out.writeInt(label);
        out.writeDouble(distance);
    }

    @Override
    public void readFields(DataInput in) throws IOException
    {
        key = in.readInt();
        label = in.readInt();
        distance = in.readDouble();
    }

    // Comparable requires a single-argument compareTo; "this" is the left-hand side.
    @Override
    public int compareTo(KeyLabelDistance other)
    {
        int byKey = Integer.compare(key, other.key);
        if (byKey != 0)
            return byKey;
        // If the keys are equal, look at the distances: a larger "distance" means more
        // "similarity", so the operands are swapped to sort in decreasing order.
        return Double.compare(other.distance, distance);
    }
}

The code for the grouping comparator is as follows:

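import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class KeyLabelDistanceGroupingComparator extends WritableComparator {
    public KeyLabelDistanceGroupingComparator()
    {
        // true: let WritableComparator create key instances to deserialize into
        super(KeyLabelDistance.class, true);
    }

    // Compare on key only, so records that share a key fall into the same group.
    @Override
    public int compare(WritableComparable a, WritableComparable b)
    {
        KeyLabelDistance lhs = (KeyLabelDistance) a;
        KeyLabelDistance rhs = (KeyLabelDistance) b;
        return Integer.compare(lhs.getKey(), rhs.getKey());
    }
}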
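On the byte-level part of the question, the following is a minimal sketch of what a key-only comparison over serialized bytes could look like, written in plain Java so that it runs without Hadoop. It assumes the key is serialized first, as a 4-byte big-endian int (the layout `DataOutput.writeInt` produces), followed by label and distance; that layout is an assumption, not something the code above fixes. The class `RawKeyCompareSketch` and its methods are hypothetical. The six-argument `compare` mirrors the parameter shape of `RawComparator.compare`, and Hadoop's `WritableComparator` offers a static `readInt(byte[], int)` helper that serves the same purpose as the hand-written one here.

```java
import java.nio.ByteBuffer;

final class RawKeyCompareSketch {
    // Assumed layout: key (int), label (int), distance (double), big-endian.
    // ByteBuffer is big-endian by default, matching DataOutput's byte order.
    static byte[] serialize(int key, int label, double distance) {
        return ByteBuffer.allocate(16)
                .putInt(key)
                .putInt(label)
                .putDouble(distance)
                .array();
    }

    // Decodes a 4-byte big-endian int starting at the given offset.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
                | ((bytes[start + 1] & 0xff) << 16)
                | ((bytes[start + 2] & 0xff) << 8)
                | (bytes[start + 3] & 0xff);
    }

    // Same parameter shape as RawComparator.compare: two buffers, each with
    // a start offset and a length. Only the leading key field is inspected.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] a = serialize(1, 7, 0.9);
        byte[] b = serialize(1, 3, 0.1);
        System.out.println(compare(a, 0, a.length, b, 0, b.length));
    }
}
```

Decoding the int before comparing, rather than comparing the bytes lexicographically, keeps negative keys ordered correctly.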

Any help would be appreciated.

Java hadoop bigdata

Source: https://stackoverflow.com/questions/22803542/how-to-implement-a-group-comparator-for-hadoop