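I have a class named KeyLabelDistance that I pass in Hadoop as both key and value, and I want to perform a secondary sort on it: first by key in increasing order, then by distance in decreasing order.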
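To do this I need to write my own GroupingComparator. My question is: since setGroupingComparator() only accepts a class that extends RawComparator, how do I perform the comparison on bytes inside the grouping comparator? Do I need to explicitly serialize and deserialize the objects? Also, if KeyLabelDistance implements WritableComparable as shown below, does that make a sort comparator redundant?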
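On the question of comparing bytes directly: a RawComparator receives the serialized key bytes, so a comparator can read fields straight out of the byte array without building objects. The sketch below assumes the key is serialized as int key, int label, double distance through DataOutput (16 bytes, big-endian). Hadoop's WritableComparator offers static helpers such as readInt(byte[], int) and readDouble(byte[], int) for this; this standalone demo uses java.nio.ByteBuffer instead so it runs without Hadoop. The class name RawKeyLabelDistanceCompare is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical standalone demo of a byte-level (raw) comparison; no Hadoop dependency.
public class RawKeyLabelDistanceCompare {

    // Serialize in the assumed field order: int key, int label, double distance
    static byte[] serialize(int key, int label, double distance) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(key);
            out.writeInt(label);
            out.writeDouble(distance);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same parameter shape as RawComparator.compare(byte[], int, int, byte[], int, int):
    // key ascending, then distance descending, read directly from the bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // DataOutput writes big-endian, which is ByteBuffer's default byte order
        int byKey = Integer.compare(ByteBuffer.wrap(b1).getInt(s1), ByteBuffer.wrap(b2).getInt(s2));
        if (byKey != 0) {
            return byKey;
        }
        // distance starts after two 4-byte ints; swapped arguments give descending order
        return Double.compare(ByteBuffer.wrap(b2).getDouble(s2 + 8),
                              ByteBuffer.wrap(b1).getDouble(s1 + 8));
    }

    public static void main(String[] args) {
        byte[] a = serialize(1, 7, 0.9);
        byte[] b = serialize(1, 3, 0.4);
        byte[] c = serialize(2, 5, 0.1);
        // same key: the larger distance sorts first, so this is negative
        System.out.println(compare(a, 0, a.length, b, 0, b.length));
        // key 2 sorts after key 1, so this is positive
        System.out.println(compare(c, 0, c.length, a, 0, a.length));
    }
}
```

This is an optimization only; whether it is needed depends on how expensive deserialization is for the key.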
I took the usage of SortComparator and GroupComparator from this answer: What is the difference between the sort comparator and the group comparator in Hadoop?
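The division of labour between the two comparators can be simulated outside Hadoop. The sketch below is a hypothetical, standalone illustration (SecondarySortSimulation and its nested Item type are not part of any API): records are first sorted with the full comparator (key ascending, distance descending), and then consecutive records are grouped using a key-only comparator, which approximates what each reduce() call receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of sort-then-group; no Hadoop dependency.
public class SecondarySortSimulation {

    // Stand-in for KeyLabelDistance without the Hadoop interfaces
    static final class Item {
        final int key;
        final int label;
        final double distance;

        Item(int key, int label, double distance) {
            this.key = key;
            this.label = label;
            this.distance = distance;
        }

        @Override
        public String toString() {
            return "(" + key + "," + label + "," + distance + ")";
        }
    }

    // Sort comparator: key ascending, then distance descending
    static final Comparator<Item> SORT =
            Comparator.comparingInt((Item i) -> i.key)
                      .thenComparing(Comparator.comparingDouble((Item i) -> i.distance).reversed());

    // Grouping comparator: natural key only
    static final Comparator<Item> GROUP = Comparator.comparingInt(i -> i.key);

    // Sort everything, then start a new group whenever GROUP says the
    // current record differs from the previous one.
    static List<List<Item>> shuffle(List<Item> input) {
        List<Item> sorted = new ArrayList<>(input);
        sorted.sort(SORT);
        List<List<Item>> groups = new ArrayList<>();
        Item previous = null;
        for (Item item : sorted) {
            if (previous == null || GROUP.compare(previous, item) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(item);
            previous = item;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Item> input = Arrays.asList(
                new Item(1, 10, 0.2), new Item(2, 11, 0.7), new Item(1, 12, 0.9),
                new Item(2, 13, 0.3), new Item(1, 14, 0.5));
        for (List<Item> group : shuffle(input)) {
            System.out.println(group);
        }
        // prints:
        // [(1,12,0.9), (1,14,0.5), (1,10,0.2)]
        // [(2,11,0.7), (2,13,0.3)]
    }
}
```

Each printed line corresponds to one reduce() call: one call per key, with values already in decreasing distance order.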
Here is the implementation of KeyLabelDistance:
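import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class KeyLabelDistance implements WritableComparable<KeyLabelDistance>
{
    private int key;
    private int label;
    private double distance;

    // Hadoop creates keys by reflection, so keep a public no-arg constructor
    public KeyLabelDistance()
    {
        key = 0;
        label = 0;
        distance = 0;
    }

    public KeyLabelDistance(int key, int label, double distance)
    {
        this.key = key;
        this.label = label;
        this.distance = distance;
    }

    public int getKey() {
        return key;
    }

    public void setKey(int key) {
        this.key = key;
    }

    public int getLabel() {
        return label;
    }

    public void setLabel(int label) {
        this.label = label;
    }

    public double getDistance() {
        return distance;
    }

    public void setDistance(double distance) {
        this.distance = distance;
    }

    // Writable: serialize the fields; readFields must read them back in the same order
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(key);
        out.writeInt(label);
        out.writeDouble(distance);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        key = in.readInt();
        label = in.readInt();
        distance = in.readDouble();
    }

    // WritableComparable requires the one-argument compareTo(T); a two-argument
    // compareTo(lhs, rhs) does not implement the interface and will not compile
    @Override
    public int compareTo(KeyLabelDistance other)
    {
        int byKey = Integer.compare(key, other.key);
        if (byKey != 0)
            return byKey;
        // If the keys are equal, look at the distances: the larger the "distance",
        // the greater the "similarity", so distances are sorted in decreasing order
        return Double.compare(other.distance, distance);
    }

    // The default HashPartitioner uses hashCode(), so base it on the natural key only;
    // otherwise records with the same key may be sent to different reducers
    @Override
    public int hashCode() {
        return Integer.hashCode(key);
    }
}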
The grouping comparator code is as follows:
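import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class KeyLabelDistanceGroupingComparator extends WritableComparator {

    // Register the key class; 'true' makes WritableComparator create key instances,
    // which its default byte-level compare() deserializes into before calling compare() below
    public KeyLabelDistanceGroupingComparator() {
        super(KeyLabelDistance.class, true);
    }

    // Must override compare(WritableComparable, WritableComparable); a method taking
    // KeyLabelDistance parameters would only be an overload that Hadoop never calls
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        KeyLabelDistance lhs = (KeyLabelDistance) a;
        KeyLabelDistance rhs = (KeyLabelDistance) b;
        // Group on the natural key only, ignoring distance
        return Integer.compare(lhs.getKey(), rhs.getKey());
    }
}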
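A grouping comparator only compares keys that reach the same reducer. Hadoop's default HashPartitioner picks the reducer as (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so unless hashCode() (or a custom Partitioner registered with job.setPartitionerClass(...)) depends only on the natural key, records with the same key may land on different reducers. The standalone sketch below applies that formula to the natural key alone; the class NaturalKeyPartitionSketch is hypothetical, and in a real job this logic would sit in a Partitioner's getPartition(key, value, numPartitions).

```java
// Hypothetical sketch: choose a partition from the natural key only, using the same
// formula as Hadoop's HashPartitioner. No Hadoop dependency.
public class NaturalKeyPartitionSketch {

    static int partition(int naturalKey, int numReduceTasks) {
        // Integer.hashCode(int) returns the value itself; masking clears the sign bit
        // so negative keys still map into [0, numReduceTasks)
        return (Integer.hashCode(naturalKey) & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        double[] distances = {0.9, 0.5, 0.2};
        for (double d : distances) {
            // distance does not take part, so every record with key 7 gets the same partition (3)
            System.out.println("key=7 distance=" + d + " -> partition " + partition(7, reducers));
        }
    }
}
```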
Any help would be greatly appreciated.
1 answer
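You can extend WritableComparator, which in turn implements RawComparator. Both the sort comparator and the grouping comparator can extend WritableComparator. Its default byte-level compare() deserializes the two keys and then calls compare(WritableComparable, WritableComparable), so you only override that object-level method and do not need to deserialize anything yourself.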
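To illustrate that delegation pattern without Hadoop, here is a standalone sketch. DeserializingComparatorSketch and its nested Key holder are hypothetical; they mimic the idea (deserialize both byte ranges into reusable instances, then call an object-level comparator) and do not reproduce Hadoop's actual classes. The field order int key, int label, double distance is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Comparator;

// Hypothetical mimic of "deserialize, then compare objects"; no Hadoop dependency.
public class DeserializingComparatorSketch {

    // Mutable holder standing in for a KeyLabelDistance instance
    static final class Key {
        int key;
        int label;
        double distance;
    }

    // Serialize in the assumed field order: int key, int label, double distance
    static byte[] toBytes(int key, int label, double distance) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(key);
            out.writeInt(label);
            out.writeDouble(distance);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fill an existing holder from bytes, reading fields in the order they were written
    static void readFields(Key k, byte[] b, int start, int length) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(b, start, length))) {
            k.key = in.readInt();
            k.label = in.readInt();
            k.distance = in.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Two reusable instances, so no objects are allocated per comparison
    private final Key left = new Key();
    private final Key right = new Key();
    private final Comparator<Key> objectComparator;

    DeserializingComparatorSketch(Comparator<Key> objectComparator) {
        this.objectComparator = objectComparator;
    }

    // Byte-level entry point: deserialize both sides, then delegate to the object comparator
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        readFields(left, b1, s1, l1);
        readFields(right, b2, s2, l2);
        return objectComparator.compare(left, right);
    }

    public static void main(String[] args) {
        // Grouping logic: compare the natural key only
        DeserializingComparatorSketch grouping =
                new DeserializingComparatorSketch(Comparator.comparingInt(k -> k.key));
        byte[] a = toBytes(1, 7, 0.9);
        byte[] b = toBytes(1, 3, 0.4);
        // same key, different distance: prints 0, i.e. the same reduce group
        System.out.println(grouping.compare(a, 0, a.length, b, 0, b.length));
    }
}
```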
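If you don't provide these comparators, Hadoop internally falls back to the compareTo() of your key class (the WritableComparable). It is used for sorting and, when no grouping comparator is set, for grouping as well. So if compareTo() already orders by key and then by decreasing distance, a separate sort comparator is optional, but a grouping comparator is still needed to group on the key alone.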