I am writing a custom key class without a hashCode implementation.
I run a map-reduce job, and during job configuration I set the partitioner class like this:
Job job = Job.getInstance(config);
job.setJarByClass(ReduceSideJoinDriver.class);
FileInputFormat.addInputPaths(job, filePaths.toString());
FileOutputFormat.setOutputPath(job, new Path(args[args.length-1]));
job.setMapperClass(JoiningMapper.class);
job.setReducerClass(JoiningReducer.class);
job.setPartitionerClass(TaggedJoiningPartitioner.class); // Here is the partitioner set
job.setGroupingComparatorClass(TaggedJoiningGroupingComparator.class);
job.setOutputKeyClass(TaggedKey.class);
job.setOutputValueClass(Text.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
Here is my partitioner implementation:
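import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class TaggedJoiningPartitioner extends Partitioner<TaggedKey, Text> {
    // Partition on joinKey only, so every tag for one join key reaches the same reducer.
    @Override
    public int getPartition(TaggedKey taggedKey, Text text, int numPartitions) {
        return Math.abs(taggedKey.getJoinKey().hashCode()) % numPartitions;
    }
}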
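One detail in this partitioner is worth checking: Math.abs(Integer.MIN_VALUE) overflows and stays negative, so a join key whose hashCode() is exactly Integer.MIN_VALUE would yield a negative partition number, while Hadoop expects a value in [0, numPartitions). Hadoop's default HashPartitioner avoids this by masking off the sign bit instead of taking the absolute value. The sketch below is plain Java with no Hadoop dependency; the class PartitionMath and its two methods are hypothetical, written only to compare the two formulas.

```java
public class PartitionMath {
    // Formula used in TaggedJoiningPartitioner above.
    static int absPartition(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    // Sign-bit mask, the approach Hadoop's HashPartitioner takes.
    static int maskPartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        // Math.abs(Integer.MIN_VALUE) is still negative, so this result is negative too.
        System.out.println(absPartition(Integer.MIN_VALUE, numPartitions));
        // Masking always produces a non-negative value before the modulo.
        System.out.println(maskPartition(Integer.MIN_VALUE, numPartitions));
    }
}
```

For ordinary negative hash codes the two formulas can also choose different partition numbers (for -5 with 3 partitions, Math.abs gives 2 and the mask gives 0), but each formula on its own still sends every record with the same join key to the same partition.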
I run the map-reduce job and save the output.
Now I comment out job.setPartitionerClass(TaggedJoiningPartitioner.class); in the job setup above.
Then I implement hashCode() in my custom key class as follows:
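import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

public class TaggedKey implements Writable, WritableComparable<TaggedKey> {
    private Text joinKey = new Text();
    private IntWritable tag = new IntWritable();

    public Text getJoinKey() {
        return joinKey;
    }

    public IntWritable getTag() {
        return tag;
    }

    // Sort by joinKey first, then by tag.
    @Override
    public int compareTo(TaggedKey taggedKey) {
        int compareValue = this.joinKey.compareTo(taggedKey.getJoinKey());
        if (compareValue == 0) {
            compareValue = this.tag.compareTo(taggedKey.getTag());
        }
        return compareValue;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        joinKey.write(out);
        tag.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        joinKey.readFields(in);
        tag.readFields(in);
    }

    // Hash only joinKey; the default HashPartitioner partitions on this value.
    @Override
    public int hashCode() {
        return joinKey.hashCode();
    }

    // Equality also considers only joinKey, matching hashCode().
    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof TaggedKey)) {
            return false;
        }
        TaggedKey that = (TaggedKey) o;
        return this.joinKey.equals(that.joinKey);
    }
}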
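Because hashCode() and equals() look only at joinKey while compareTo() also looks at the tag, two keys with the same join key but different tags should land in the same partition yet still sort in tag order. The plain-Java sketch below mimics that with a hypothetical SimpleTaggedKey that uses String and int in place of Text and IntWritable, together with the same formula HashPartitioner uses. Since String.hashCode() differs from Text.hashCode(), the actual partition numbers will not match a real TaggedKey; only the "same join key, same partition" property carries over.

```java
public class TaggedKeyRouting {
    // Hypothetical stand-in for TaggedKey, using plain Java types instead of Hadoop Writables.
    static final class SimpleTaggedKey implements Comparable<SimpleTaggedKey> {
        final String joinKey;
        final int tag;

        SimpleTaggedKey(String joinKey, int tag) {
            this.joinKey = joinKey;
            this.tag = tag;
        }

        // Sort by joinKey first, then by tag (mirrors TaggedKey.compareTo).
        @Override
        public int compareTo(SimpleTaggedKey other) {
            int c = joinKey.compareTo(other.joinKey);
            return c != 0 ? c : Integer.compare(tag, other.tag);
        }

        // Hash only joinKey, ignoring the tag (mirrors TaggedKey.hashCode).
        @Override
        public int hashCode() {
            return joinKey.hashCode();
        }

        // Equality on joinKey only (mirrors TaggedKey.equals).
        @Override
        public boolean equals(Object o) {
            return o instanceof SimpleTaggedKey && joinKey.equals(((SimpleTaggedKey) o).joinKey);
        }
    }

    // Same formula as Hadoop's default HashPartitioner.
    static int defaultPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        SimpleTaggedKey left = new SimpleTaggedKey("customer-42", 1);
        SimpleTaggedKey right = new SimpleTaggedKey("customer-42", 2);
        // Same joinKey, different tag: same partition, but a defined sort order.
        System.out.println(defaultPartition(left, 4) == defaultPartition(right, 4));
        System.out.println(left.compareTo(right) < 0);
    }
}
```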
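Now I run the job again (note: no partitioner class is set this time). After the job finishes, I compare its output with the previous job's output, and they are exactly the same.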
So my questions are:
1) Is this behavior universal, that is, always reproducible with any custom implementation?
2) Is implementing hashCode() on my key class the same as calling job.setPartitionerClass()?
3) If they both serve the same purpose, why is setPartitionerClass() needed?
4) If the hashCode() implementation and the partitioner class implementation conflict, which one takes precedence?
Answer:
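You get the same result because your custom partitioner does essentially what the default partitioner does. When no partitioner class is set, Hadoop uses HashPartitioner, which computes the partition from the key's hashCode(); your TaggedKey.hashCode() returns joinKey.hashCode(), which is the same value your partitioner hashes. You have only moved that logic into a separate class. Put different logic in the partitioner, such as key.toString().length() % numPartitions, instead of hashCode() % numPartitions, and you will see a different distribution of keys across the reducers.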
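To make the difference concrete, this sketch assigns the same keys to partitions with a hash-based rule (the HashPartitioner formula) and with the length-based rule suggested above. It uses plain String keys and hypothetical helper names (PartitionDistribution, byHash, byLength, assign) rather than a real Hadoop Partitioner, so it can run without a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

public class PartitionDistribution {
    // Default-style rule: partition from the key's hashCode(), as HashPartitioner does.
    static int byHash(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Alternative rule from the answer: partition by the key's string length.
    static int byLength(String key, int numPartitions) {
        return key.length() % numPartitions;
    }

    // Group keys by the partition number the given rule assigns to each one.
    static Map<Integer, List<String>> assign(List<String> keys, int numPartitions,
                                             BiFunction<String, Integer, Integer> rule) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (String key : keys) {
            int partition = rule.apply(key, numPartitions);
            result.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "bb", "ccc", "dd", "eeee");
        System.out.println("by hash:   " + assign(keys, 3, PartitionDistribution::byHash));
        System.out.println("by length: " + assign(keys, 3, PartitionDistribution::byLength));
    }
}
```

In a real job the chosen rule would live in getPartition() of a Partitioner subclass registered with job.setPartitionerClass(...). The length rule is only a demonstration; it would skew the load heavily whenever many keys share the same length.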
For example, you cannot get the following behavior just by editing hashCode():
public static class MyPartitioner extends Partitioner {