Does a custom Partitioner help if I have already implemented hashCode() for the keys in my MapReduce job?

Posted by vecaoik1 on 2021-06-03 in Hadoop

I am writing a custom key class that, at first, has no hashCode() implementation.
I run a map-reduce job, and during job configuration I set the partitioner class, like this:

    Job job = Job.getInstance(config);
    job.setJarByClass(ReduceSideJoinDriver.class);
    FileInputFormat.addInputPaths(job, filePaths.toString());
    FileOutputFormat.setOutputPath(job, new Path(args[args.length-1]));
    job.setMapperClass(JoiningMapper.class);
    job.setReducerClass(JoiningReducer.class);
    job.setPartitionerClass(TaggedJoiningPartitioner.class); // the partitioner is set here
    job.setGroupingComparatorClass(TaggedJoiningGroupingComparator.class);
    job.setOutputKeyClass(TaggedKey.class);
    job.setOutputValueClass(Text.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);

Here is the partitioner implementation:

    public class TaggedJoiningPartitioner extends Partitioner<TaggedKey,Text> {
        @Override
        public int getPartition(TaggedKey taggedKey, Text text, int numPartitions) {
            return Math.abs(taggedKey.getJoinKey().hashCode()) % numPartitions;
        }
    }

I run the map-reduce job and save the output.
Next, I comment out job.setPartitionerClass(TaggedJoiningPartitioner.class); in the job setup above.
I then implement hashCode() in my custom key class, as follows:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.io.WritableComparable;

    public class TaggedKey implements Writable, WritableComparable<TaggedKey> {
        private Text joinKey = new Text();
        private IntWritable tag = new IntWritable();

        // Accessors used by compareTo() and by TaggedJoiningPartitioner.
        public Text getJoinKey() { return joinKey; }
        public IntWritable getTag() { return tag; }

        // Sort by join key first, then by tag.
        @Override
        public int compareTo(TaggedKey taggedKey) {
            int compareValue = this.joinKey.compareTo(taggedKey.getJoinKey());
            if (compareValue == 0) {
                compareValue = this.tag.compareTo(taggedKey.getTag());
            }
            return compareValue;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            joinKey.write(out);
            tag.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            joinKey.readFields(in);
            tag.readFields(in);
        }

        // Hash on the join key only, so every tag of a join key hashes alike.
        @Override
        public int hashCode() {
            return joinKey.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            if (this == o)
                return true;
            if (!(o instanceof TaggedKey)) {
                return false;
            }
            TaggedKey that = (TaggedKey) o;
            return this.joinKey.equals(that.joinKey);
        }
    }

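Now I run the job again (note: no partitioner is set this time). After the map-reduce job finishes, I compare its output with that of the previous job. The two are exactly the same.
So my questions are: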

  1) Is this behavior universal, that is, always reproducible with any custom implementation?
  2) Is implementing hashCode() on my key class the same as calling job.setPartitionerClass?
  3) If they both serve the same purpose, why is setPartitionerClass needed?
  4) If the hashCode() implementation and the Partitioner class implementation conflict, which one takes precedence?

hujrc8aj  #1

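You get the same result because your custom partitioner does essentially what the default partitioner (HashPartitioner) does: it takes the key's hash code modulo the number of partitions. All you did was move that code into another class and run it from there. Put in different logic, such as key.toString().length() % numPartitions or anything else other than hashCode() % numPartitions, and you will see a different distribution of keys to reducers.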
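To make that comparison concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of the two index formulas: the `Math.abs(hash) % n` form from the question, and the `(hash & Integer.MAX_VALUE) % n` form that Hadoop's default `HashPartitioner` uses. The class and method names (`PartitionIndexSketch`, `absPartition`, `maskedPartition`) are hypothetical and exist only for illustration. Under these assumptions the two formulas agree for non-negative hash codes and can differ for negative ones; either way, equal join keys still map to the same partition.

```java
public class PartitionIndexSketch {

    // Formula used in the question's TaggedJoiningPartitioner.
    public static int absPartition(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    // Formula used by Hadoop's default HashPartitioner:
    // clearing the sign bit guarantees a non-negative result.
    public static int maskedPartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Sample hash codes: positive, negative, and the Math.abs edge case.
        int[] hashes = {7, -7, Integer.MIN_VALUE};
        int[] partitionCounts = {3, 4};
        for (int n : partitionCounts) {
            for (int h : hashes) {
                System.out.println("hash=" + h + " n=" + n
                        + " abs=" + absPartition(h, n)
                        + " masked=" + maskedPartition(h, n));
            }
        }
    }
}
```

One detail the sketch surfaces: `Math.abs(Integer.MIN_VALUE)` is still negative, so the `Math.abs` form can return a negative index for that single hash value, whereas the masked form cannot.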
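For example, a distribution like the one below cannot be obtained just by editing hashCode(), because hashCode() has no access to the number of reduce tasks:

    // Requires org.apache.hadoop.io.Text and org.apache.hadoop.mapreduce.Partitioner.
    public static class MyPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numReduceTasks) {
            // Partition on the key's length instead of its hash code.
            int len = key.toString().length();
            if (numReduceTasks == 0)
                return 0;
            if (len <= numReduceTasks / 3) {
                return 0;
            }
            if (len > numReduceTasks / 3 && len <= numReduceTasks / 2) {
                return 1 % numReduceTasks;
            }
            else
                return len % numReduceTasks;
        }
    }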
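
As a rough illustration of the difference, the following plain-Java sketch (no Hadoop dependency) runs a hash-based rule and the length-based rule above over a few sample keys. The class `LengthVsHashSketch`, its method names, and the sample keys are assumptions made up for this example; `String` stands in for `Text`. It is meant to show that the partition is whatever the configured partitioner returns, and that hashCode() matters only when the partitioner consults it, as the default HashPartitioner does.

```java
public class LengthVsHashSketch {

    // Hash-based rule, in the style of the default HashPartitioner.
    public static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Length-based rule, mirroring the custom partitioner in the answer.
    public static int lengthPartition(String key, int numReduceTasks) {
        if (numReduceTasks == 0) {
            return 0;
        }
        int len = key.length();
        if (len <= numReduceTasks / 3) {
            return 0;
        }
        if (len <= numReduceTasks / 2) {
            return 1 % numReduceTasks;
        }
        return len % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 6; // hypothetical reducer count
        String[] keys = {"ab", "abc", "abcd", "abcdefg"};
        for (String key : keys) {
            System.out.printf("%-8s hash-based=%d length-based=%d%n",
                    key, hashPartition(key, reducers), lengthPartition(key, reducers));
        }
    }
}
```

With the length-based rule, keys of equal length always share a partition regardless of their hash codes, which is the kind of distribution a change to hashCode() alone would not give you.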