If I have already implemented hashCode for the keys in a map-reduce job, does a custom partitioner help?

Posted by vecaoik1 on 2021-06-03 in Hadoop

I wrote a custom key class without a hashCode implementation.
I ran a map-reduce job, and during job configuration I set the partitioner class, like this:

Job job = Job.getInstance(config);
        job.setJarByClass(ReduceSideJoinDriver.class);

        FileInputFormat.addInputPaths(job, filePaths.toString());
        FileOutputFormat.setOutputPath(job, new Path(args[args.length-1]));

        job.setMapperClass(JoiningMapper.class);
        job.setReducerClass(JoiningReducer.class);
        job.setPartitionerClass(TaggedJoiningPartitioner.class); // the partitioner is set here
        job.setGroupingComparatorClass(TaggedJoiningGroupingComparator.class);
        job.setOutputKeyClass(TaggedKey.class);
        job.setOutputValueClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);

Here is the partitioner implementation:

public class TaggedJoiningPartitioner extends Partitioner<TaggedKey,Text> {

    @Override
    public int getPartition(TaggedKey taggedKey, Text text, int numPartitions) {
        return Math.abs(taggedKey.getJoinKey().hashCode()) % numPartitions;
    }
}

I ran the map-reduce job and saved the output.
Then I commented out job.setPartitionerClass(TaggedJoiningPartitioner.class); in the job setup above.
I implemented hashCode() in my custom key class, as follows:

public class TaggedKey implements Writable, WritableComparable<TaggedKey> {

    private Text joinKey = new Text();
    private IntWritable tag = new IntWritable();

    @Override
    public int compareTo(TaggedKey taggedKey) {
        int compareValue = this.joinKey.compareTo(taggedKey.getJoinKey());
        if(compareValue == 0 ){
            compareValue = this.tag.compareTo(taggedKey.getTag());
        }
       return compareValue;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        joinKey.write(out);
        tag.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        joinKey.readFields(in);
        tag.readFields(in);
    }

    @Override
    public int hashCode(){
        return joinKey.hashCode();
    }

    @Override
    public boolean equals(Object o){
        if (this==o)
            return true;
        if (!(o instanceof TaggedKey)){
            return false;
        }
        TaggedKey that=(TaggedKey)o;
        return this.joinKey.equals(that.joinKey);
    }
}

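Now I ran the job again (note: this time no partitioner is set). After the map-reduce job finished, I compared its output with that of the previous job. They are exactly the same.
So my questions are: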

1) Is this behavior universal, i.e. always reproducible with any custom implementation?

2) Is implementing hashCode() on my key class the same as calling job.setPartitionerClass?

3) If they both serve the same purpose, why is setPartitionerClass needed?

4) If the hashCode() implementation and the Partitioner class implementation conflict, which one takes precedence?

hujrc8aj1#

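You get the same result because your custom partitioner does essentially what the default partitioner does: it partitions on the key's hash code modulo the number of partitions. You have only moved that code into another class and run it there. Put in different logic, such as key.toString().length() % numPartitions or something else, instead of hashCode() % numPartitions, and you will see a different distribution of keys to reducers.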
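To make the comparison concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It puts the formula used by Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, next to the formula from the question's partitioner. The class and method names are hypothetical, and a String stands in for the join key.

```java
public class PartitionFormulas {

    // Formula applied by Hadoop's default HashPartitioner to key.hashCode().
    public static int defaultPartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    // Formula from the question's TaggedJoiningPartitioner,
    // applied to joinKey.hashCode().
    public static int questionPartition(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Sample join keys, chosen only for illustration.
        for (String joinKey : new String[] {"alice", "bob", "carol"}) {
            int hash = joinKey.hashCode();
            System.out.println(joinKey
                    + " -> default=" + defaultPartition(hash, numPartitions)
                    + ", question=" + questionPartition(hash, numPartitions));
        }
        // Math.abs(Integer.MIN_VALUE) is still negative, so the
        // question's formula can yield a negative partition number.
        System.out.println(questionPartition(Integer.MIN_VALUE, 3)); // prints -2
        System.out.println(defaultPartition(Integer.MIN_VALUE, 3));  // prints 0
    }
}
```

For non-negative hash codes the two formulas agree. For negative hash codes they can pick different partition numbers, but both depend only on the join key's hash, so records with the same join key still reach the same reducer. That is consistent with the identical outputs seen in the question. Masking with Integer.MAX_VALUE also avoids the negative result that Math.abs produces for Integer.MIN_VALUE.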
For example, you cannot get this just by editing hashCode():
public static class MyPartitioner extends Partitioner<Text, Text> {

@Override
    public int getPartition(Text key, Text value, int numReduceTasks) {

        int len = key.toString().length();

        if(numReduceTasks == 0)
            return 0;

        if(len <=numReduceTasks/3){               
            return 0;
        }
        if(len >numReduceTasks/3 && len <=numReduceTasks/2){

            return 1 % numReduceTasks;
        }
        else
            return len % numReduceTasks;
    }
}
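The effect of such a partitioner can be checked without a cluster. The sketch below repeats the same branching in plain Java and groups a few sample keys by the partition they would be sent to. A String stands in for Text, and the class name, method names, and sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LengthPartitionDemo {

    // Same branching as the getPartition method above, on a plain String.
    public static int partitionByLength(String key, int numReduceTasks) {
        int len = key.length();
        if (numReduceTasks == 0) {
            return 0;
        }
        if (len <= numReduceTasks / 3) {
            return 0;
        }
        if (len <= numReduceTasks / 2) {
            return 1 % numReduceTasks;
        }
        return len % numReduceTasks;
    }

    // Groups keys by the partition number they are assigned to.
    public static Map<Integer, List<String>> distribute(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            int partition = partitionByLength(key, numReduceTasks);
            byPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "bb", "ccc", "dddd", "eeeeeee");
        // With 6 reducers: length <= 2 goes to 0, length 3 goes to 1,
        // longer keys go to length % 6.
        System.out.println(distribute(keys, 6));
        // prints {0=[a, bb], 1=[ccc, eeeeeee], 4=[dddd]}
    }
}
```

Here the partition depends only on the key's length and the number of reducers, not on hashCode(). When a partitioner class is set on the job, the framework calls that class's getPartition, so the key's hashCode() influences partitioning only if getPartition itself uses it.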
