Optimization of a MapReduce code (reduce-side join)

xeufq47z posted on 2021-06-02 in Hadoop

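I need your help to optimize my mapper code. I am using the reduce-side join design pattern from the book MapReduce Design Patterns. Everything works, but I am trying to improve the code so that the join key is not duplicated during the join.
The join key is also contained in the value of the second table, so I would like to remove it from the value. That is why I split the value and try to remove the first element. But I think this approach is not the best one, and it is costly.
Here is my mapper class: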

public class MapTable2 extends Mapper<Object, Text, Text, Text> {

private Text outKey = new Text();
private Text outValue = new Text();
private String tab[];
private List<String> list;
private String tmp ="";

public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

    tab = value.toString().split(";");
    list = Arrays.asList(tab);
    outKey.set(list.get(0).trim());
    list.remove(0);
    for (String val : list) {
        tmp = tmp+val;
    }
    outValue.set("B" + tmp);
    context.write(outKey, outValue);
}

}
The original code was:

public class MapTable2 extends Mapper<Object, Text, Text, Text>{

private Text outKey = new Text();
private Text outValue = new Text();
private String tab[] ;

public void map(Object key, Text value, Context context) throws IOException, InterruptedException{

    tab = value.toString().split(";");
    outKey.set(tab[0].trim());
    outValue.set("B" + value.toString()); // outValue = outKey + value
    context.write(outKey, outValue);
}

}
Do you have any suggestions to improve my code?
Thanks in advance. Angelik

Java hadoop mapreduce optimization

Source: https://stackoverflow.com/questions/23998414/optimization-of-a-mapreduce-code-reduce-side-join

1 Answer

gfttwv5a1#

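You can split the string into just two parts by passing a limit to split, so the key is separated from the rest of the line in one step: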

String[] parts = value.toString().split(";", 2);
outKey.set(parts[0].trim());
outValue.set("B" + parts[1]);
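
To make the behavior of the two-argument split concrete, here is a minimal standalone sketch that runs without Hadoop. The class name JoinLineSplitter and the helper splitKeyAndRest are hypothetical names introduced for illustration, and the fallback to an empty string for lines without a ';' is an assumption, not something the answer specifies.

```java
class JoinLineSplitter {

    // Hypothetical helper: splits "key;rest" at the first ';' only.
    // Returns {trimmedKey, rest}. As an assumption for this sketch,
    // rest is "" when the line contains no ';' at all.
    static String[] splitKeyAndRest(String line) {
        // With limit = 2 the delimiter is applied at most once, so
        // everything after the first ';' stays in parts[1] unchanged.
        String[] parts = line.split(";", 2);
        String key = parts[0].trim();
        String rest = parts.length > 1 ? parts[1] : "";
        return new String[] { key, rest };
    }

    public static void main(String[] args) {
        String[] r = splitKeyAndRest("42;alice;paris");
        System.out.println(r[0]);       // prints "42"
        System.out.println("B" + r[1]); // prints "Balice;paris"
    }
}
```

Inside the mapper, the two returned strings would feed outKey.set(...) and outValue.set("B" + ...). Compared with the list-based attempt in the question, this keeps the remaining ';' delimiters intact and needs no per-record state. It also avoids calling remove(0) on the list returned by Arrays.asList, which is fixed-size and throws UnsupportedOperationException on removal.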

Answered 2021-06-03
