Table joins in Hadoop

Posted by von4xj4u on 2021-05-29 in Hadoop


I'm new to Hadoop and am writing my first program, which joins the following two tables in MapReduce.
First table:

11111   John
22222   Robert  
33333   Stephan
44444   Peter
55555   Andersen

Second table:

11111   Washington    EEE   2011
22222   Jacksonville  EIE   2010
33333   Minneapolis   ECE   2012
44444   Cheyenne      CSE   2013
55555   Detroit       IT    2014

I uploaded both text files to HDFS using Hue. The columns are separated by a single tab.
After running the code, I get unexpected output, shown below:

11111   John    Washington  EEE 2011        
22222   Jacksonville    EIE 2010        Robert  
33333   Stephan Minneapolis ECE 2012        
44444   Cheyenne    CSE 2013        Peter   
55555   Andersen    Detroit     IT  2014

I don't know what is wrong with my code. Here is my Java code:
Driver class:

public class DriverClass extends Configured {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = new Job();
        job.setJarByClass(DriverClass.class);
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, MapperClassOne.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, MapperClassTwo.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        job.setReducerClass(ReducerClass.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : -1);
    }
}

Mapper class for my first dataset (the first table), MapperClassOne:

public class MapperClassOne extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] line = value.toString().split("\t");
        context.write(new Text(line[0]), new Text(line[1]));
    }
}

Mapper class for my second dataset (the second table), MapperClassTwo:

public class MapperClassTwo extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] line = value.toString().split("\t");
        String temp = "";
        for (int i = 1; i < line.length; i++) {
            temp += line[i] + "\t";
        }
        context.write(new Text(line[0]), new Text(temp));
    }
}

Reducer class:

public class ReducerClass extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        Iterator<Text> iter = values.iterator();
        String temp = "";
        while (iter.hasNext()) {
            temp += iter.next().toString() + "\t";
        }
        context.write(key, new Text(temp));
    }
}

Please help me, and if there is a better way to perform a table join, please suggest it.

hadoop mapreduce Join

Source: https://stackoverflow.com/questions/31233333/table-joins-in-mapreduce-hadoop

1 Answer


rslzwgfq #1

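In the reducer, the values for a key are not sorted unless you implement a secondary sort. With the current implementation, the values for a key can arrive in any order, which is why the name sometimes comes before the second table's columns and sometimes after them. You need to add an identifier to each mapper's output value so that the reducer can tell which source a value came from.
See:
http://kickstarthadoop.blogspot.com/2011/09/joins-with-plain-map-reduce.html
http://www.lichun.cc/blog/2012/05/hadoop-genericwritable-sample-usage/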
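
The tagging idea can be sketched in plain Java, without the Hadoop classes, so that it runs on its own. Everything named here is made up for illustration: the tags `A` and `B`, the class `TaggedJoin`, and the methods `tagName`, `tagDetails`, and `joinValues`. The two `tag...` methods stand in for the value each mapper would write, and `joinValues` stands in for the body of `reduce`. It assumes well-formed, tab-separated lines with at most one name per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of source tagging for a reduce-side join (no Hadoop dependency). */
final class TaggedJoin {
    // Hypothetical tags; any two distinct prefixes would do.
    static final String NAME_TAG = "A";
    static final String DETAIL_TAG = "B";

    /** Value the first mapper would emit: the name column, prefixed with its source tag. */
    static String tagName(String line) {
        String[] fields = line.split("\t");
        return NAME_TAG + "\t" + fields[1];
    }

    /** Value the second mapper would emit: all columns after the key, prefixed with its tag. */
    static String tagDetails(String line) {
        String[] fields = line.split("\t", 2); // key, then the rest of the line
        return DETAIL_TAG + "\t" + fields[1];
    }

    /** Reducer logic for one key: place values by tag, not by arrival order. */
    static String joinValues(Iterable<String> values) {
        String name = "";
        List<String> details = new ArrayList<>();
        for (String value : values) {
            String[] parts = value.split("\t", 2); // tag, then payload
            if (parts[0].equals(NAME_TAG)) {
                name = parts[1];
            } else {
                details.add(parts[1]);
            }
        }
        return name + "\t" + String.join("\t", details);
    }

    public static void main(String[] args) {
        String a = tagName("22222\tRobert");
        String b = tagDetails("22222\tJacksonville\tEIE\t2010");
        // Both arrival orders produce the same joined row.
        System.out.println(joinValues(Arrays.asList(b, a)));
        System.out.println(joinValues(Arrays.asList(a, b)));
    }
}
```

In the real job, each mapper would wrap the tagged string in `new Text(...)` before calling `context.write`, and the reducer would call `toString()` on each value before splitting off the tag.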

Answered 2021-05-30
