MapReduce filtering to get the customers who are not in the order list?

bnlyeluc posted on 2021-07-13 in Hadoop

I am currently learning MapReduce and trying to figure out how to code this in Java.
There are two input files, named customers.txt and car_orders.txt:

customers.txt
===================
12345 Peter
12346 Johnson
12347 Emily
12348 Brad

[custNum, custName]

car_orders.txt
===================
00034 12345 23413
00035 12345 94832
00036 12346 8532
00037 12348 9483

[orderNo, custNum, carValue]

The idea is to apply MapReduce and output the customers who have not ordered a car; in the scenario above, that is Emily.

Output:
===================
12347 Emily

This is what I have in mind:

Map phase:
1. Read the data inside customers.txt, get key-value pair, (custNum, custName)
2. Read the data inside car_orders.txt, get key-value pair, (custNum, [orderNo, carValue])
3. Partition into groups based on the key

Reduce phase:
1. Compare key-value A and key-value B, if key-value B is NULL
2. Output key-value A
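
Before moving to Hadoop, the plan above can be sketched in plain Java as an in-memory anti-join. This is only an illustration under assumptions: the class name AntiJoinSketch and the method customersWithoutOrders are hypothetical, fields are assumed to be whitespace-separated, and ordinary collections stand in for the map, grouping, and reduce steps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AntiJoinSketch {

    // Returns "custNum custName" for every customer that has no order line.
    static List<String> customersWithoutOrders(List<String> customerLines,
                                               List<String> orderLines) {
        // "Map" step for customers.txt: custNum -> custName
        Map<String, String> customers = new LinkedHashMap<>();
        for (String line : customerLines) {
            String[] f = line.trim().split("\\s+");
            customers.put(f[0], f[1]);
        }

        // "Map" and grouping step for car_orders.txt: custNum -> ["orderNo carValue", ...]
        Map<String, List<String>> orders = new LinkedHashMap<>();
        for (String line : orderLines) {
            String[] f = line.trim().split("\\s+");
            orders.computeIfAbsent(f[1], k -> new ArrayList<>()).add(f[0] + " " + f[2]);
        }

        // "Reduce" step: keep key-value A when no key-value B exists for the same key
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> c : customers.entrySet()) {
            if (!orders.containsKey(c.getKey())) {
                result.add(c.getKey() + " " + c.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> customers = List.of(
                "12345 Peter", "12346 Johnson", "12347 Emily", "12348 Brad");
        List<String> orders = List.of(
                "00034 12345 23413", "00035 12345 94832",
                "00036 12346 8532", "00037 12348 9483");
        // Prints: 12347 Emily
        customersWithoutOrders(customers, orders).forEach(System.out::println);
    }
}
```

In a real job the two lookups would not live in one process; the framework groups both record types by custNum and hands each group to a reducer.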

Any help with this application, even in pseudocode form, would be greatly appreciated.

uwopmtnx #1

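It is basically a reduce-side join in which you discard the output where both sides are populated, just as you laid it out in your pseudocode.
In Hadoop MapReduce, the code looks like this: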

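import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

class TextMap extends Mapper<LongWritable, Text, Text, Text> {

   @Override
   public void map(LongWritable key, Text value, Context context)
         throws IOException, InterruptedException {
      String[] a = value.toString().trim().split("\\s+"); // assuming whitespace separation
      if (a.length == 2) {
         // customers.txt line: emit (custNum, custName)
         context.write(new Text(a[0]), new Text(a[1]));
      } else if (a.length == 3) {
         // car_orders.txt line: emit (custNum, carValue)
         context.write(new Text(a[1]), new Text(a[2]));
      }
   }
}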

This would emit:

12345 Peter
12346 Johnson
12347 Emily
12348 Brad
12345 23413
12345 94832
12346 8532
12348 9483
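
Between the map and reduce phases, the framework groups these pairs by key, so each reduce call sees one customer number together with all of its values. A minimal plain-Java sketch of that grouping follows; ShuffleSketch and groupByKey are hypothetical names, and a TreeMap stands in for the sorted shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {

    // Groups (key, value) pairs by key, with keys in sorted order.
    static Map<String, List<String>> groupByKey(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : pairs) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // The pairs listed above, in emission order
        List<String[]> emitted = List.of(
                new String[] {"12345", "Peter"},
                new String[] {"12346", "Johnson"},
                new String[] {"12347", "Emily"},
                new String[] {"12348", "Brad"},
                new String[] {"12345", "23413"},
                new String[] {"12345", "94832"},
                new String[] {"12346", "8532"},
                new String[] {"12348", "9483"});

        // Prints one line per key, e.g. "12347 -> [Emily]"
        groupByKey(emitted).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Only key 12347 ends up with a single value. In an actual Hadoop job the order of values within one key is not guaranteed, so a reducer should not assume the customer name arrives first.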

So the reducer looks quite simple:

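import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

class TextReduce extends Reducer<Text, Text, Text, Text> {

   @Override
   public void reduce(Text key, Iterable<Text> values, Context context)
         throws IOException, InterruptedException {
      // Hadoop reuses the Text instance while iterating, so copy each value out as a String
      List<String> vals = new ArrayList<>();
      for (Text t : values) {
         vals.add(t.toString());
      }

      // A single value means only the customer record arrived for this key, i.e. no orders
      // (assumes every order refers to a customer listed in customers.txt)
      if (vals.size() == 1) {
         context.write(key, new Text(vals.get(0)));
      }
   }
}

That should emit only the record for Emily (12347 Emily).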
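
One caveat: counting values cannot tell a customer without orders apart from a single order whose customer number is missing from customers.txt. A possible variant, sketched here in plain Java rather than as Hadoop classes, tags each value with its source so the reduce step can check both conditions. The "C:" and "O:" tags, the class TaggedReduceSketch, and the orphan order value 5000 are assumptions made for illustration, not part of the answer above.

```java
import java.util.List;
import java.util.Optional;

class TaggedReduceSketch {

    // Hypothetical source tags prepended by the map step
    static final String CUSTOMER_TAG = "C:";
    static final String ORDER_TAG = "O:";

    // Same branching as the mapper above, but each value carries a source tag.
    // Returns {key, taggedValue}, or null for a malformed line.
    static String[] mapLine(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length == 2) {
            return new String[] {f[0], CUSTOMER_TAG + f[1]};
        } else if (f.length == 3) {
            return new String[] {f[1], ORDER_TAG + f[2]};
        }
        return null;
    }

    // Returns the customer name only when the key has a customer record and no order.
    static Optional<String> reduce(List<String> taggedValues) {
        String name = null;
        boolean hasOrder = false;
        for (String v : taggedValues) {
            if (v.startsWith(CUSTOMER_TAG)) {
                name = v.substring(CUSTOMER_TAG.length());
            } else if (v.startsWith(ORDER_TAG)) {
                hasOrder = true;
            }
        }
        return (name != null && !hasOrder) ? Optional.of(name) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(reduce(List.of("C:Emily")));          // customer without orders
        System.out.println(reduce(List.of("O:5000")));           // hypothetical orphan order
        System.out.println(reduce(List.of("O:9483", "C:Brad"))); // customer with an order
    }
}
```

The check does not depend on the order of values within a key, which matters because Hadoop does not guarantee that order without a secondary sort.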
