Hadoop: how to use and reduce multiple inputs?

Posted by piok6c0g on 2021-06-03 in Hadoop


Mapper/Reducer 1 --> (key,value)
                      /   |   \
                     /    |    \
     Mapper/Reducer 2     |    Mapper/Reducer 4
     -> (oKey,oValue)     |    -> (xKey, xValue)
                          |
                          |
                    Mapper/Reducer 3
                    -> (aKey, aValue)

I have a log file that I aggregate with MR1. Mapper 2, Mapper 3 and Mapper 4 take MR1's output as their input. The jobs are chained.
MR1 output:

User     {infos of user:[{data here},{more data},{etc}]}
..

MR2 output:

timestamp       idCount
..

MR3 output:

timestamp        loginCount
..

MR4 output:

timestamp        someCount
..

I want to merge the outputs of MR2-4. The final output should be:

timestamp     idCount     loginCount   someCount
..
..
..

Is there a way to do this without Pig or Hive? I am using Java.

Tags: java, hadoop, mapreduce

Source: https://stackoverflow.com/questions/15947983/hadoop-how-to-use-and-reduce-multiple-inputs

2 answers


#1 nhaq1z21

You can do this with MultipleInputs; see the example here
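The example this answer links to is not reproduced here, so the following is only a minimal sketch of the idea behind a multiple-input merge: each input gets its own tagging mapper, and a single reducer assembles one row per timestamp. In a real job the three inputs would be registered with Hadoop's `MultipleInputs.addInputPath`. The sketch below deliberately uses plain Java with no Hadoop dependency so it can run standalone. The class name `TimestampMerge`, the tags `id`/`login`/`some`, and the choice to report a missing count as `0` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TimestampMerge {
    // Hypothetical tags, one per upstream job (MR2, MR3, MR4), in output column order.
    static final String[] TAGS = {"id", "login", "some"};

    /** Map side: turn one "timestamp count" line into {timestamp, "tag=count"}. */
    static String[] tag(String line, String tag) {
        String[] parts = line.trim().split("\\s+");
        return new String[] {parts[0], tag + "=" + parts[1]};
    }

    /** Reduce side: fold all tagged values of one timestamp into a single row. */
    static String reduce(String timestamp, List<String> taggedValues) {
        String[] counts = {"0", "0", "0"}; // assumption: a missing count is reported as 0
        for (String value : taggedValues) {
            int eq = value.indexOf('=');
            int column = Arrays.asList(TAGS).indexOf(value.substring(0, eq));
            counts[column] = value.substring(eq + 1);
        }
        return timestamp + "\t" + String.join("\t", counts);
    }

    /** Stands in for the shuffle: group tagged records by timestamp, then reduce each group. */
    static List<String> merge(List<List<String>> inputs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (int i = 0; i < inputs.size(); i++) {
            for (String line : inputs.get(i)) {
                String[] kv = tag(line, TAGS[i]);
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            rows.add(reduce(e.getKey(), e.getValue()));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = Arrays.asList(
            Arrays.asList("1000 5", "2000 7"),  // MR2 output: timestamp idCount
            Arrays.asList("1000 3"),            // MR3 output: timestamp loginCount
            Arrays.asList("2000 9"));           // MR4 output: timestamp someCount
        merge(inputs).forEach(System.out::println);
    }
}
```

In a Hadoop job, `tag` would roughly correspond to three small mapper classes (one per input path) and `reduce` to the body of a single reducer keyed on the timestamp.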

2021-06-03

#2 6jjcrrmo

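As far as I know, you cannot have an array of outputs in a reducer class. What I have in mind to solve your problem is this:
The output key of MR1 is one of {a,b,c}, and the value is a pair {timestamp,idCount}, {timestamp, loginCount} or {timestamp, someCount}, depending on the key. You would then combine MR2-4 into a single job.
So the process looks like this: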

MR1 <inputKey, inputValue, outputKey, outputValue> where outputKey is
                                       "a" for outputValue {timestamp, idCount}
                                       "b" for outputValue {timestamp, loginCount}
                                       "c" for outputValue {timestamp, someCount}

MR2-4 <inputKey, inputValue, outputKey, outputValue> if inputKey is "a" do MR2
                                                     if inputKey is "b" do MR3
                                                     if inputKey is "c" do MR4
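The dispatch in the outline above could look roughly like the following plain-Java sketch, written without Hadoop dependencies so it runs standalone. It assumes MR1's value arrives as the text `timestamp,count`, which the answer does not specify. The class name `TaggedDispatch` and the emitted `column=count` format are also hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class TaggedDispatch {
    /**
     * One map() call of the combined job. Input: MR1's key ("a", "b" or "c")
     * and its value "timestamp,count" (assumed format).
     * Output: timestamp -> "column=count".
     */
    static Map.Entry<String, String> map(String inputKey, String inputValue) {
        String[] pair = inputValue.split(",");
        String timestamp = pair[0].trim();
        String count = pair[1].trim();
        switch (inputKey) {
            case "a": // what MR2 used to handle
                return new SimpleEntry<>(timestamp, "idCount=" + count);
            case "b": // what MR3 used to handle
                return new SimpleEntry<>(timestamp, "loginCount=" + count);
            case "c": // what MR4 used to handle
                return new SimpleEntry<>(timestamp, "someCount=" + count);
            default:
                throw new IllegalArgumentException("unknown key: " + inputKey);
        }
    }

    public static void main(String[] args) {
        System.out.println(map("a", "1000, 5"));
        System.out.println(map("b", "1000, 3"));
    }
}
```

Because every branch emits the timestamp as the output key, all three counts for one timestamp end up in the same reduce call, where they can be written out as one row.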

There are also mechanisms called the Partitioner and the grouping comparator, which let you work with {key/value} pairs and use the mapper/reducer key plus some part of the value as the key.
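As a rough illustration of that last point, here is a plain-Java sketch of a composite key built from the tag plus the timestamp taken from the value. It is not Hadoop code. In a real job the same logic would live in a `Partitioner` subclass and in a comparator registered via `Job.setGroupingComparatorClass`. The `tag#timestamp` layout, the separator, and the class name `CompositeKeySketch` are assumptions for illustration.

```java
import java.util.Comparator;

class CompositeKeySketch {
    static final char SEP = '#'; // hypothetical separator between tag and timestamp

    /** Build the composite key, for example "a#1000". */
    static String compose(String tag, String timestamp) {
        return tag + SEP + timestamp;
    }

    /** The part of the composite key that decides routing and grouping. */
    static String timestampOf(String compositeKey) {
        return compositeKey.substring(compositeKey.indexOf(SEP) + 1);
    }

    /** Partition on the timestamp only, so every tag of one timestamp meets in one reducer. */
    static int getPartition(String compositeKey, int numPartitions) {
        return (timestampOf(compositeKey).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Grouping: two keys belong to the same reduce call when their timestamps match. */
    static final Comparator<String> GROUPING =
        Comparator.comparing(CompositeKeySketch::timestampOf);

    public static void main(String[] args) {
        String a = compose("a", "1000");
        String c = compose("c", "1000");
        System.out.println(getPartition(a, 4) == getPartition(c, 4));
        System.out.println(GROUPING.compare(a, c) == 0);
    }
}
```

The tag stays in the key, so the reducer can still tell which count each value carries while iterating over one timestamp's group.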

2021-06-03
