Hadoop: How can I merge reducer outputs into a single file?

Posted by 5kgi1eie on 2021-06-03 in Hadoop

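This question already has answers here:

Merge output files after reduce phase (10 answers)
Closed 7 years ago.
I know that the "getmerge" shell command can do this job.
But what should I do if I want to merge these outputs after the job finishes, using the HDFS API for Java?
What I actually want is a single merged file on HDFS.
The only thing I can think of is to start another job after that one.
Thanks!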

Java hadoop hdfs mapreduce merge

Source: https://stackoverflow.com/questions/12911798/hadoop-how-can-i-merge-reducer-outputs-to-a-single-file

2 Answers

6qfn3psc #1

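But what should I do if I want to merge these outputs after the job finishes, using the HDFS API for Java?
This is a guess, since I haven't tried it myself, but I think the method you are looking for is FileUtil.copyMerge, which is what FsShell invokes when you run the -getmerge command. FileUtil.copyMerge takes two FileSystem objects as arguments. FsShell uses FileSystem.getLocal to retrieve the destination FileSystem, but I don't see any reason why you couldn't instead use Path.getFileSystem on the destination to obtain an OutputStream.
That said, I don't think it gains you very much: the merge still happens in the local JVM, so you aren't saving much over running -getmerge followed by -put.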
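To make the "merge happens in the local JVM" point concrete, here is a minimal sketch of the concatenation step, written against java.nio on the local filesystem so that it runs without Hadoop on the classpath. It is a stand-in, not Hadoop code: the class name PartFileMerger, the "part-" name filter, and the demo file names are assumptions made for illustration. With Hadoop 2.x available, the corresponding call would be FileUtil.copyMerge(srcFS, srcDir, dstFS, dstFile, deleteSource, conf, addString); to my knowledge that method was removed in Hadoop 3, so check your version before relying on it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Local-filesystem stand-in for the merge step; hypothetical class, not part of Hadoop. */
class PartFileMerger {

    /** Appends every regular file named "part-*" in srcDir, in name order, to dstFile. */
    static void merge(Path srcDir, Path dstFile) {
        try {
            List<Path> parts;
            try (Stream<Path> entries = Files.list(srcDir)) {
                parts = entries
                        .filter(Files::isRegularFile)
                        // Assumption: reducer outputs are named part-*; markers such as _SUCCESS are skipped.
                        .filter(p -> p.getFileName().toString().startsWith("part-"))
                        .sorted()
                        .collect(Collectors.toList());
            }
            // All bytes flow through this one JVM, which is why the merge is not distributed.
            try (OutputStream out = Files.newOutputStream(dstFile)) {
                for (Path part : parts) {
                    Files.copy(part, out);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a throwaway output directory with two part files, merges it, and returns the merged text. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("reducer-output");
            Files.write(dir.resolve("part-r-00001"), "b\n".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("part-r-00000"), "a\n".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("_SUCCESS"), new byte[0]);
            Path merged = Files.createTempFile("merged", ".txt");
            merge(dir, merged);
            return new String(Files.readAllBytes(merged), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The sketch sorts the part files by name before copying so that the merged file follows reducer order. Every byte is read and written by the one process doing the merge, which is the cost the answer describes.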

3npbholx #2

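You can get a single output file by configuring the job to use a single reducer in your code.

job.setNumReduceTasks(1);

This meets your requirement, but it is costly, because all of the reduce work then runs in a single task.
Or run a shell command from your code with the following method: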

Static method to execute a shell command. 
Covers most of the simple cases without requiring the user to implement the Shell interface.

Parameters:
env the map of environment key=value
cmd shell command to execute.
Returns:
the output of the executed command.

org.apache.hadoop.util.Shell.execCommand(String[])
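As a rough sketch of the shell-command route, the snippet below builds the -getmerge and -put command lines mentioned in this thread and runs them with the JDK's ProcessBuilder. This is a swapped-in technique, not Hadoop's Shell.execCommand, chosen so that the example compiles without Hadoop on the classpath. The class name GetMergeCommand and all paths are hypothetical, and the run method assumes the hadoop executable is on the PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that shells out to the hadoop CLI; not part of Hadoop. */
class GetMergeCommand {

    /** Command that merges the files under an HDFS directory into one local file. */
    static List<String> buildGetMerge(String hdfsDir, String localFile) {
        return Arrays.asList("hadoop", "fs", "-getmerge", hdfsDir, localFile);
    }

    /** Command that uploads the merged local file back to HDFS. */
    static List<String> buildPut(String localFile, String hdfsFile) {
        return Arrays.asList("hadoop", "fs", "-put", localFile, hdfsFile);
    }

    /** Runs the command and returns its exit code; assumes "hadoop" is on the PATH. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

A caller would run buildGetMerge first and, if the exit code is 0, run buildPut. The merged data makes a round trip through the local disk, so this is no cheaper than running the two commands by hand.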
