How do I recursively use a directory structure in the new Hadoop API?

Posted by g0czyy6m on 2021-06-02 in Hadoop


My file structure looks like this:

/indir/somedir1/somefile
/indir/somedir1/someotherfile...
/indir/somedir2/somefile
/indir/somedir2/someotherfile...

Now I want to pass all of this recursively into a MapReduce job, and I am using the new API. So I did:

FileInputFormat.setInputPaths(job, new Path("/indir"));

But the job fails with:

Error: java.io.FileNotFoundException: Path is not a file: /indir/somedir1

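I am using Hadoop 2.4, and according to this post the new API in Hadoop 2 does not support recursive input directories. I am wondering how that can be, because throwing a large nested directory structure at a Hadoop job seems like the most ordinary thing in the world.
So, is this intended, or is it a bug? Either way, is there a workaround other than using the old API?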

hadoop hdfs recursion

Source: https://stackoverflow.com/questions/26647946/how-recursively-use-a-directory-structure-in-the-new-hadoop-api

2 Answers


4ngedf3f #1

Another way to configure this is through the FileInputFormat class:

FileInputFormat.setInputDirRecursive(job, true);
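
To show where that call might sit in a driver, here is a minimal sketch. It assumes the new-API classes from org.apache.hadoop.mapreduce; the class name RecursiveInputDriver, the helper buildJob, the job name and the output path /outdir are hypothetical and only there for illustration. No mapper or reducer is set, so the defaults apply.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

class RecursiveInputDriver {

    // Hypothetical helper: builds a job that reads inputDir and its subdirectories.
    static Job buildJob(Configuration conf, String inputDir, String outputDir)
            throws IOException {
        Job job = Job.getInstance(conf, "recursive-input-example");
        job.setJarByClass(RecursiveInputDriver.class);

        // Ask FileInputFormat to descend into subdirectories when listing input.
        FileInputFormat.setInputDirRecursive(job, true);

        // Register the top-level directory; /indir/somedir1 etc. are then traversed.
        FileInputFormat.addInputPath(job, new Path(inputDir));
        FileOutputFormat.setOutputPath(job, new Path(outputDir));
        return job;
    }

    public static void main(String[] args) throws Exception {
        // "/outdir" is an assumed output location, not part of the question.
        Job job = buildJob(new Configuration(), "/indir", "/outdir");
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The helper only sets a flag in the job configuration, so it should be called before the job is submitted.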

Posted 2021-06-03

hfsqlsce #2

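I found the answer myself. In the JIRA linked from the forum post mentioned above, there are two comments on how to do it right:
Set mapreduce.input.fileinputformat.input.dir.recursive to true (the comment says mapred.input.dir.recursive, but that name is deprecated)
Use FileInputFormat.addInputPath to specify the input directory
With these changes, it works.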
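
The two steps above could look roughly like this in code. This is only a sketch: the class name RecursivePropertyExample and the helper configure are hypothetical, and the property is set directly on the Configuration before the Job is created from it.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

class RecursivePropertyExample {

    // Hypothetical helper: applies both steps from the JIRA comments.
    static Job configure(String inputDir) throws IOException {
        Configuration conf = new Configuration();

        // Step 1: enable recursive listing via the non-deprecated property name.
        conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true);

        // Job.getInstance copies the configuration, so set the property first.
        Job job = Job.getInstance(conf);

        // Step 2: register the top-level input directory with addInputPath.
        FileInputFormat.addInputPath(job, new Path(inputDir));
        return job;
    }
}
```

If the driver goes through ToolRunner and GenericOptionsParser, the same property can presumably also be passed on the command line as -D mapreduce.input.fileinputformat.input.dir.recursive=true instead of being hard-coded.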

Posted 2021-06-03
