Writing to multiple folders in Hadoop?

Posted by twh00eeo on 2021-06-03 in Hadoop


I am trying to split the output of my reducer into different folders.

My driver has the following code:
FileOutputFormat.setOutputPath(job, new Path(output));
// MultipleOutputs.addNamedOutput(job, namedOutput, outputFormatClass, keyClass, valueClass)
MultipleOutputs.addNamedOutput(job, "foo", TextOutputFormat.class, NullWritable.class, Text.class);
MultipleOutputs.addNamedOutput(job, "bar", TextOutputFormat.class, Text.class, NullWritable.class);
MultipleOutputs.addNamedOutput(job, "foobar", TextOutputFormat.class, Text.class, NullWritable.class);

And then my reducer has the following code:
mos.write("foo", NullWritable.get(), new Text(jsn.toString()));
mos.write("bar", key, NullWritable.get());
mos.write("foobar", key, NullWritable.get());

But in the output, I see:

output/foo-r-0001
output/foo-r-0002
output/foobar-r-0001
output/bar-r-0001

But what I want is:

output/foo/part-r-0001
output/foo/part-r-0002
output/bar/part-r-0001
output/foobar/part-r-0001

How can I do this? Thanks.

hadoop

Source: https://stackoverflow.com/questions/19328136/writing-to-multiple-folders-in-hadoop

1 answer


2uluyalo1#

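If you mean this MultipleOutputs (org.apache.hadoop.mapreduce.lib.output.MultipleOutputs), the easiest way is to do one of the following in your reducer:
1. Use the named output together with a base output path; see write(String namedOutput, K key, V value, String baseOutputPath).
2. Don't use named outputs at all and use only a base output path; see write(KEYOUT key, VALUEOUT value, String baseOutputPath).
In your case this is point 1, so change the following: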

mos.write("foo", NullWritable.get(), new Text(jsn.toString()));
mos.write("bar", key, NullWritable.get());
mos.write("foobar", key, NullWritable.get());

to:

mos.write("foo", NullWritable.get(), new Text(jsn.toString()), "foo/part");
mos.write("bar", key, NullWritable.get(), "bar/part");
mos.write("foobar", key, NullWritable.get(), "foobar/part");

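Here "foo/part", "bar/part" and "foobar/part" are the baseOutputPath values. As a result, the directories foo, bar and foobar are created under the output directory, each containing part-r-xxxxx files.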
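To see why the slash produces a subdirectory, here is a small plain-Java model of how the output file name is built. It assumes, based on how Hadoop's FileOutputFormat.getUniqueFile names files, that a reduce task writes to "<baseOutputPath>-r-<partition padded to five digits>" relative to the job output directory, and that the three-argument mos.write(namedOutput, key, value) simply uses the named output's name as the base path. reduceOutputFile is a hypothetical helper for illustration, not part of the Hadoop API.

```java
import java.util.Locale;

public class BaseOutputPathDemo {

    // Hypothetical helper modelling the reduce-side file name:
    // baseOutputPath + "-r-" + partition number zero-padded to five digits.
    public static String reduceOutputFile(String baseOutputPath, int partition) {
        return baseOutputPath + "-r-" + String.format(Locale.ROOT, "%05d", partition);
    }

    public static void main(String[] args) {
        // mos.write("foo", key, value): base path defaults to the name "foo",
        // so the file sits directly in the output directory.
        System.out.println(reduceOutputFile("foo", 1));      // foo-r-00001

        // mos.write("foo", key, value, "foo/part"): the "/" makes "foo" a subdirectory.
        System.out.println(reduceOutputFile("foo/part", 1)); // foo/part-r-00001
    }
}
```

Because the base path is resolved relative to the job output directory, everything before the last "/" becomes a folder and "part" becomes the file-name prefix.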
You can also try point 2 above, which doesn't require any named outputs.
Let me know if you need further clarification.

Answered 2021-06-03
