在mapreduce中写入多个o/p文件时出现问题

zour9fqk 于 2021-05-30 发布在 Hadoop

关注(0)|答案(2)|浏览(322)

我有一个要求，分裂成2个输出文件的基础上过滤条件我的输入文件。我的输出目录如下所示：

/hdfs/base/dir/matched/YYYY/MM/DD
/hdfs/base/dir/notmatched/YYYY/MM/DD

我正在使用 MultipleOutputs 类来分割Map函数中的数据。在我的驱动程序类中，我使用如下：

FileOutputFormat.setOutputPath(job, new Path("/hdfs/base/dir"));

在mapper中，我使用的是：

mos.write(key, value, fileName); // File Name is generating based on filter criteria

这个程序一天内运行良好。但在第二天，我的程序失败了：

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://nameservice1/hdfs/base/dir already exists

我不能在第二天使用不同的基目录。
我如何处理这种情况？
注意：我不想通过读取输入来创建两个单独的文件。

hadoop mapreduce MultipleOutputs

来源：https://stackoverflow.com/questions/29506388/issue-while-writing-multiple-o-p-files-in-mapreduce

2条答案

按热度按时间

hts6caw31#

输出值中可以有一个标志列。稍后您可以处理输出并按标志列将其拆分。

赞(0）回复(0）举报 2021-05-30

rqdpfwrv2#

创建自定义o/p格式类，如下所示

package com.visa.util;

import java.io.IOException;

import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class CostomOutputFormat<K, V> extends SequenceFileOutputFormat<K, V>{

    @Override
    public void checkOutputSpecs(JobContext arg0) throws IOException {
    }

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext arg0) throws IOException {
        return super.getOutputCommitter(arg0);
    }

    @Override
    public RecordWriter<K, V> getRecordWriter(TaskAttemptContext arg0) throws IOException, InterruptedException {
        return super.getRecordWriter(arg0);
    }

}

并在驱动程序类中使用：

job.setOutputFormatClass(CostomOutputFormat.class);

它将跳过是否存在o/p目录的检查。

赞(0）回复(0）举报 2021-05-30

我来回答

在mapreduce中写入多个o/p文件时出现问题

2条答案

相关问题

热门标签

最新问答