How to customize FileInputFormat to read multiple lines of a file in Java when using Hadoop?

Posted by vsnjm48y on 2021-06-03 in Hadoop


I am using the MapReduce framework in Java, and I want to create a custom file input format.
Suppose my file has the following format:

Source: https://stackoverflow.com/questions/21866728/how-to-customize-fileinputformat-to-read-multiple-lines-of-a-file-in-java-when-u

3 Answers


7cjasjjr1#

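I have done something similar. Here I use "$$$" as the record delimiter, which I pass to the job as a configuration parameter. You can check the code here, along with its concrete implementation in the same project. I customized both the RecordReader and the InputFormat.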
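
The linked code is not reproduced on this page, so the following is only a minimal sketch of the idea described above, not the answerer's implementation. It shows, in plain Java without Hadoop dependencies, the grouping logic a custom RecordReader could apply: lines are collected until a line equal to the delimiter is seen. The class name `DelimitedRecordSketch` and the method `readRecords` are hypothetical. In a real job the delimiter would presumably be read from the job configuration, and the reader would also have to handle records that cross input split boundaries, which this sketch ignores.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups lines into records separated by a delimiter line. */
class DelimitedRecordSketch {

    /**
     * Returns one record (a list of lines) for each block of text that ends
     * at a line equal to the delimiter, or at the end of the input.
     */
    static List<List<String>> readRecords(Reader in, String delimiter) throws IOException {
        List<List<String>> records = new ArrayList<>();
        List<String> current = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().equals(delimiter)) {
                // A delimiter line closes the current record; empty records are skipped.
                if (!current.isEmpty()) {
                    records.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line);
            }
        }
        // A trailing record without a closing delimiter is still emitted.
        if (!current.isEmpty()) {
            records.add(current);
        }
        return records;
    }
}
```

For example, calling `readRecords` on a reader over the five lines `a`, `b`, `$$$`, `c`, `d` with the delimiter `"$$$"` yields two records, one holding `a` and `b` and one holding `c` and `d`. If the separator is a fixed string and each record can simply be a single Text value, recent Hadoop versions also let the built-in TextInputFormat of the new API split on it through the `textinputformat.record.delimiter` configuration property, which may avoid a custom reader altogether.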

2021-06-03

vs91vp4v2#

In this example, each record is treated as an array of multiple lines.
Following the tutorial, I wrote the following:

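import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// Input format for the old (org.apache.hadoop.mapred) API.
// IdxValues and CustomReader are this answer's own classes.
public class CustomInputFormat extends FileInputFormat<Text, IdxValues> {
    @Override
    public RecordReader<Text, IdxValues> getRecordReader(
            InputSplit input, JobConf job, Reporter reporter) throws IOException {
        reporter.setStatus(input.toString());
        return new CustomReader(job, (FileSplit) input);
    }
}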
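
To illustrate what "each record is an array of multiple lines" can mean in practice, here is a small sketch in plain Java, under the assumption that a record is a fixed number of consecutive lines. It is not the `CustomReader` from this answer. The class name `MultiLineRecordSketch` and the `linesPerRecord` parameter are hypothetical; the `next` method only mirrors the shape of the old-API `RecordReader.next(key, value)` call, which fills a reusable value object and returns false once the input is exhausted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.List;

/** Hypothetical stand-in for a reader's core loop: one record = up to N consecutive lines. */
class MultiLineRecordSketch {
    private final BufferedReader reader;
    private final int linesPerRecord;

    MultiLineRecordSketch(Reader in, int linesPerRecord) {
        if (linesPerRecord <= 0) {
            throw new IllegalArgumentException("linesPerRecord must be positive");
        }
        this.reader = new BufferedReader(in);
        this.linesPerRecord = linesPerRecord;
    }

    /**
     * Clears 'value' and fills it with the lines of the next record.
     * Returns false when no lines are left; the last record may be shorter.
     */
    boolean next(List<String> value) throws IOException {
        value.clear();
        String line;
        // Stop reading as soon as the record is full, so no line is consumed early.
        while (value.size() < linesPerRecord && (line = reader.readLine()) != null) {
            value.add(line);
        }
        return !value.isEmpty();
    }
}
```

A caller would create one list, pass it to `next` in a loop, and process its contents after each call that returns true. Reusing the same value object between calls follows the convention of the old mapred API, so the caller must copy the lines if it needs to keep them.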