hadoop - pass the entire contents of a file to the map function in MapReduce and append it to a sequence file

sauutmhj · posted 2021-05-29 · in Hadoop

I have to read the entire contents of fileA and pass them to the map function. In the map function, the key is fileB and the value is the contents of fileA. In the OutputFormat's RecordWriter, I use the SequenceFile.Writer append method to append all the values (the entire contents of fileA) to fileB. The problem is that:

1. I am loading the entire file contents in the InputFormat RecordReader and passing them to a single map() call.
2. I am appending the entire contents to the sequence file in one append.

Pseudocode:

InputFormat RecordReader:

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        if (flag > 0)
            return false;
        flag++;
        String re = read all contents of file;  // pseudocode: the whole file is buffered here
        String key = k1;
        allRecords = new TextArrayWritable(Text.class,
                new Text[] { new Text(key), new Text(re) });
        return true;
    }

    @Override
    public TextArrayWritable getCurrentValue() throws IOException, InterruptedException {
        return allRecords;
    }

Map function:

    protected void map(Text key, TextArrayWritable value, Context context)
            throws IOException, InterruptedException {
        context.write(new Text(fileA path), value);
    }

OutputFormat RecordWriter:

    @Override
    public void write(Text fileDir, TextArrayWritable contents)
            throws IOException, InterruptedException {
        // sequenceFileWriter is a SequenceFile.Writer field; the whole value
        // (all of fileA's contents) is appended as a single record
        sequenceFileWriter.append(contents.get()[0], contents.get()[1]);
    }
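For reference, TextArrayWritable is not a class that ships with Hadoop; judging by the two-argument constructor used above, it is presumably a thin subclass of org.apache.hadoop.io.ArrayWritable. A minimal sketch under that assumption:

    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    // Assumed definition: a thin ArrayWritable subclass matching the
    // two-argument constructor used in the pseudocode above.
    public class TextArrayWritable extends ArrayWritable {

        // Hadoop's serialization machinery needs a no-arg constructor
        // to instantiate the class when deserializing.
        public TextArrayWritable() {
            super(Text.class);
        }

        public TextArrayWritable(Class<? extends Writable> valueClass, Writable[] values) {
            super(valueClass, values);
        }
    }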

Both of these are in-memory operations, and if the file is too large they can throw an out-of-memory error. Is there a way to avoid loading the entire contents into memory while still appending them to the sequence file?
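One way out (my suggestion, not something stated in the question) is to never materialize the whole file: let the RecordReader emit the file as a stream of fixed-size chunks, so each map() call, and each append on the output side, only ever holds one chunk in memory. A minimal sketch; ChunkedRecordReader, CHUNK_SIZE, and the choice of key are all hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // Emits the file as many (path, chunk) records instead of one
    // giant record, bounding memory use to one chunk at a time.
    public class ChunkedRecordReader extends RecordReader<Text, BytesWritable> {

        // Hypothetical chunk size; tune to taste.
        private static final int CHUNK_SIZE = 16 * 1024 * 1024; // 16 MB

        private FSDataInputStream in;
        private final Text key = new Text();
        private final BytesWritable value = new BytesWritable();
        private long fileLength;
        private long bytesRead;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
                throws IOException {
            Path path = ((FileSplit) split).getPath();
            fileLength = ((FileSplit) split).getLength();
            in = path.getFileSystem(context.getConfiguration()).open(path);
            key.set(path.toString()); // every chunk shares the same key
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            byte[] buf = new byte[CHUNK_SIZE];
            int n = in.read(buf);
            if (n <= 0) {
                return false; // end of file: no more chunk records
            }
            value.set(buf, 0, n); // one chunk = one record
            bytesRead += n;
            return true;
        }

        @Override
        public Text getCurrentKey() { return key; }

        @Override
        public BytesWritable getCurrentValue() { return value; }

        @Override
        public float getProgress() {
            return fileLength == 0 ? 1f : (float) bytesRead / fileLength;
        }

        @Override
        public void close() throws IOException {
            if (in != null) in.close();
        }
    }

The companion FileInputFormat would override isSplitable() to return false so a single reader sees the whole file in order, and the RecordWriter would call append(key, chunk) once per record it receives. A SequenceFile is a sequence of key-value records, so fileA ends up stored as many records under the same key rather than as one huge value, and a reader can reassemble the contents by iterating over the records with that key.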
