hadoop - pass the entire contents of a file to the map function in MapReduce and append it to a sequence file

sauutmhj · posted 2021-05-29 · in Hadoop

I have to read the entire contents of fileA and pass them to the map function. In the map function, the key is fileB and the value is the contents of fileA. In the OutputFormat's RecordWriter, I use the SequenceFile.Writer append method to append all the values (the entire contents of fileA) to fileB. The problem is that:

1. I am loading the entire file contents in the InputFormat RecordReader and passing them to a single map() call.
2. I am appending the entire contents to the sequence file in one append.

Pseudocode:

InputFormat RecordReader:

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        if (flag > 0)
            return false;
        flag++;
        String re = read all contents of file;  // pseudocode: the whole file is buffered here
        String key = k1;
        allRecords = new TextArrayWritable(Text.class,
                new Text[] { new Text(key), new Text(re) });
        return true;
    }

    @Override
    public TextArrayWritable getCurrentValue() throws IOException, InterruptedException {
        return allRecords;
    }

Map function:

    protected void map(Text key, TextArrayWritable value, Context context)
            throws IOException, InterruptedException {
        context.write(new Text(fileA path), value);
    }

OutputFormat RecordWriter:

    @Override
    public void write(Text fileDir, TextArrayWritable contents)
            throws IOException, InterruptedException {
        // sequenceFileWriter is a SequenceFile.Writer field; the whole value
        // (all of fileA's contents) is appended as a single record
        sequenceFileWriter.append(contents.get()[0], contents.get()[1]);
    }
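For reference, TextArrayWritable is not a class that ships with Hadoop; judging by the two-argument constructor used above, it is presumably a thin subclass of org.apache.hadoop.io.ArrayWritable. A minimal sketch under that assumption:

    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    // Assumed definition: a thin ArrayWritable subclass matching the
    // two-argument constructor used in the pseudocode above.
    public class TextArrayWritable extends ArrayWritable {

        // Hadoop's serialization machinery needs a no-arg constructor
        // to instantiate the class when deserializing.
        public TextArrayWritable() {
            super(Text.class);
        }

        public TextArrayWritable(Class<? extends Writable> valueClass, Writable[] values) {
            super(valueClass, values);
        }
    }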

Both of these are in-memory operations, and if the file is too large they can throw an out-of-memory error. Is there a way to avoid loading the entire contents into memory while still appending them to the sequence file?
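One way out (my suggestion, not something stated in the question) is to never materialize the whole file: let the RecordReader emit the file as a stream of fixed-size chunks, so each map() call, and each append on the output side, only ever holds one chunk in memory. A minimal sketch; ChunkedRecordReader, CHUNK_SIZE, and the choice of key are all hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // Emits the file as many (path, chunk) records instead of one
    // giant record, bounding memory use to one chunk at a time.
    public class ChunkedRecordReader extends RecordReader<Text, BytesWritable> {

        // Hypothetical chunk size; tune to taste.
        private static final int CHUNK_SIZE = 16 * 1024 * 1024; // 16 MB

        private FSDataInputStream in;
        private final Text key = new Text();
        private final BytesWritable value = new BytesWritable();
        private long fileLength;
        private long bytesRead;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
                throws IOException {
            Path path = ((FileSplit) split).getPath();
            fileLength = ((FileSplit) split).getLength();
            in = path.getFileSystem(context.getConfiguration()).open(path);
            key.set(path.toString()); // every chunk shares the same key
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            byte[] buf = new byte[CHUNK_SIZE];
            int n = in.read(buf);
            if (n <= 0) {
                return false; // end of file: no more chunk records
            }
            value.set(buf, 0, n); // one chunk = one record
            bytesRead += n;
            return true;
        }

        @Override
        public Text getCurrentKey() { return key; }

        @Override
        public BytesWritable getCurrentValue() { return value; }

        @Override
        public float getProgress() {
            return fileLength == 0 ? 1f : (float) bytesRead / fileLength;
        }

        @Override
        public void close() throws IOException {
            if (in != null) in.close();
        }
    }

The companion FileInputFormat would override isSplitable() to return false so a single reader sees the whole file in order, and the RecordWriter would call append(key, chunk) once per record it receives. A SequenceFile is a sequence of key-value records, so fileA ends up stored as many records under the same key rather than as one huge value, and a reader can reassemble the contents by iterating over the records with that key.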
