Java: reading a text file from HDFS line by line in a mapper

Posted by p1tboqfb on 2021-06-04 in Hadoop


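Is the following mapper code a correct way to read a text file from HDFS? If it is:
What happens if two mappers on different nodes try to open the file at almost the same time?
Doesn't the InputStreamReader need to be closed? If so, how can that be done without closing the FileSystem?
My code is: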

Path pt=new Path("hdfs://pathTofile");
FileSystem fs = FileSystem.get(context.getConfiguration());
BufferedReader br=new BufferedReader(new InputStreamReader(fs.open(pt)));
String line;
line=br.readLine();
while (line != null){
System.out.println(line);

Java hadoop hdfs

Source: https://stackoverflow.com/questions/14573209/read-a-text-file-from-hdfs-line-by-line-in-mapper

2 answers


1l5u6lss1#

This will work, with some amendments. I assume the code you pasted is just truncated:

Path pt=new Path("hdfs://pathTofile");
FileSystem fs = FileSystem.get(context.getConfiguration());
BufferedReader br=new BufferedReader(new InputStreamReader(fs.open(pt)));
try {
  String line;
  line=br.readLine();
  while (line != null){
    System.out.println(line);

    // be sure to read the next line otherwise you'll get an infinite loop
    line = br.readLine();
  }
} finally {
  // you should close out the BufferedReader
  br.close();
}
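As a variation on the try/finally above, the same loop can be written with try-with-resources, which closes the reader for you. Closing the BufferedReader closes the stream it wraps (in the mapper, the one returned by fs.open(pt)), not the FileSystem object itself. The following is only a minimal sketch: the class name LineReaderSketch, the helper readLines, and the in-memory test data are hypothetical, and UTF-8 is an assumed encoding. It uses a plain InputStream so that it runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Hadoop.
class LineReaderSketch {

    // Reads every line from `in` (assumed to be UTF-8 text) and closes it when done.
    // In a mapper, `in` would be the stream returned by fs.open(pt).
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader, and with it the wrapped stream,
        // even if readLine() throws.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical in-memory data standing in for the HDFS file.
        byte[] data = "first\nsecond\n".getBytes(StandardCharsets.UTF_8);
        for (String line : readLines(new ByteArrayInputStream(data))) {
            System.out.println(line);
        }
    }
}
```

Passing the charset explicitly avoids depending on the default encoding of whichever node the task happens to run on.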

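Multiple mappers can read the same file, but beyond a certain point it makes more sense to use the distributed cache (this not only reduces the load on the data nodes that host the file's blocks, but is also more efficient if your job has more tasks than you have task nodes).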


qgelzfjb2#

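import java.io.{BufferedReader, InputStreamReader}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

import scala.collection.mutable.ListBuffer

// Reads the file at `path` line by line, prints each line, and returns all lines.
def load_file(path: String): List[String] = {
  val pt = new Path(path)
  val fs = FileSystem.get(new Configuration())
  val br = new BufferedReader(new InputStreamReader(fs.open(pt)))
  val res = ListBuffer.empty[String]
  try {
    var line = br.readLine()
    while (line != null) {
      println(line)
      res += line // ListBuffer appends in constant time, unlike `List :+`
      line = br.readLine()
    }
  } finally {
    // you should close out the BufferedReader; this also closes the wrapped stream
    br.close()
  }
  res.toList
}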

