List all files available in a Spark cluster stored on Hadoop HDFS using Scala or Python?

Posted by 5f0d552i on 2021-06-03 in Hadoop


What is the most efficient way to list the names of all files available locally in Spark? I am using the Scala API, but Python would be fine too.

Source: https://stackoverflow.com/questions/23478377/listing-all-files-available-in-spark-cluster-stored-on-hadoop-hdfs-using-scala-o

1 Answer


rryofs0p1#

import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable.{ListBuffer, Stack}

// sc is the SparkContext that spark-shell provides
val fs = FileSystem.get(sc.hadoopConfiguration)
val dirs = Stack[String]()            // directories still to visit
val files = ListBuffer.empty[String]  // file paths found so far
dirs.push("/user/username/")          // starting directory

// Iterative depth-first traversal: pop a directory, list it,
// push its subdirectories and record its files.
while (dirs.nonEmpty) {
  val status = fs.listStatus(new Path(dirs.pop()))
  status.foreach { x =>
    if (x.isDirectory) dirs.push(x.getPath.toString)
    else files += x.getPath.toString
  }
}

files.foreach(println)
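
As a possible alternative to walking the tree by hand, Hadoop's FileSystem also has listFiles(path, recursive), which returns only files and descends into subdirectories itself. The sketch below is not part of the original answer; the helper name listAllFiles is made up for illustration, and it assumes a Hadoop 2.x or later client on the classpath.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable.ListBuffer

// Hypothetical helper (not from the answer): returns the path of every
// file found under `root`, including files in nested subdirectories.
def listAllFiles(fs: FileSystem, root: Path): List[String] = {
  // With recursive = true, listFiles yields files only; directories are
  // traversed but never returned themselves.
  val it = fs.listFiles(root, true)
  val files = ListBuffer.empty[String]
  // RemoteIterator is not a Scala Iterator, so drain it with a plain loop.
  while (it.hasNext) {
    files += it.next().getPath.toString
  }
  files.toList
}
```

In spark-shell it could be called as listAllFiles(FileSystem.get(sc.hadoopConfiguration), new Path("/user/username/")).foreach(println), where "/user/username/" is the same placeholder path as above. Either way the listing runs on the driver against the NameNode, so very large directory trees may take a while.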

Answered 2021-06-03
