Spark: reading hidden files from HDFS

Posted by dzhpxtsq on 2021-07-09 in Spark


I am analyzing data in HDFS with the PySpark shell. The HDFS path contains hidden files that I want to read from the shell, but Spark ignores dot files. How can I read them?


# This is not loading hidden files into data-frame
dir="/abc/xyz"
df=spark.read.text(dir)
# This is not loading hidden files into data-frame
dir="/abc/xyz/*"
df=spark.read.text(dir)
# This is not loading hidden files into data-frame
dir="/abc/xyz/.*"
df=spark.read.text(dir)

Any suggestions would be appreciated.

hdfs apache-spark pyspark

Source: https://stackoverflow.com/questions/66860024/spark-read-hidden-files-from-hdfs

2 Answers


hjzp0vay #1

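Spark uses the Hadoop API to read data from HDFS. Hadoop input formats apply a path filter that excludes files whose names start with "_" or ".". Try setting a custom filter in the configuration with FileInputFormat.setInputPathFilter, then create the RDD with newAPIHadoopFile.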
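
To make the filtering rule concrete, here is a minimal sketch in plain Python, not the Hadoop implementation itself. The names `is_hidden_name` and `list_hidden_files` are hypothetical helpers introduced for illustration. The second helper assumes PySpark's private `_jvm` and `_jsc` handles, which are not a stable public API, and it only lists the hidden files through the Hadoop FileSystem API so you can see what is being skipped.

```python
def is_hidden_name(name):
    """Approximate Hadoop's default hidden-file rule:
    a file is treated as hidden if its name starts with '_' or '.'."""
    return name.startswith("_") or name.startswith(".")


def list_hidden_files(spark, directory):
    """Return the full paths of hidden files directly under `directory`.

    Assumption: `spark` is an active SparkSession. This relies on the
    private `_jvm` / `_jsc` attributes to reach the Hadoop FileSystem API,
    so it may break between PySpark versions.
    """
    jvm = spark._jvm
    conf = spark._jsc.hadoopConfiguration()
    path = jvm.org.apache.hadoop.fs.Path(directory)
    fs = path.getFileSystem(conf)
    # listStatus does not apply the input-format path filter,
    # so dot files and underscore files are still returned here.
    return [
        status.getPath().toString()
        for status in fs.listStatus(path)
        if status.isFile() and is_hidden_name(status.getPath().getName())
    ]
```

In the PySpark shell this could be called as `list_hidden_files(spark, "/abc/xyz")`. Whether the returned paths can then be loaded with `spark.read.text` may depend on the Spark version, so treat the listing as a diagnostic step rather than a complete fix.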

Posted 2021-07-09

deikduxw #2

Try changing your path.


# The glob below did not load hidden files into the DataFrame:
# dir = "/abc/xyz/.*"
# Try a fully qualified HDFS URI instead (replace yourhost and yourport).
dir = "hdfs://yourhost:yourport/abc/xyz/"
df = spark.read.text(dir)

Posted 2021-07-09
