PySpark: reading the latest file from a folder in a data lake in Databricks

Posted by 9bfwbjaz on 2023-11-16 in Spark


I have the following code:

directory_path = "dbfs:/mnt/x_file_directory"
files = dbutils.fs.ls(directory_path)
latest_file = max(files, key=lambda f:f.modificationTime)
latest_file_path = latest_file.path
df = spark.read.option("header", "true").option("inferSchema", "true") \
  .csv(latest_file_path).toPandas()

Error message: AnalysisException: [PATH_NOT_FOUND] Path does not exist: dbfs:/mnt/x_file_directory/file_name
The file path looks correct, so where am I going wrong? Thanks in advance!

pyspark

Source: https://stackoverflow.com/questions/77331162/reading-the-latest-file-from-a-folder-in-data-lake-in-databricks

1 Answer


Answer by suzh9iv81

I tried the following approach to read the latest file from an ADLS folder.
First, I mounted my ADLS container with this code:

dbutils.fs.mount(
    source="wasbs://<containerName>@<storageaccountName>.blob.core.windows.net/",
    mount_point="/mnt/<mountName>",
    extra_configs={
        "fs.azure.account.key.<storageaccountName>.blob.core.windows.net": "<Access-Key>"
    }
)
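Re-running a mount cell for a mount point that already exists typically fails with a "Directory already mounted" error, so it can help to check dbutils.fs.mounts() first. A minimal sketch, assuming each entry returned by dbutils.fs.mounts() exposes mountPoint and source attributes; the MountInfo namedtuple, the is_mounted helper and the sample mounts are hypothetical stand-ins so the logic can run outside Databricks:

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.fs.mounts();
# only the two attributes used below are modelled (assumption).
MountInfo = namedtuple("MountInfo", ["mountPoint", "source"])


def is_mounted(mounts, mount_point):
    """Return True if mount_point already appears in the mounts list."""
    # Normalize a trailing slash so "/mnt/files" and "/mnt/files/" compare equal.
    target = mount_point.rstrip("/")
    return any(m.mountPoint.rstrip("/") == target for m in mounts)


# Illustrative data only, not taken from a real workspace.
mounts = [
    MountInfo("/databricks-datasets", "databricks-datasets"),
    MountInfo("/mnt/files", "wasbs://container@account.blob.core.windows.net/"),
]
print(is_mounted(mounts, "/mnt/files"))   # True
print(is_mounted(mounts, "/mnt/other/"))  # False
```

In a notebook this would be used as a guard, e.g. `if not is_mounted(dbutils.fs.mounts(), "/mnt/<mountName>"):` followed by the dbutils.fs.mount(...) call.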

Then I read the latest file from the data lake folder with the following code:

directory_path = "dbfs:/mnt/files/input"
files = dbutils.fs.ls(directory_path)
latest_file = max(files, key=lambda f:f.modificationTime)
latest_file_path = latest_file.path
df = spark.read.option("header", "true").option("inferSchema", "true") \
      .csv(latest_file_path).toPandas()
print(df)
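dbutils.fs.ls lists subdirectories as well as files, so taking max(...) over the raw listing can return a folder if one was modified most recently. A minimal sketch of a filter, assuming directory names carry a trailing "/" and that modificationTime values are comparable across entries; the FileInfo namedtuple, the latest_file helper and the sample entries are hypothetical, used so the logic runs outside Databricks:

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.fs.ls();
# modificationTime is assumed to be epoch milliseconds.
FileInfo = namedtuple("FileInfo", ["path", "name", "size", "modificationTime"])


def latest_file(entries, suffix=".csv"):
    """Return the most recently modified file whose name ends with suffix.

    Directory entries (names ending in "/") are skipped, so a newer
    subfolder cannot be picked instead of a data file.
    """
    candidates = [
        e for e in entries
        if not e.name.endswith("/") and e.name.lower().endswith(suffix)
    ]
    if not candidates:
        raise FileNotFoundError(f"no '{suffix}' files found")
    return max(candidates, key=lambda e: e.modificationTime)


# Illustrative listing: the subfolder is the newest entry but must be ignored.
entries = [
    FileInfo("dbfs:/mnt/files/input/a.csv", "a.csv", 120, 1_700_000_000_000),
    FileInfo("dbfs:/mnt/files/input/b.csv", "b.csv", 98, 1_700_000_500_000),
    FileInfo("dbfs:/mnt/files/input/archive/", "archive/", 0, 1_700_000_900_000),
]
print(latest_file(entries).path)  # dbfs:/mnt/files/input/b.csv
```

In a notebook, latest_file(dbutils.fs.ls(directory_path)).path would then be passed to spark.read...csv(...) in place of latest_file_path.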

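According to this, a folder under dbfs:/mnt/ is not necessarily a mounted volume; it can be just an ordinary folder. That is why I used display(dbutils.fs.mounts()) to check where my mount point actually points, and confirmed that it maps to the storage account, as shown below: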

Please check whether your mount point actually maps to the storage account.
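One way to check this programmatically is to look up which entry of dbutils.fs.mounts() covers the folder being read. A minimal sketch, assuming each entry has mountPoint and source attributes; MountInfo, resolve_mount and the sample mounts are hypothetical and illustrative only. If the lookup falls back to the root mount "/" rather than a storage account, the folder under /mnt/ is probably a plain DBFS folder, not the mounted container:

```python
from collections import namedtuple

# Hypothetical stand-in for dbutils.fs.mounts() entries (assumption:
# each has mountPoint and source attributes).
MountInfo = namedtuple("MountInfo", ["mountPoint", "source"])


def resolve_mount(mounts, path):
    """Return (mountPoint, source) of the longest mount covering path, or None."""
    # Strip the "dbfs:" scheme so "dbfs:/mnt/x" compares against "/mnt/x".
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    best = None
    for m in mounts:
        point = m.mountPoint.rstrip("/")
        # Match the mount point itself or anything beneath it,
        # but not a sibling such as /mnt/files2 for /mnt/files.
        if path == point or path.startswith(point + "/"):
            if best is None or len(point) > len(best.mountPoint.rstrip("/")):
                best = m
    return (best.mountPoint, best.source) if best else None


# Illustrative data only, not taken from a real workspace.
mounts = [
    MountInfo("/", "DatabricksRoot"),
    MountInfo("/mnt/files", "wasbs://container@account.blob.core.windows.net/"),
]
print(resolve_mount(mounts, "dbfs:/mnt/files/input/b.csv"))
# ('/mnt/files', 'wasbs://container@account.blob.core.windows.net/')
print(resolve_mount(mounts, "dbfs:/mnt/x_file_directory/file_name"))
# ('/', 'DatabricksRoot')
```

In a notebook the first argument would be dbutils.fs.mounts(); the second result above is the situation where /mnt/x_file_directory is not backed by the storage account.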


Answered on 2023-11-16
