Write/save a DataFrame from Azure Databricks to an Azure file share

ddrv8njm · Posted 2021-05-26 in Spark

How do I write to an Azure file share from an Azure Databricks Spark job?
I have configured the Hadoop storage account key and value:

spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.STORAGEKEY.file.core.windows.net",
  "SECRETVALUE"
)

val wasbFileShare =
    s"wasbs://testfileshare@STORAGEKEY.file.core.windows.net/testPath"

df.coalesce(1).write.mode("overwrite").csv(wasbFileShare)

When I try to save the DataFrame to the Azure file share, I see the following error, even though the URI exists:

Exception in thread "main" org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: The requested URI does not represent any resource on the server.

smdncfj3 #1

Unfortunately, Azure Databricks does not support reading from or writing to Azure file shares.
Data sources supported by Azure Databricks: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/
I would suggest you submit this as feedback here:
https://feedback.azure.com/forums/909463-azure-databricks
All of the feedback you share in these forums is monitored and reviewed by the Microsoft engineering teams responsible for building Azure.
You can also check out a thread that addresses a similar issue: Databricks and Azure Files.
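
One workaround that comes up in those threads is to stage the CSV on DBFS and then push it to the file share with the azure-storage-file-share Python SDK, instead of going through the Spark wasbs:// connector. Below is a minimal sketch, assuming the azure-storage-file-share package is installed on the cluster (e.g. %pip install azure-storage-file-share), that df is the DataFrame from the question, that the testPath directory already exists in the share, and that the connection string, share name and paths are placeholders:

# Workaround sketch (not from the original answer): write the CSV to DBFS,
# then upload it to the Azure file share with the azure-storage-file-share SDK.
from azure.storage.fileshare import ShareFileClient

# Placeholder values -- substitute your own storage account, share and paths.
connection_string = (
    "DefaultEndpointsProtocol=https;AccountName=STORAGEKEY;"
    "AccountKey=SECRETVALUE;EndpointSuffix=core.windows.net"
)
share_name = "testfileshare"
remote_path = "testPath/output.csv"
staging_dir = "dbfs:/tmp/fileshare_staging"

# Write a single CSV part file to DBFS.
df.coalesce(1).write.mode("overwrite").option("header", "true").csv(staging_dir)
part_file = [f for f in dbutils.fs.ls(staging_dir) if f.name.startswith("part-")][0]

# Upload the part file to the file share through the local /dbfs FUSE mount.
file_client = ShareFileClient.from_connection_string(connection_string, share_name, remote_path)
with open("/dbfs/tmp/fileshare_staging/" + part_file.name, "rb") as data:
    file_client.upload_file(data)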
Below is a code snippet that writes CSV data from an Azure Databricks notebook directly to an Azure Blob Storage container.


# Configure blob storage account access key globally
spark.conf.set("fs.azure.account.key.chepra.blob.core.windows.net", "gv7nVIXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXdlOiA==")
output_container_path = "wasbs://sampledata@chepra.blob.core.windows.net"
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

# Write the dataframe as a single CSV file to blob storage
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("com.databricks.spark.csv")
 .save(output_blob_folder))

# Get the name of the wrangled-data CSV file that was just saved to Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_file = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from the sub-folder (wrangled_data_folder) to the root of the blob container,
# renaming it to predict-transform-output.csv in the same step
dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)
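
After the move, the wrangled_data_folder sub-folder is still left behind with Spark's marker files (_SUCCESS, _committed_*, _started_*). If those aren't needed, an optional cleanup step (not part of the original snippet) is:

# Optional: remove the staging sub-folder and Spark's marker files
dbutils.fs.rm(output_blob_folder, recurse=True)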
