Reading pyspark

Posted by hivapdat on 2023-08-02 in Spark


In a Databricks notebook, I am building a source folder path that contains the year and month.

from datetime import datetime
now = datetime.now() # current date and time

year = now.strftime("%Y")
month = now.strftime("%m")

df = '"' + 'abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/' + year + '/' + month + '/"'
print(df)

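This prints "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/" (including the double quotes).
However, when I try to read a DataFrame from that location with PySpark, I get an error message and I don't know what is causing it. Thanks for your help.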

DF = (
    spark
    .read
    .option("header", "true")
    .parquet(df)
    )

Error message:
IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 0: "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/2023/07/%22

pyspark

Source: https://stackoverflow.com/questions/76682153/reading-pyspark

2 Answers


Answer 1 (vof42yt1)

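When concatenating strings, you don't need to add quote characters. If you remove the '"' from the start and end of df, your code will work.
I suggest using an f-string for the concatenation. It is more readable and easier to work with.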

from datetime import datetime
now = datetime.now() # current date and time

year = now.strftime("%Y")
month = now.strftime("%m")
basepath = 'abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2'

df = f'{basepath}/{year}/{month}/'
print(df)
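
As a rough sketch of the same idea, the path building can be wrapped in a small function so it can be checked with a fixed date. The helper name `build_month_path` and the sample date are assumptions made for illustration only; the base path is the one from the question.

```python
from datetime import datetime

def build_month_path(basepath: str, when: datetime) -> str:
    """Return '<basepath>/<YYYY>/<MM>/' with no quote characters around it."""
    # rstrip('/') avoids a double slash if basepath already ends with '/'
    return f"{basepath.rstrip('/')}/{when:%Y}/{when:%m}/"

basepath = "abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2"

# A fixed date is used here so the result is predictable;
# in the notebook you would pass datetime.now() instead.
path = build_month_path(basepath, datetime(2023, 7, 15))
print(path)

# In the notebook, the plain string would then go straight to the reader, e.g.:
# DF = spark.read.parquet(path)
```

The point is only that the value handed to `.parquet()` is the bare path string; the quotes you type in the source code delimit the string and are not part of it.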


Answer 2 (zzoitvuj)

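The path you created is invalid. The error says that the quote character at the start is illegal. Remove the quotes at the start and end to fix it.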
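
To make the error message easier to read, here is a minimal sketch in plain Python (no Spark needed) that rebuilds the string from the question with a fixed year and month. The variable names `bad` and `good` are illustrative assumptions. It shows that the character at index 0 is the literal double quote, and that the `%22` at the end of the error is the percent-encoded form of the trailing quote.

```python
from urllib.parse import quote

# Same construction as in the question, with the year and month fixed.
bad = '"' + 'abfss://container@storageaccount.dfs.core.windows.net/bronze/folder1/folder2/' + '2023' + '/' + '07' + '/"'

# Index 0 is the quote character, which is what the URI parser rejects
# when it expects the scheme name to start.
print(repr(bad[0]))

# The trailing quote shows up percent-encoded as %22 in the error message.
print(quote('"'))

# Stripping the surrounding quotes leaves a path that starts with the scheme.
good = bad.strip('"')
print(good)
```

Stripping the quotes afterwards works, but not adding them in the first place, as both answers suggest, is the simpler fix.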
