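I have a dataframe named data, which I am saving as a csv file to my datalake using pandas.to_csv. However, saving the file as csv takes a lot of time. Can someone tell me how to save a csv file to the datalake using dbutils? Also, please confirm whether the code that creates the directory (if it does not exist) is correct.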
d = data.groupby(['Col1', 'Col2'])
for k, Dates in d:
    if not Dates.empty:
        PATH = '/dbfs/mnt/data/../'
        try:
            # Listing the path raises if the directory does not exist
            dbutils.fs.ls(PATH)
        except Exception as e:
            if 'java.io.FileNotFoundException' in str(e):
                dbutils.fs.mkdirs(PATH)
        # Day is not defined in this snippet; it is assumed to be set elsewhere
        Dates.to_csv(PATH+f'{Day}.csv',index=False)
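On the directory check: a minimal sketch, assuming the mount is reachable through the local /dbfs path that pandas already writes to, so the plain Python standard library can create the folder without the try/except around dbutils.fs.ls. The helper names ensure_dir and csv_path are hypothetical and not part of dbutils or pandas. Note that dbutils.fs paths are normally DBFS-rooted (for example dbfs:/mnt/...), while pandas needs the /dbfs/... form, so mixing the two for the same string may not point at the same folder.

```python
import os


def ensure_dir(path: str) -> str:
    """Create path (and any missing parents) if it does not exist yet."""
    # exist_ok=True makes this a no-op when the directory is already there,
    # so there is no need to list the directory first and catch an exception.
    os.makedirs(path, exist_ok=True)
    return path


def csv_path(base_dir: str, name: str) -> str:
    """Return base_dir/<name>.csv, creating base_dir first if needed."""
    return os.path.join(ensure_dir(base_dir), f"{name}.csv")


# Hypothetical usage inside the loop from the question:
# Dates.to_csv(csv_path(PATH, Day), index=False)
```

Calling ensure_dir repeatedly is safe, which matters here because the loop visits the same folder once per group.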
1 Answer
uemypmqf #1
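dbutils itself has no method for writing a dataframe to CSV. On the Spark side, the only options are the dataframe writer combined with the coalesce and partition methods, and these create files with random names. To create files with the required names, we use the pandas to_csv method.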
Method 1
The “part-00000” file is the CSV file.
Download the file to your local machine and rename it if required.
Upload the CSV file manually to the datalake storage.
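The rename step above can be scripted instead of done by hand. This is a minimal sketch, assuming the output folder has already been downloaded to a local directory and contains exactly one part file. The function rename_part_file and its arguments are hypothetical names for illustration, not part of Spark or dbutils.

```python
import glob
import os
import shutil


def rename_part_file(output_dir: str, target_path: str) -> str:
    """Move the single part-* file found in output_dir to target_path."""
    # Marker files such as _SUCCESS and hidden .crc files do not match "part-*".
    matches = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    if len(matches) != 1:
        raise ValueError(f"expected exactly one part file, found {len(matches)}")
    shutil.move(matches[0], target_path)
    return target_path
```

The function raises rather than guessing when several part files are present, since that usually means the data was not coalesced into a single partition before writing.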
Method 2