Trying to read a Delta log file on a Databricks Community Edition cluster (Databricks Runtime 7.2):
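```python
# `spark` is predefined in Databricks notebooks
df = spark.range(100).toDF("id")
df.show()
# Write a single-file Delta table to DBFS
df.repartition(1).write.mode("append").format("delta").save("/user/delta_test")

# Try to read the first commit of the Delta transaction log with Python's built-in open()
with open('/user/delta_test/_delta_log/00000000000000000000.json', 'r') as f:
    for l in f:
        print(l)
```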
Getting a file-not-found error:

```
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<command-1759925981994211> in <module>
----> 1 with open('/user/delta_test/_delta_log/00000000000000000000.json','r') as f:
      2     for l in f:
      3         print(l)

FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'
```
I tried prefixing the path with `/dbfs/` and with `dbfs:/`, but neither helped; I still get the same error:
```python
with open('/dbfs/user/delta_test/_delta_log/00000000000000000000.json', 'r') as f:
    for l in f:
        print(l)
```
But with `dbutils.fs.head` I can read the file:
```python
dbutils.fs.head("/user/delta_test/_delta_log/00000000000000000000.json")
```

```
'{"commitInfo":{"timestamp":1598224183331,"userId":"284520831744638","userName":"","operation":"WRITE","operationParameters":{"mode":"Append","partitionBy":"[]"},"notebook":{"","isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputBytes":"1171","numOutputRows":"100"}}}\n{"protocol":{"minReaderVersi...etc
```
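As the output above shows, a Delta commit file is newline-delimited JSON: each line is one object whose single top-level key names the action (`commitInfo`, `protocol`, ...). Once the text is in hand, it can be parsed line by line. A minimal sketch, assuming the whole file fits within `dbutils.fs.head`'s `maxBytes` limit (65536 bytes by default; a longer file gets cut mid-line and `json.loads` fails on the last line). The helper name `parse_delta_log_actions` is hypothetical:

```python
import json


def parse_delta_log_actions(text):
    """Parse a Delta commit file's text into a list of (action_name, payload) pairs.

    Hypothetical helper: assumes one JSON object per non-empty line, each with a
    single top-level key naming the action (e.g. "commitInfo", "protocol").
    """
    actions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines such as a trailing newline
        obj = json.loads(line)
        # Take the (single) top-level key as the action name
        name, payload = next(iter(obj.items()))
        actions.append((name, payload))
    return actions


# In a notebook (subject to the maxBytes caveat above):
# parse_delta_log_actions(dbutils.fs.head("/user/delta_test/_delta_log/00000000000000000000.json"))
```

For example, with illustrative (made-up) input `'{"commitInfo":{"operation":"WRITE"}}\n{"protocol":{"minReaderVersion":1}}\n'`, the action names come back as `['commitInfo', 'protocol']`.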
How can we read/cat a DBFS file in Databricks with Python's `open` method?
1 Answer
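By default, this data lives on DBFS, and your code needs to know how to access it. Python's built-in `open` knows nothing about DBFS (it only sees the local filesystem), which is why it fails.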
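There is a workaround, though: DBFS is mounted into the local filesystem at `/dbfs`, so you only need to prepend that to the file name. Instead of `/user/delta_test/_delta_log/00000000000000000000.json`, use `/dbfs/user/delta_test/_delta_log/00000000000000000000.json`.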
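The path rewrite described above can be captured in a small helper so that both `dbfs:/...` URIs and bare DBFS paths map to the `/dbfs` mount. A minimal sketch; the function name `to_fuse_path` is hypothetical, and it assumes any path already starting with `/dbfs/` is a local mount path rather than a DBFS directory literally named `dbfs`:

```python
def to_fuse_path(path):
    """Map a DBFS path to the corresponding local path under the /dbfs mount.

    Hypothetical helper. Assumes paths already starting with "/dbfs/" are
    local mount paths and returns them unchanged.
    """
    if path.startswith("dbfs:/"):
        path = path[len("dbfs:"):]  # "dbfs:/user/x" -> "/user/x"
    if path.startswith("/dbfs/"):
        return path  # already points at the local mount
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute DBFS path, got {path!r}")
    return "/dbfs" + path
```

Usage would then be `open(to_fuse_path("/user/delta_test/_delta_log/00000000000000000000.json"))`, which only works where the `/dbfs` mount is actually available on the cluster.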
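Update: on Community Edition with DBR 7 and later, this mount is disabled. The workaround is to use the `dbutils.fs.cp` command to copy the file from DBFS to a local directory such as `/tmp` or `/var/tmp`, and then read it from there.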
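A minimal sketch of that copy-then-read workaround. The `file:` scheme on the destination tells `dbutils.fs.cp` to write to the driver's local filesystem instead of DBFS. The helper name `read_via_local_copy` is hypothetical, and `dbutils` is passed in as a parameter (in a notebook, pass the global `dbutils`) so the function does not depend on notebook globals:

```python
import os


def read_via_local_copy(dbutils, dbfs_path, local_dir="/tmp"):
    """Copy a DBFS file to the driver's local disk, then read it with open().

    Hypothetical helper; `dbutils` is the Databricks utilities object.
    """
    local_path = os.path.join(local_dir, os.path.basename(dbfs_path))
    # "file:" marks the destination as the driver's local filesystem, not DBFS
    dbutils.fs.cp(dbfs_path, "file:" + local_path)
    with open(local_path, "r") as f:
        return f.read()
```

In a notebook, something like `print(read_via_local_copy(dbutils, "dbfs:/user/delta_test/_delta_log/00000000000000000000.json"))` would print the commit file. Copying the whole file also avoids the `maxBytes` truncation of `dbutils.fs.head`.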