Save Spark DataFrame before writing to Snowflake

Posted by yquaqz18 on 2021-05-27 in Spark


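I am working in PySpark, and I apply a series of transformations and user-defined functions before arriving at a final output table. The last command, which writes to Snowflake, takes about 25 minutes to run because it also performs all of the calculations: Spark evaluates lazily and does not compute anything until that final call. I want to evaluate the final table in a step before the write, so that I can time how long all of the transformations take and, separately, how long the write to Snowflake takes. How do I separate the two? I have tried: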

temp = final_df.show() 

temp.write.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions2) \
.option("dbtable","TEST_SPARK").save()

But I get the error:

'NoneType' object has no attribute 'write'

And with collect():

temp = final_df.collect() 

temp.write.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions2) \
.option("dbtable","TEST_SPARK").save()

But I get the error:

'list' object has no attribute 'write'

apache-spark pyspark lazy-evaluation

Source: https://stackoverflow.com/questions/63137008/save-spark-dataframe-before-writing-to-snowflake

1 answer


vaqhlq811#

Your `temp` variable holds the result of `.show()`, which returns `None`, and only a DataFrame has the `.write` method for writing to a source. Try the code below:
```
# keep a reference to the DataFrame itself, not to the result of show()
temp = final_df

# view records from temp dataframe
temp.show()

temp.write.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions2) \
    .option("dbtable","TEST_SPARK").save()

# collect() returns the data as a list and stores it in the temp variable
temp = final_df.collect()

# a list doesn't have a .write method, so write from the DataFrame instead
final_df.write.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions2) \
    .option("dbtable","TEST_SPARK").save()
```
Update:
```
import time
start_time = time.time()

# code until show()
temp = final_df

# view records from temp dataframe
temp.show()
end_time = time.time()
print("Total execution time for action: {} seconds".format(end_time - start_time))

start_time_sfw = time.time()

# write to Snowflake
final_df.write.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions2) \
    .option("dbtable","TEST_SPARK").save()
end_time_sfw = time.time()
print("Total execution time for writing to snowflake: {} seconds".format(end_time_sfw - start_time_sfw))
```
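
One caveat with timing `show()`: it only needs enough rows to print the first 20, so Spark may evaluate just part of the plan, and the later write would then generally recompute the transformations. Below is a minimal sketch of one alternative, assuming the final table fits in the cluster's memory-and-disk cache: cache the DataFrame, force a full evaluation with `count()`, and then time the write against the cached result. The helper names `timed` and `materialize_then_write` are hypothetical and introduced only for illustration; `cache()`, `count()`, and `unpersist()` are standard DataFrame methods.

```python
import time


def timed(label, action):
    """Run action() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} seconds")
    return result, elapsed


def materialize_then_write(df, write_fn):
    """Time the transformations and the write as two separate steps.

    df       -- a DataFrame-like object with cache(), count(), unpersist()
    write_fn -- hypothetical callable that takes the DataFrame and writes it
    """
    cached = df.cache()  # mark the result for reuse; nothing is computed yet
    try:
        # count() is an action that touches every partition, so it runs all
        # transformations and fills the cache.
        _, compute_s = timed("Transformations", cached.count)
        # The write now reads from the cache instead of recomputing.
        _, write_s = timed("Snowflake write", lambda: write_fn(cached))
    finally:
        cached.unpersist()  # release the cached data
    return compute_s, write_s


# Usage with the names from the question (assumed to be defined already):
# materialize_then_write(
#     final_df,
#     lambda df: df.write.format(SNOWFLAKE_SOURCE_NAME)
#                  .options(**sfOptions2)
#                  .option("dbtable", "TEST_SPARK")
#                  .save(),
# )
```

If the table is too large to cache comfortably, the two timings will not separate cleanly, because evicted partitions are recomputed during the write.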
