When should REFRESH TABLE my_table be executed in Spark?

w8f9ii69 · asked 2021-06-26 · in Hive

Consider the following code:

import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql._

val path = ...
val dataFrame: DataFrame = ...

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
// Register the DataFrame as a temporary view so it can be queried with SQL.
dataFrame.createOrReplaceTempView("my_table")
val results = hiveContext.sql(s"select * from my_table")
// Append the results as ORC files under `path`, partitioned by my_column.
results.write.mode(SaveMode.Append).partitionBy("my_column").format("orc").save(path)
hiveContext.sql("REFRESH TABLE my_table")

This code is executed twice with the same path but different DataFrames. The first run succeeds, but the second one fails with:

Caused by: java.io.FileNotFoundException: File does not exist: hdfs://somepath/somefile.snappy.orc
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.

I tried clearing the cache and also calling hiveContext.dropTempTable("tableName"), and neither had any effect. When should REFRESH TABLE tableName be called (before the write, after it, or some other variant) to fix an error like this?

ippsafx7

You can run spark.catalog.refreshTable(tableName) or spark.sql(s"REFRESH TABLE $tableName") just before the write operation. I had the same problem and this solved it. Spark caches the table's file listing, so when an earlier run has replaced the underlying files, the cached metadata still points at files that no longer exist; refreshing just before the write forces Spark to re-list them.

spark.catalog.refreshTable(tableName)
df.write.mode(SaveMode.Overwrite).insertInto(tableName)
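
Applied to the code from the question, a minimal sketch could look like the following. It assumes the same path, dataFrame, my_table, and my_column as above, plus a Spark 2.x SparkSession named spark; the placement of the refresh is the point, not the surrounding details.

import org.apache.spark.sql.SaveMode

// Register the incoming DataFrame as a temporary view, as in the question.
dataFrame.createOrReplaceTempView("my_table")
val results = spark.sql("select * from my_table")

// Refresh before the write: this invalidates the cached file listing,
// so the append does not reference files that a previous run has replaced.
spark.catalog.refreshTable("my_table")

results.write
  .mode(SaveMode.Append)
  .partitionBy("my_column")
  .format("orc")
  .save(path)

Calling REFRESH TABLE after the write, as in the original snippet, comes too late for the write itself, which is why the first run succeeds and the second one fails.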
