Consider the following code:
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql._
val path = ...
val dataFrame: DataFrame = ...
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
dataFrame.createOrReplaceTempView("my_table")
val results = hiveContext.sql(s"select * from my_table")
results.write.mode(SaveMode.Append).partitionBy("my_column").format("orc").save(path)
hiveContext.sql("REFRESH TABLE my_table")
This code is executed twice with the same path but different DataFrames. The first run succeeds, but then this error appears:
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://somepath/somefile.snappy.orc
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
I tried to clear the cache by calling hiveContext.dropTempTable("tableName"), but it had no effect. When should REFRESH TABLE tableName be called (before the write, after it, or some other variant) to fix an error like this?
1 Answer
ippsafx71#
You can run
spark.catalog.refreshTable(tableName)
or spark.sql(s"REFRESH TABLE $tableName")
right before the write operation. I had the same problem, and this fixed it for me.
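Applying the answer to the original snippet, a minimal sketch might look like the following. This assumes a Spark 2.x SparkSession named spark; dataFrame, path, and the table/column names are taken from the question, and the helper name appendPartitioned is hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Hypothetical helper; `dataFrame` and `path` come from the surrounding code.
def appendPartitioned(spark: SparkSession, dataFrame: DataFrame, path: String): Unit = {
  dataFrame.createOrReplaceTempView("my_table")

  // Invalidate any cached file listing for the view *before* writing,
  // so a repeated run does not read stale file metadata.
  spark.catalog.refreshTable("my_table")

  val results = spark.sql("select * from my_table")
  results.write
    .mode(SaveMode.Append)
    .partitionBy("my_column")
    .format("orc")
    .save(path)
}
```

The key point is the placement: refreshing before the write drops the cached listing of files that the previous run may have replaced, which is what the FileNotFoundException complains about.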