Spark Thrift Server caching

Posted by 72qzrwbm on 2021-06-28 in Hive


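A "fresh" query, one that the user/session has never run before, seems to trigger some caching the first time it runs. Specifically, two Spark jobs are produced: the first caches the data (apparently on disk), and the second actually executes the SQL. The cache also seems to expire after some time.
Is there a way to optimize this process so that the caching happens when the table is registered in the Hive metastore? Can the cache be made persistent?
Our workflow: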

//register the parquet files as hive tables
spark.catalog.createExternalTable("schema.table","s3a://some/table/", "parquet")

//query tables via jdbc, first execution takes minutes for caching
select column from schema.table where partition = value

//query tables via jdbc, second execution even with different values, is much faster
select column from schema.table where partition = another_value
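
One option that could be tried here is a minimal sketch only, assuming the slow first run is data loading that Spark SQL's explicit caching can cover. The question does not confirm this: the delay may instead come from file listing or partition discovery on S3, which explicit caching would not necessarily help. The statements are standard Spark SQL and could be sent over the same JDBC connection right after the table is registered:

```sql
-- Assumption: schema.table is the table registered in the workflow above.
-- CACHE TABLE is eager by default, so the table is loaded into Spark's cache
-- as soon as this statement runs (CACHE LAZY TABLE would defer it to first use).
CACHE TABLE schema.table;

-- Release the cached data once it is no longer needed.
UNCACHE TABLE schema.table;
```

A cache created this way lives inside the Thrift Server's Spark application, so it does not survive a server restart and would have to be rebuilt each time the server starts.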

Hive apache-spark apache-spark-sql

Source: https://stackoverflow.com/questions/40383456/spark-thriftserver-caching

No answers yet.
