Spark DataFrame unpersist: will it clear the temp table also?

Posted by i86rm4rw on 2021-05-27 in Spark


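I am trying to fix a heap memory issue in a Spark Scala application and to understand where it is leaking memory. I load data into a Spark DataFrame and register temp tables on top of it. When I unpersist the DataFrame, is the temp table also cleared by default? I tried to simulate this in a test program, but the behavior is not consistent.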

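import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val data = Seq(Row("1", "2020-05-11 15:17:57.188", "2020"))
val schemaOrig = List(
  StructField("key", StringType, true),
  StructField("txn_ts", StringType, true),
  StructField("txn_dt", StringType, true))

// Build the DataFrame and register a temp view over it.
val sourceDf = spark.createDataFrame(spark.sparkContext.parallelize(data), StructType(schemaOrig))
sourceDf.createOrReplaceTempView("sourceTable")

// Note: sourceDf has not been cached or persisted at this point.
sourceDf.unpersist()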
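The snippet above never calls `cache()` or `persist()`, so there is nothing for `unpersist()` to release, and every `show` recomputes from the lineage regardless. A minimal sketch to check this, assuming Spark 2.1+ (where `Dataset.storageLevel` exists) and a local `SparkSession`; the object name `UnpersistCheck` and its `run` helper are hypothetical, for illustration only. It records the storage level before caching, after `cache()`, and after `unpersist(blocking = true)`, then queries the temp view once the cache is gone.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.storage.StorageLevel

// Hypothetical helper object for illustration; not part of the original question.
object UnpersistCheck {
  // Returns the storage level at three points plus the row count read through the temp view.
  def run(spark: SparkSession): (StorageLevel, StorageLevel, StorageLevel, Long) = {
    val data = Seq(Row("1", "2020-05-11 15:17:57.188", "2020"))
    val schema = StructType(List(
      StructField("key", StringType, nullable = true),
      StructField("txn_ts", StringType, nullable = true),
      StructField("txn_dt", StringType, nullable = true)))

    val sourceDf = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
    sourceDf.createOrReplaceTempView("sourceTable")

    // Not persisted yet, so an unpersist() at this point would have nothing to free.
    val before = sourceDf.storageLevel

    // cache() registers the plan with the cache manager; blocks are filled on the first action.
    sourceDf.cache()
    sourceDf.count()
    val afterCache = sourceDf.storageLevel

    // blocking = true waits until the cached blocks have been removed.
    sourceDf.unpersist(blocking = true)
    val afterUnpersist = sourceDf.storageLevel

    // The temp view is still registered and can be queried (recomputed from the plan).
    val viewRows = spark.sql("select * from sourceTable").count()

    (before, afterCache, afterUnpersist, viewRows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("unpersist-check").getOrCreate()
    try println(run(spark))
    finally spark.stop()
  }
}
```

If `before` is `StorageLevel.NONE`, the original `unpersist()` call was a no-op, which would explain why the later `show` calls behave the same before and after it.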

Now I query the temp table, expecting it to recompute the entire RDD lineage.

spark.sql("select * from sourceTable").show

    [Stage 0:>                                                          (0 + 0) / 
    +---+--------------------+------+
    |key|              txn_ts|txn_dt|
    +---+--------------------+------+
    |  1|2020-05-11 15:17:...|  2020|
    +---+--------------------+------+

Now I call show on the DataFrame to see whether it is computed again.

scala> sourceDf.show
[Stage 2:>                                                          (0 + 0) / 1]

+---+--------------------+------+
|key|              txn_ts|txn_dt|
+---+--------------------+------+
|  1|2020-05-11 15:17:...|  2020|
+---+--------------------+------+

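Now I unpersist the DataFrame and call show immediately. Since unpersist was called, I expected it to recompute, but it looks like it does not. So my questions are:
1) Does unpersist clear the memory immediately?
2) Do the temp table and the DataFrame share the same memory location, or are they completely separate?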

scala> sourceDf.unpersist
res5: sourceDf.type = [key: string, txn_ts: string ... 1 more field]

scala> sourceDf.show
+---+--------------------+------+
|key|              txn_ts|txn_dt|
+---+--------------------+------+
|  1|2020-05-11 15:17:...|  2020|
+---+--------------------+------+
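Regarding question 2: a temp view is a name registered in the session catalog for the DataFrame's logical plan, so it does not hold a separate copy of the data; cached blocks are tracked per plan by Spark's cache manager. A minimal sketch, assuming Spark 2.1+ (where `Catalog.dropTempView` returns a `Boolean` and `Catalog.tableExists` is available): it caches through the view name with `spark.catalog.cacheTable`, releases it with `uncacheTable`, then drops the view with `dropTempView`, after which the DataFrame itself is still usable. `ViewCleanup` and `run` are hypothetical names for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper object for illustration; not part of the original question.
object ViewCleanup {
  // Returns (cachedViaView, cachedAfterUncache, dropped, viewStillExists, dfRowCount).
  def run(spark: SparkSession): (Boolean, Boolean, Boolean, Boolean, Long) = {
    val schema = StructType(List(
      StructField("key", StringType, nullable = true),
      StructField("txn_ts", StringType, nullable = true),
      StructField("txn_dt", StringType, nullable = true)))
    val rdd = spark.sparkContext.parallelize(Seq(Row("1", "2020-05-11 15:17:57.188", "2020")))
    val sourceDf = spark.createDataFrame(rdd, schema)
    sourceDf.createOrReplaceTempView("sourceTable")

    // Cache by view name, then run an action so the cached blocks are materialized.
    spark.catalog.cacheTable("sourceTable")
    spark.sql("select * from sourceTable").count()
    val cachedViaView = spark.catalog.isCached("sourceTable")

    // Release the cached blocks; the view definition stays registered.
    spark.catalog.uncacheTable("sourceTable")
    val cachedAfterUncache = spark.catalog.isCached("sourceTable")

    // Remove the name from the session catalog (per the docs, this also uncaches a cached view).
    val dropped = spark.catalog.dropTempView("sourceTable")
    val viewStillExists = spark.catalog.tableExists("sourceTable")

    // The DataFrame keeps its own plan, so it can still be computed after the view is gone.
    (cachedViaView, cachedAfterUncache, dropped, viewStillExists, sourceDf.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("view-cleanup").getOrCreate()
    try println(run(spark))
    finally spark.stop()
  }
}
```

When hunting a heap leak, dropping temp views that are no longer needed removes their catalog entries, but the memory held by cached data is only released by unpersisting or uncaching the corresponding plan.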

apache-spark apache-spark-sql apache-spark-dataset

Source: https://stackoverflow.com/questions/63119631/spark-dataframe-unpersist-will-it-clear-temptable-also
