jvm - Spark driver RMI library causing Full GC pauses (System.gc())

3qpi33ja · asked 2022-11-07 · Spark

Our Spark executor logs contain the following:

org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval

I found that these are heartbeats from the executors to the driver, so I suspected a GC problem on the driver, enabled GC logging, and found entries like this:

[Full GC (System.gc()) 5402.271: [CMS: 10188280K->8448710K(14849412K),27.2815605 secs] 10780958K->8448710K(15462852K), [Metaspace: 93432K->93432K(96256K)], 27.2833999 secs] [Times: user=27.28 sys=0.01, real=27.29 secs]

Clearly something is calling System.gc(), causing long GC pauses like this one on the driver (27 seconds). Looking further, RMI is a suspect, since these System.gc() calls happen every 30 minutes, but I can't find any reference to RMI on the Spark driver. Should I go ahead and disable the System.gc() calls by setting -XX:+DisableExplicitGC?
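For reference, a minimal sketch of how that flag could be applied to the driver (assumption: cluster deploy mode, where the driver JVM is launched after this configuration is read; in client mode the driver JVM is already running, so the flag would have to go to spark-submit as --driver-java-options instead):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Assumption: cluster deploy mode. In client mode, pass
// --driver-java-options "-XX:+DisableExplicitGC" to spark-submit instead.
val conf = new SparkConf()
  .setAppName("driver-gc-investigation") // hypothetical app name
  .set("spark.driver.extraJavaOptions", "-XX:+DisableExplicitGC")

val spark = SparkSession.builder().config(conf).getOrCreate()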


myss37ts1#

Interesting; I'm looking into a similar problem. I can see that some code in Spark actually calls System.gc().
It might be worth opening a JIRA with Spark to discuss this.
I understand that calling System.gc() is not best practice, mainly because it stops all other threads, which has a big performance impact. However, the Oracle Java documentation shows that since Java 1.6 there is an additional JVM argument to run System.gc() concurrently (-XX:+ExplicitGCInvokesConcurrent):
http://docs.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html
You could try setting it as an additional argument:
-XX:+ExplicitGCInvokesConcurrent
Depending on how you set your arguments, you can put it in Spark's configuration file, or pass it with a --conf line argument in your Spark command (spark-submit, spark-shell, etc.).
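A minimal sketch of what that could look like (assumptions: the driver runs CMS, as the GC log above suggests, and cluster deploy mode; in client mode pass it via --driver-java-options on spark-submit):

import org.apache.spark.SparkConf

// With CMS, -XX:+ExplicitGCInvokesConcurrent makes System.gc() trigger a
// concurrent collection cycle instead of a stop-the-world Full GC.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions",
    "-XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent")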
Update:
Found the following comment in the ContextCleaner.scala file of Spark 2.x:

/**
 * How often to trigger a garbage collection in this JVM.
 *
 * This context cleaner triggers cleanups only when weak references are garbage collected.
 * In long-running applications with large driver JVMs, where there is little memory pressure
 * on the driver, this may happen very occasionally or not at all. Not cleaning at all may
 * lead to executors running out of disk space after a while.
 */

This periodic GC fires every 30 minutes by default (spark.cleaner.periodicGC.interval), which matches the cadence you observed, so the caller may well be Spark's own ContextCleaner rather than RMI; note the comment's warning before disabling explicit GC outright.

vnzz0bqm2#

https://github.com/apache/spark/blob/branch-3.3/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L92

/**
   * How often to trigger a garbage collection in this JVM.
   *
   * This context cleaner triggers cleanups only when weak references are garbage collected.
   * In long-running applications with large driver JVMs, where there is little memory pressure
   * on the driver, this may happen very occasionally or not at all. Not cleaning at all may
   * lead to executors running out of disk space after a while.
   */
  private val periodicGCInterval = sc.conf.get(CLEANER_PERIODIC_GC_INTERVAL)
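For intuition, a simplified sketch of what this interval drives (not Spark's exact code; the real ContextCleaner schedules a periodic System.gc() in much the same way):

import java.util.concurrent.{Executors, TimeUnit}

// Simplified illustration of the periodic GC: call System.gc() on a fixed
// schedule so weak references to stale RDD/shuffle state actually get
// collected, even when the driver heap is under no memory pressure.
val periodicGCService = Executors.newSingleThreadScheduledExecutor()
val intervalMinutes = 30L // matches the documented default of 30min

periodicGCService.scheduleAtFixedRate(
  new Runnable { override def run(): Unit = System.gc() },
  intervalMinutes, intervalMinutes, TimeUnit.MINUTES)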

And the documentation at https://spark.apache.org/docs/latest/configuration.html#memory-management:

spark.cleaner.periodicGC.interval (default: 30min)
Controls how often to trigger a garbage collection.
This context cleaner triggers cleanups only when weak references are garbage collected. In long-running applications with large driver JVMs, where there is little memory pressure on the driver, this may happen very occasionally or not at all. Not cleaning at all may lead to executors running out of disk space after a while.
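If the 30-minute System.gc() turns out to be the culprit, raising this interval is a gentler option than -XX:+DisableExplicitGC. A minimal sketch (the "2h" value is purely illustrative):

import org.apache.spark.SparkConf

// Fire the ContextCleaner's periodic System.gc() less often. Trade-off:
// stale shuffle/RDD state on executors is cleaned up less frequently,
// so keep an eye on executor disk usage.
val conf = new SparkConf()
  .set("spark.cleaner.periodicGC.interval", "2h") // illustrative value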
