When I try to create an object of a class and call a particular method (newRDD), I keep getting the following error stack trace.
I launch a spark-shell, importing the jar, and run the following:

```
spark-shell --master=yarn --jars=sample_jar.jar --files database.cfg
```

```scala
scala> val reader = new Sample(spark)
scala> val a = reader.buildFileRDD("/xyz/path")
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2294)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:387)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:386)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.filter(RDD.scala:386)
at Sample.newRDD(Sample.scala:117)
... 48 elided
Caused by: java.io.NotSerializableException:
```

How can I fix this error?
2 Answers

blmhpbnm1#
From the stack trace, it looks like you are using DatabaseUtils inside a closure. Since DatabaseUtils is not serializable, it cannot be shipped over the network to the executors. Try making DatabaseUtils serializable; alternatively, you can make DatabaseUtils a Scala object:

```scala
object DatabaseUtils extends Serializable
```
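Spark's ClosureCleaner performs essentially a Java-serialization check on everything a task closure captures, which is what fails here. A minimal sketch of that check, with no Spark dependency (`PlainUtils` and `SafeUtils` are hypothetical stand-ins for a class like DatabaseUtils, before and after the fix):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-ins for DatabaseUtils: a plain class vs. one marked Serializable.
class PlainUtils { def query(q: String): List[String] = List(q) }
class SafeUtils extends Serializable { def query(q: String): List[String] = List(q) }

object ClosureCheck {
  // Roughly what Spark does before shipping a task: try to Java-serialize the
  // object a closure drags along, and fail fast if it cannot be written.
  def serializes(o: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(o)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

`ClosureCheck.serializes(new PlainUtils)` fails just like the task in the question, while `ClosureCheck.serializes(new SafeUtils)` succeeds, which is why adding `Serializable` (or using a Scala object) resolves the error.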
vbopmzt12#
Change DatabaseUtils to the code below, and inside the Sample class remove the dbConfig and url variables, adding the following instead:

```scala
val dbObj = DatabaseUtils(ConfigFactory.parseFile(new File(config)))
```
```scala
class DatabaseUtils(url: String, username: String, password: String) {
  val driver = "com.mysql.jdbc.Driver"

  def executeSelectQuery(qry: String): List[String] = {
    // Query execution elided in the original answer.
    List.empty[String]
  }
}

object DatabaseUtils {
  def apply(dbConfig: Config): DatabaseUtils = {
    val url = "jdbc:mysql://" + dbConfig.getString("db.host") + ":" + dbConfig.getString("db.port") +
      "/" + dbConfig.getString("db.database") + "?useSSL=false"
    new DatabaseUtils(url, dbConfig.getString("db.username"), dbConfig.getString("db.password"))
  }
}
```
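The point of removing the config field: a class fails Java serialization if any of its fields is not serializable, even when the class itself extends Serializable, whereas plain String fields always serialize. A minimal sketch with a hypothetical `ConfigLike` stand-in for the config object (no Spark or config-library dependency assumed):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a config object that is not Serializable.
class ConfigLike(val settings: Map[String, String])

// Holding the config as a field poisons serialization even with the marker trait.
class UtilsWithConfig(val cfg: ConfigLike) extends Serializable

// The refactored shape: only String fields, all of which serialize fine.
class UtilsWithStrings(val url: String, val user: String, val pass: String) extends Serializable

object SerializationCheck {
  // Attempt a Java serialization round-trip write, as Spark does for task closures.
  def serializes(o: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(o)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

`SerializationCheck.serializes(new UtilsWithConfig(...))` fails because of the `cfg` field, while `SerializationCheck.serializes(new UtilsWithStrings(...))` succeeds, which is what the answer's refactor relies on.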