How can I use a case object as a field in a case class and convert it to a Spark Dataset?

Posted by n3schb8v on 2021-07-14 in Spark

I am learning Spark SQL and trying to apply a filter to a Dataset I created. I defined a simple Employee case class with four fields: name, salary, age, and dpt.

case class Employee(name: String, salary: Double, age: Int, dpt: Dept)

The type of the last field, dpt, is defined as follows:

sealed trait Dept { val name: String }

case object Accountability extends Dept { override val name = "AC" }
case object Sales extends Dept { override val name = "S" }
case object Finance extends Dept { override val name = "F" }
case object Marketing extends Dept { override val name = "M" }
case object Communication extends Dept { override val name = "C" }
case object Reception extends Dept { override val name = "R" }
case object HumanResource extends Dept { override val name = "HR" }

I have tried to solve this with a Kryo encoder, but it does not work.

object DeptEncoders {
  implicit def deptEncoder: org.apache.spark.sql.Encoder[Dept] =
    org.apache.spark.sql.Encoders.kryo[Dept]
}

ax6ht2ek #1

According to the documentation:

import org.apache.spark.sql.Encoder

...
// conf is your org.apache.spark.SparkConf used to create your Spark Context
conf.registerKryoClasses(Array(classOf[Dept], classOf[Employee]))

...
implicit val encoder1: Encoder[Dept] = org.apache.spark.sql.Encoders.kryo[Dept]
implicit val encoder2: Encoder[Employee] = org.apache.spark.sql.Encoders.kryo[Employee]

...
val df = Seq(e1, e2).toDF()
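
For reference, the fragments above could be put together roughly as follows. This is only a minimal sketch, not necessarily what the answer had in mind: it assumes a local SparkSession, the object name KryoDeptExample, the helper salesNames, and the sample employees are made up for illustration, and only two of the departments are repeated to keep it short. It builds a typed Dataset[Employee] with createDataset rather than calling toDF().

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Trimmed copy of the question's model (only two departments shown).
sealed trait Dept { val name: String }
case object Accountability extends Dept { override val name = "AC" }
case object Sales extends Dept { override val name = "S" }

case class Employee(name: String, salary: Double, age: Int, dpt: Dept)

// Hypothetical wrapper object; the name is not from the original post.
object KryoDeptExample {
  // One Kryo encoder for the whole Employee; the Dept field is
  // serialized as part of the Employee object.
  implicit val employeeEncoder: Encoder[Employee] = Encoders.kryo[Employee]

  // Returns the names of employees whose department code is "S".
  def salesNames(spark: SparkSession): Seq[String] = {
    // Made-up sample data, for illustration only.
    val employees = Seq(
      Employee("Alice", 5000.0, 41, Accountability),
      Employee("Bob", 4200.0, 35, Sales)
    )
    // Picks up employeeEncoder from the implicit scope.
    val ds: Dataset[Employee] = spark.createDataset(employees)
    // Typed filter: the lambda runs on deserialized Employee objects.
    ds.filter((e: Employee) => e.dpt.name == "S").collect().map(_.name).toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for trying it out
      .appName("kryo-dept-sketch")
      .getOrCreate()
    try println(salesNames(spark).mkString(", "))
    finally spark.stop()
  }
}
```

Note that a Kryo-encoded Dataset stores each object in a single binary column, so column expressions such as filter("salary > 1000") cannot see the individual fields. Filters have to be written as typed lambdas like the one above.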
