Using Java domain objects instead of Scala case classes when creating a Spark Dataset

jdgnovmf · posted 2021-05-27 in Spark

I'm trying to create a Spark Dataset from an RDD using the RDD#toDS method.
However, rather than using a Scala case class to specify the schema, I'd like to use an existing domain object defined in a third-party library. When I do, I get the following error:

scala> import org.hl7.fhir.dstu3.model.Patient
import org.hl7.fhir.dstu3.model.Patient

scala> val patients = sc.loadFromMongoDB(ReadConfig(Map("uri" -> "mongodb://mongodb/fhir.patients")))
patients: com.mongodb.spark.rdd.MongoRDD[org.bson.Document] = MongoRDD[0] at RDD at MongoRDD.scala:47

scala> val patientsDataSet = patients.toDS[Patient]()
<console>:44: error: not enough arguments for method toDS: (beanClass: Class[org.hl7.fhir.dstu3.model.Patient])org.apache.spark.sql.Dataset[org.hl7.fhir.dstu3.model.Patient].
Unspecified value parameter beanClass.
         val patientsDataSet = patients.toDS[Patient]()
                                                     ^

And here is what I get after removing the parentheses:

scala> val patientsDataSet = patients.toDS[Patient]
<console>:46: error: missing arguments for method toDS in class MongoRDD;
follow this method with `_' if you want to treat it as a partially applied function
         val patientsDataSet = patients.toDS[Patient]

Can I use a Java object in place of a case class here?
Thanks!


ocebsuys1#

Creating a case class that extends the Java object might work.
Java:

// A plain Java class whose fields are set in the constructor and exposed
// through bean-style getters.
public class Patient {

  private final String name;
  private final String status;

  public Patient(String name, String status) {
    this.name = name;
    this.status = status;
  }

  public String getName() {
    return name;
  }

  public String getStatus() {
    return status;
  }

}

Scala:

case class Patient0(name: String, status: String) extends Patient(name, status)

val patients = sc.loadFromMongoDB(ReadConfig(Map("uri" -> "mongodb://mongodb/fhir.patients")))
val patientsDataSet = patients.toDS[Patient0]()

Because Patient0 is a case class (and therefore a Product), the no-argument toDS overload can derive an encoder for it, while the parent constructor still populates the underlying Java object.
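
As an aside, the error message in the question shows that MongoRDD#toDS also has an overload that takes the Java bean class explicitly, toDS(beanClass: Class[T]). Passing classOf[Patient] may therefore work without any wrapper case class, provided the FHIR Patient class is compatible with Spark's bean encoder; a minimal sketch, not verified against the HAPI FHIR model:

import org.hl7.fhir.dstu3.model.Patient

// Pass the bean class explicitly instead of relying on an implicit encoder
// for a case class, matching the signature shown in the error message.
val patientsDataSet = patients.toDS(classOf[Patient])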
