Ignoring non-existent columns in dataset.as[SomeCaseClass]

g6ll5ycj  posted on 2021-05-27  in Spark

Spark's dataset.as function throws an exception for columns it cannot find: org.apache.spark.sql.AnalysisException: cannot resolve 'attr_3' given input columns: [attr_1, attr_2];

```
case class SomeCaseClass(attr_1: String, attr_2: Long, attr_3: String)

spark.read.parquet("some_directory").as[SomeCaseClass]
```

Is there a way to avoid this exception and set null for the columns that do not exist?

pepwfjgg1#

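Specify a schema when reading. Columns that are in the schema but missing from the Parquet files are filled with null, and the result can then be converted to a Dataset. Example:

```
import spark.implicits._

case class SomeCaseClass(attr_1: String, attr_2: Long, attr_3: String)

// Derive the schema from the case class via an empty Dataset
val sch = Seq[SomeCaseClass]().toDF.schema

// Pass the derived schema (not the case class itself) to the reader
spark.read.schema(sch).parquet("some_directory").as[SomeCaseClass]
```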
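
If the DataFrame does not come straight from a file read (so there is no reader to hand a schema to), the same effect can probably be had by adding the missing columns yourself before calling as. The following is only a minimal sketch under a few assumptions: `FillMissingColumns` and `asWithNulls` are hypothetical names made up for illustration, column names are compared case-sensitively, and the local SparkSession in `main` is there only so the example runs on its own.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoder, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

case class SomeCaseClass(attr_1: String, attr_2: Long, attr_3: String)

object FillMissingColumns {
  // Hypothetical helper (not part of Spark): for every field of T that the
  // DataFrame lacks, add a null column cast to the field's type, then convert.
  def asWithNulls[T: Encoder](df: DataFrame): Dataset[T] = {
    val target = implicitly[Encoder[T]].schema
    val present = df.columns.toSet // assumption: case-sensitive name match
    val projected = target.fields.map { f =>
      if (present.contains(f.name)) col(f.name)
      else lit(null).cast(f.dataType).as(f.name)
    }
    df.select(projected: _*).as[T]
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fill-missing-columns")
      .getOrCreate()
    import spark.implicits._

    // Input that has attr_1 and attr_2 but no attr_3, as in the question
    val df = Seq(("a", 1L), ("b", 2L)).toDF("attr_1", "attr_2")

    val ds = asWithNulls[SomeCaseClass](df)
    ds.show()

    spark.stop()
  }
}
```

One caveat that applies to both approaches: null only fits fields that can hold it. If a missing column maps to a primitive field such as attr_2: Long, decoding the rows is expected to fail, and declaring the field as Option[Long] is the usual workaround.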
