Spark's `Dataset.as` throws an exception for a column it cannot find: org.apache.spark.sql.AnalysisException: cannot resolve 'attr_3' given input columns: [attr_1, attr_2];
```
case class SomeCaseClass(attr_1: String, attr_2: Long, attr_3: String)
spark.read.parquet("some_directory").as[SomeCaseClass]
```
Is there any way to avoid this exception and get null for the missing column instead?
1 Answer

pepwfjgg#1
Specify a schema when reading. Spark fills any column that is missing from the files with null, and the result can then be converted to a Dataset. Example:
```
case class SomeCaseClass(attr_1: String, attr_2: Long, attr_3: String)

// Derive the expected schema from the case class via an empty Dataset
// (requires import spark.implicits._)
val sch = Seq.empty[SomeCaseClass].toDF().schema

// Read with the explicit schema; attr_3 is absent from the files,
// so it comes back as null
spark.read.schema(sch).parquet("some_directory").as[SomeCaseClass]
```
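A sketch of an alternative way to derive the same schema without constructing an empty DataFrame, using `Encoders.product` (assumes Spark's `spark-sql` module is on the classpath):

```
import org.apache.spark.sql.Encoders

case class SomeCaseClass(attr_1: String, attr_2: Long, attr_3: String)

// Encoders.product derives the StructType directly from the case class
val sch = Encoders.product[SomeCaseClass].schema
```

Note that either approach only helps for fields whose JVM type can actually hold null (e.g. `attr_3: String`); if a primitive-typed column such as `attr_2: Long` were the missing one, decoding the null into the case class would likely still fail at runtime.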