I am trying to create an array of struct(col, col) in a Spark DataFrame, but I get an error. The same error reproduces with the sample data below.
DataFrame:
val df = Seq((1, "One", "uno", true), (2, "Two", "Dos", true), (3, "Three", "Tres", false)).toDF("number", "English", "Spanish", "include_spanish")
scala> df.show
+------+-------+-------+---------------+
|number|English|Spanish|include_spanish|
+------+-------+-------+---------------+
|     1|    One|    uno|           true|
|     2|    Two|    Dos|           true|
|     3|  Three|   Tres|          false|
+------+-------+-------+---------------+
Now I try to build a struct from existing columns and then create an array from those structs.
val df1 = df.withColumn("numberToEnglish", struct(col("number"), col("English"))).withColumn("numberToSpanish", struct("number", "Spanish")).withColumn("numberToLanguage", when(col("include_spanish") === true, array("numberToEnglish", "numberToSpanish")).otherwise(array("numberToEnglish")))
This fails with the following error:
org.apache.spark.sql.AnalysisException: cannot resolve 'array(`numberToEnglish`, `numberToSpanish`)' due to data type mismatch: input to function array should all be the same type, but it's [struct<number:int,English:string>, struct<number:int,Spanish:string>];;
'Project [number#200, English#201, Spanish#202, include_spanish#203, numberToEnglish#253, numberToSpanish#259, CASE WHEN (include_spanish#203 = true) THEN array(numberToEnglish#253, numberToSpanish#259) ELSE array(numberToEnglish#253) END AS numberToLanguage#266]
What is the best way to achieve this?
1 Answer
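For the array function to treat struct($"number", $"English") and struct($"number", $"Spanish") as the same data type, the fields of the two structs must have identical names, so you need to name (alias) the struct elements explicitly.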
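A minimal sketch of that fix, meant to be pasted into spark-shell. The shared field name `word` is a hypothetical choice, not something from the original answer; any name should work as long as both structs use the same one. The local SparkSession setup is also an assumption, added only so the snippet is self-contained.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, struct, when}

// Assumed local session; in spark-shell getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("struct-array").getOrCreate()
import spark.implicits._

val df = Seq((1, "One", "uno", true), (2, "Two", "Dos", true), (3, "Three", "Tres", false))
  .toDF("number", "English", "Spanish", "include_spanish")

val df1 = df
  // Alias the second field to the same (hypothetical) name "word" in both structs,
  // so both columns have type struct<number:int,word:string>.
  .withColumn("numberToEnglish", struct(col("number"), col("English").as("word")))
  .withColumn("numberToSpanish", struct(col("number"), col("Spanish").as("word")))
  // Both branches now produce an array of the same struct type.
  .withColumn("numberToLanguage",
    when(col("include_spanish") === true,
      array(col("numberToEnglish"), col("numberToSpanish")))
      .otherwise(array(col("numberToEnglish"))))

df1.printSchema()
df1.select("number", "numberToLanguage").show(false)
```

One consequence of this approach is that the language is no longer visible in the field name inside the array. If that information matters, adding a third struct field holding the language as a literal would be one option.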