在spark scala中用if连接多个列

gmol1639  于 2021-05-27  发布在  Spark
关注(0)|答案(2)|浏览(511)

我必须用spark scala中的一个条件连接多个列,但它不能与“if”一起工作。我有以下Dataframe:

table:
+---+----+----+  
|  a|   b|   c|
+---+----+----+  
|  0|   1|   1|
|  1|   0|   0|
|  1|   1|   0|
+---+----+----+

table.withColumn("concat", concat_ws(", ", (if($"a"===1){lit("D")} else{null}),
                                           (if($"b"===1){lit("E")} else{null}),
                                           (if($"c"===1){lit("F")} else{null})))

以下是最终要求的结果。

+---+----+----+------+ 
|  a|   b|   c|concat|
+---+----+----+------+  
|  0|   1|   1|  E, F|
|  1|   0|   0|     D|
|  1|   1|   0|  D, E|
+---+----+----+------+

我不会像这样创建其他列:

val ftable = (table.withColumn("D", when ($"a"===1, lit("D")))
                    .withColumn("E", when ($"b"===1, lit("E")))
                    .withColumn("F", when ($"c"===1, lit("F"))))                                 

val columnselection = ftable.select($"D", $"E" , $"F" )
val selection = columnselection.columns.map(col)
val animaliCol = ftable.select(ftable.col("*"), concat_ws(", ", selection : _*).as("concat"))
z9ju0rcb

z9ju0rcb1#

使用 concat_ws , when & otherwise 功能。

scala> df.show(false)
+---+---+---+
|a  |b  |c  |
+---+---+---+
|0  |1  |1  |
|1  |0  |0  |
|1  |1  |0  |
+---+---+---+

创建所需列及其默认值。

scala> val expressions = concat_ws(", ",Seq(("a","D"),("b","E"),("c","F")).map(c => when(col(c._1) === 1, lit(c._2)).otherwise(null)):_*) // Create column names & its default values expressions.
expressions: org.apache.spark.sql.Column = concat_ws(, , CASE WHEN (a = 1) THEN D ELSE NULL END, CASE WHEN (b = 1) THEN E ELSE NULL END, CASE WHEN (c = 1) THEN F ELSE NULL END)

最终输出

scala> df.withColumn("concat",expressions).show(false)
+---+---+---+------+
|a  |b  |c  |concat|
+---+---+---+------+
|0  |1  |1  |E, F  |
|1  |0  |0  |D     |
|1  |1  |0  |D, E  |
+---+---+---+------+
z9gpfhce

z9gpfhce2#

你应该替换 ifwhen 以及 otherwise .

import org.apache.spark.sql.functions._

 table.withColumn("concat", concat_ws(", ",
      when($"a"===1,lit("D")).otherwise(null),
      when($"b"===1,lit("E")).otherwise(null),
      when($"c"===1,lit("F")).otherwise(null)
    )).show()

输出:

+---+---+---+------+
|  a|  b|  c|concat|
+---+---+---+------+
|  0|  1|  1|  E, F|
|  1|  0|  0|     D|
|  1|  1|  0|  D, E|
+---+---+---+------+

相关问题