Spark: stacking multiple when() conditions from an array of column expressions

vs91vp4v, posted on 2021-07-13 in Spark

I have the following Spark DataFrame:

val df = Seq(("US",10),("IND",20),("NZ",30),("CAN",40)).toDF("a","b")
df.show(false)
+---+---+
|a  |b  |
+---+---+
|US |10 |
|IND|20 |
|NZ |30 |
|CAN|40 |
+---+---+

I am applying when() conditions as follows:

df.withColumn("x", when(col("a").isin(us_list:_*),"u").when(col("a").isin(i_list:_*),"i").when(col("a").isin(n_list:_*),"n").otherwise("-")).show(false)

+---+---+---+
|a  |b  |x  |
+---+---+---+
|US |10 |u  |
|IND|20 |i  |
|NZ |30 |n  |
|CAN|40 |-  |
+---+---+---+

Now, to cut down on the code, I am trying the following:

val us_list = Array("U","US")
val i_list = Array("I","IND")
val n_list = Array("N","NZ")
val ar1 = Array((us_list,"u"),(i_list,"i"),(n_list,"n"))

val ap = ar1.map( x => when(col("a").isInCollection(x._1),x._2) )

This results in:

ap: Array[org.apache.spark.sql.Column] = Array(CASE WHEN (a IN (U, US)) THEN u END, CASE WHEN (a IN (I, IND)) THEN i END, CASE WHEN (a IN (N, NZ)) THEN n END)

But when I try

val ap = ar1.map( x => when(col("a").isInCollection(x._1),x._2) ).reduce( (x,y) => x.y )

I get an error. How can I fix this?

Answer 1 (ajsxfq5m)

You can use foldLeft on the ar1 array, starting from the default value:

val x = ar1.foldLeft(lit("-")) { case (acc, (list, value)) =>
  when(col("a").isin(list: _*), value).otherwise(acc)
}

// x: org.apache.spark.sql.Column = CASE WHEN (a IN (N, NZ)) THEN n ELSE CASE WHEN (a IN (I, IND)) THEN i ELSE CASE WHEN (a IN (U, US)) THEN u ELSE - END END END
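
The fold above nests the expressions, with the last pair of ar1 ending up outermost. That makes no difference here because the lists do not overlap. If a flat CASE WHEN ... WHEN ... ELSE in the original order is preferred, one possible variant is to seed the fold with the first when() and append the remaining pairs with Column.when, which takes a condition and a value rather than another Column (this is also why a plain reduce over the Column array does not type-check). The following is a minimal sketch, not a definitive implementation: the local SparkSession, the app name, and the names firstList, firstValue, chained and flatResult are assumptions made for illustration, and it assumes ar1 is non-empty.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

// Assumption: a local session, for trying this outside spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("when-chain").getOrCreate()

val df = spark.createDataFrame(Seq(("US", 10), ("IND", 20), ("NZ", 30), ("CAN", 40))).toDF("a", "b")

val us_list = Array("U", "US")
val i_list = Array("I", "IND")
val n_list = Array("N", "NZ")
val ar1 = Array((us_list, "u"), (i_list, "i"), (n_list, "n"))

// Seed with the first pair (ar1 must be non-empty), then append the remaining
// pairs with Column.when so the conditions stay in their original order.
val (firstList, firstValue) = ar1.head
val chained = ar1.tail.foldLeft(when(col("a").isin(firstList: _*), firstValue)) {
  case (acc, (list, value)) => acc.when(col("a").isin(list: _*), value)
}.otherwise("-")

// Collect (a, x) pairs so the result can be inspected.
val flatResult = df.withColumn("x", chained).collect().map(r => (r.getString(0), r.getString(2)))
flatResult.foreach(println)
```

With the data from the question this should produce the same x column as the hand-written when chain.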

Answer 2 (yptwkmov)

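You usually do not need to combine when expressions with reduce / fold and the like. coalesce is enough: a when expression without otherwise yields null when its condition is false, and coalesce returns its first non-null argument, taken in order. It also saves you from specifying otherwise, because you can simply append one more column to the coalesce argument list.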

val ar1 = Array((us_list,"u"),(i_list,"i"),(n_list,"n"))
val ap = ar1.map( x => when(col("a").isInCollection(x._1),x._2) )
val combined = coalesce(ap :+ lit("-"): _*)

df.withColumn("x", combined).show
+---+---+---+
|  a|  b|  x|
+---+---+---+
| US| 10|  u|
|IND| 20|  i|
| NZ| 30|  n|
|CAN| 40|  -|
+---+---+---+
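
To see why this works, it can help to look at the individual when columns next to the coalesce result. The sketch below is illustrative only: the two overlapping lists (first_list, second_list), the labels "first" and "second", the local SparkSession and the names w1, w2 and probe are all hypothetical and not part of the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, when}

// Assumption: a local session, for trying this outside spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("when-coalesce").getOrCreate()

val df = spark.createDataFrame(Seq(("US", 10), ("CAN", 40))).toDF("a", "b")

// Hypothetical overlapping lists: "US" appears in both.
val first_list = Array("U", "US")
val second_list = Array("US", "USA")

// Without otherwise, each when column is null for rows that do not match.
val w1 = when(col("a").isInCollection(first_list), "first")
val w2 = when(col("a").isInCollection(second_list), "second")

// coalesce picks the first non-null argument, so w1 takes priority over w2,
// and the trailing literal acts as the default.
val probe = df.select(col("a"), w1.as("w1"), w2.as("w2"), coalesce(w1, w2, lit("-")).as("x")).collect()
probe.foreach(println)
```

For the "US" row both w1 and w2 are non-null and x takes the value of w1; for the "CAN" row both are null and x falls back to "-". The order of the array passed to coalesce therefore decides which label wins when lists overlap.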
