spark scala—通过有条件地检查其他列的数量,向dataframe/data添加新列

toe95027  于 2021-07-13  发布在  Spark
关注(0)|答案(1)|浏览(376)

下面是需要转换为spark scala的遗留代码的场景。任何提示都将不胜感激。
场景:需要使用“withcolumn”向dataframe/dataset添加一个新的列,前提是有条件地检查20-22个其他列的值,然后派生这个新的列值。关于如何在spark scala中实现这一点,有什么建议吗?非常感谢。我尝试过使用udf并将22列的Map作为key:value and 如果不是用可变变量进行检查,但被本论坛的Maven告知不建议这样做,那么寻求什么是实现这一点的正确方法的指导?
或者使用dataset.mappartitions并在函数中使用可变变量是正确的方法?

val calculate = dataset.mapPartitions(partition => partition.map(x => {
      var value1 = "NA"
      var value1  = "NA"

 set the values of the mutable variables value1 and value2 based on the column values
 if ( x.fieldA ="xyx")
 {
    value1 = "ABC"
    value2 = "cbz
 }
 eles if (x.fieldA ="112" & x.fieldB ="xy1")
{
    value1 = "zya"
    value2 = "ab"
}

    df(
        x.fldC
    x.fldB
    value1
    value2
      )
    }

case class df(fldc:String,fldb:String,value1:String:value2:String)

你能让我知道什么其他细节,我应该提供,因为我已经更新了上述问题?
我对分布式/spark scala开发还不熟悉,所以可能会问一些基本问题。

vnzz0bqm

vnzz0bqm1#

import spark.implicits._

import org.apache.spark.sql.functions._

val sourceDF = Seq(
  (1,2,3,4,5,6,7,8,9,10, "11", "12", "13", "14", "15", "16", "17", "18", "19", true),
  (11,12,13,14,5,6,7,8,9,10, "11", "12", "13", "14", "15", "16", "17", "18", "19", false),
  (1,2,3,4,25,26,27,8,9,10, "11", "12", "13", "14", "15", "16", "17", "18", "19", true),
  (1,2,3,4,5,6,7,38,39,10, "11", "12", "13", "14", "15", "16", "17", "18", "19", false),
  (1,2,3,4,5,6,7,8,9,410, "11", "12", "13", "14", "15", "16", "17", "18", "19", true)
).toDF("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8", "col9", "col10",
  "col11", "col12", "col13", "col14", "col15", "col16", "col17", "col18", "col19", "col20")

val resDF = sourceDF
  .withColumn("col_result",
    when(
        'col1.equalTo(1) && 'col2.equalTo(2) && 'col3.equalTo(3) &&
        'col4.equalTo(4) && 'col5.equalTo(5) && 'col6.equalTo(6) &&
        'col7.equalTo(7) && 'col8.equalTo(8) && 'col9.equalTo(9) &&
        'col10.equalTo(10) && 'col11.equalTo("11") && 'col12.equalTo("12") &&
        'col13.equalTo("13") && 'col14.equalTo("14") && 'col15.equalTo("15") &&
        'col16.equalTo("16") && 'col17.equalTo("17") && 'col18.equalTo("18") &&
        'col19.equalTo("19") && 'col20.equalTo(true),"result").otherwise(null))

resDF.show(false)
//  +----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+
//  |col1|col2|col3|col4|col5|col6|col7|col8|col9|col10|col11|col12|col13|col14|col15|col16|col17|col18|col19|col20|col_result|
//  +----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+
//  |1   |2   |3   |4   |5   |6   |7   |8   |9   |10   |11   |12   |13   |14   |15   |16   |17   |18   |19   |true |result    |
//  |11  |12  |13  |14  |5   |6   |7   |8   |9   |10   |11   |12   |13   |14   |15   |16   |17   |18   |19   |false|null      |
//  |1   |2   |3   |4   |25  |26  |27  |8   |9   |10   |11   |12   |13   |14   |15   |16   |17   |18   |19   |true |null      |
//  |1   |2   |3   |4   |5   |6   |7   |38  |39  |10   |11   |12   |13   |14   |15   |16   |17   |18   |19   |false|null      |
//  |1   |2   |3   |4   |5   |6   |7   |8   |9   |410  |11   |12   |13   |14   |15   |16   |17   |18   |19   |true |null      |
//  +----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+

变型2

val res1DF = sourceDF.withColumn("col_result_1",
  when(col("col8") === 38 || col("col20") === false, "good check" )
  .when(col("col10") === 410 && col("col17") === "17" && col("col20") === true, "next good check")
    .otherwise("we use when  many many")
)

res1DF.show(false)
//  +----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------------------+
//  |col1|col2|col3|col4|col5|col6|col7|col8|col9|col10|col11|col12|col13|col14|col15|col16|col17|col18|col19|col20|col_result_1          |
//  +----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------------------+
//  |1   |2   |3   |4   |5   |6   |7   |8   |9   |10   |11   |12   |13   |14   |15   |16   |17   |18   |19   |true |we use when  many many|
//  |11  |12  |13  |14  |5   |6   |7   |8   |9   |10   |11   |12   |13   |14   |15   |16   |17   |18   |19   |false|good check            |
//  |1   |2   |3   |4   |25  |26  |27  |8   |9   |10   |11   |12   |13   |14   |15   |16   |17   |18   |19   |true |we use when  many many|
//  |1   |2   |3   |4   |5   |6   |7   |38  |39  |10   |11   |12   |13   |14   |15   |16   |17   |18   |19   |false|good check            |
//  |1   |2   |3   |4   |5   |6   |7   |8   |9   |410  |11   |12   |13   |14   |15   |16   |17   |18   |19   |true |next good check       |
//  +----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------------------+

相关问题