Scala: How to change a column's value based on the value of another column in a Spark DataFrame

Posted by qybjjes1 on 2021-05-27 in Spark

I am starting with this DataFrame:

  DF1
  +-----+-----+------+------+
  |name |type |item1 |item2 |
  +-----+-----+------+------+
  |apple|fruit|apple1|apple2|
  |beans|vege |beans1|beans2|
  |beef |meat |beef1 |beef2 |
  |kiwi |fruit|kiwi1 |kiwi2 |
  |pork |meat |pork1 |pork2 |
  +-----+-----+------+------+

Now I want to populate a column named "prop" in DF2 based on the value of the "type" column. For example:

  If "type" == "fruit" then "prop" = "item1"
  If "type" == "vege" then "prop" = "item1"
  If "type" == "meat" then "prop" = "item2"

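What is the best way to do this? I was thinking of filtering on each "type", populating the "prop" column for each subset, and then unioning the resulting DataFrames. That does not seem efficient.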

  DF2
  +-----+-----+------+------+------+
  |name |type |item1 |item2 |prop  |
  +-----+-----+------+------+------+
  |apple|fruit|apple1|apple2|apple1|
  |beans|vege |beans1|beans2|beans1|
  |beef |meat |beef1 |beef2 |beef2 |
  |kiwi |fruit|kiwi1 |kiwi2 |kiwi1 |
  |pork |meat |pork1 |pork2 |pork2 |
  +-----+-----+------+------+------+
Answer #1 (nmpmafwu)

Using a when + otherwise expression is very efficient in Spark for this case.

  import org.apache.spark.sql.functions.{col, when}

  // sample data
  df.show()
  //+-----+-----+------+------+
  //| name| type| item1| item2|
  //+-----+-----+------+------+
  //|apple|fruit|apple1|apple2|
  //|beans| vege|beans1|beans2|
  //| beef| meat| beef1| beef2|
  //| kiwi|fruit| kiwi1| kiwi2|
  //| pork| meat| pork1| pork2|
  //+-----+-----+------+------+

  // using the isin function; rows with any other type fall back to the "type" value
  df.withColumn("prop",
      when(col("type").isin(Seq("vege", "fruit"): _*), col("item1"))
        .when(col("type") === "meat", col("item2"))
        .otherwise(col("type")))
    .show()

  // equivalent, using || instead of isin
  df.withColumn("prop",
      when((col("type") === "fruit") || (col("type") === "vege"), col("item1"))
        .when(col("type") === "meat", col("item2"))
        .otherwise(col("type")))
    .show()
  //+-----+-----+------+------+------+
  //| name| type| item1| item2|  prop|
  //+-----+-----+------+------+------+
  //|apple|fruit|apple1|apple2|apple1|
  //|beans| vege|beans1|beans2|beans1|
  //| beef| meat| beef1| beef2| beef2|
  //| kiwi|fruit| kiwi1| kiwi2| kiwi1|
  //| pork| meat| pork1| pork2| pork2|
  //+-----+-----+------+------+------+
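When the list of types grows, the rules can be kept in a lookup table and folded into the same when/otherwise expression. The following is a minimal sketch and not part of the original answer: the object name `PropFromMapping`, the `typeToSource` map, and the local SparkSession setup are assumptions made for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object PropFromMapping {
  // Hypothetical lookup table: value of "type" -> name of the column to copy into "prop".
  val typeToSource: Map[String, String] = Map(
    "fruit" -> "item1",
    "vege"  -> "item1",
    "meat"  -> "item2"
  )

  // Fold the mapping into nested when/otherwise expressions.
  // Rows whose type is not in the mapping end up with a null "prop".
  def propColumn(mapping: Map[String, String]): Column =
    mapping.foldLeft(lit(null).cast("string")) { case (fallback, (tpe, source)) =>
      when(col("type") === tpe, col(source)).otherwise(fallback)
    }

  def addProp(df: DataFrame): DataFrame =
    df.withColumn("prop", propColumn(typeToSource))

  def main(args: Array[String]): Unit = {
    // Assumed local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("PropFromMapping")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("apple", "fruit", "apple1", "apple2"),
      ("beans", "vege",  "beans1", "beans2"),
      ("beef",  "meat",  "beef1",  "beef2"),
      ("kiwi",  "fruit", "kiwi1",  "kiwi2"),
      ("pork",  "meat",  "pork1",  "pork2")
    ).toDF("name", "type", "item1", "item2")

    addProp(df).show()
    spark.stop()
  }
}
```

Because each condition is an equality test on a distinct key, the order in which the map entries are folded does not affect the result.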
Answer #2 (qyyhg6bp)

This can be done by chaining when and otherwise, as shown below:

  import org.apache.spark.sql.functions._

  object WhenThen {
    def main(args: Array[String]): Unit = {
      // Constant.getSparkSess is the answerer's own helper (not shown here);
      // any existing SparkSession can be used in its place.
      val spark = Constant.getSparkSess
      import spark.implicits._

      // sample data matching DF1 from the question
      val df = List(("apple", "fruit", "apple1", "apple2"),
        ("beans", "vege", "beans1", "beans2"),
        ("beef", "meat", "beef1", "beef2"),
        ("kiwi", "fruit", "kiwi1", "kiwi2"),
        ("pork", "meat", "pork1", "pork2")
      ).toDF("name", "type", "item1", "item2")

      // nested when/otherwise; rows with any other type get an empty string
      df.withColumn("prop",
        when($"type" === "fruit", $"item1").otherwise(
          when($"type" === "vege", $"item1").otherwise(
            when($"type" === "meat", $"item2").otherwise("")
          )
        )).show()
    }
  }
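The nested otherwise(when(...)) calls can also be written as one flat chain, or as a SQL CASE expression passed to `expr`. The following is a minimal sketch under assumptions: the object and method names (`FlatWhenChain`, `withPropFlat`, `withPropSql`) and the local SparkSession are hypothetical and not taken from the answer. Both variants keep the answer's empty-string fallback for unmatched types.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, lit, when}

object FlatWhenChain {
  // Flat chain: conditions are evaluated in order and the first match wins.
  def withPropFlat(df: DataFrame): DataFrame =
    df.withColumn("prop",
      when(col("type").isin("fruit", "vege"), col("item1"))
        .when(col("type") === "meat", col("item2"))
        .otherwise(lit("")))

  // The same logic as a SQL CASE expression; `type` is backtick-quoted
  // so it is always read as a column name.
  def withPropSql(df: DataFrame): DataFrame =
    df.withColumn("prop", expr(
      """CASE WHEN `type` IN ('fruit', 'vege') THEN item1
        |     WHEN `type` = 'meat' THEN item2
        |     ELSE '' END""".stripMargin))

  def main(args: Array[String]): Unit = {
    // Assumed local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("FlatWhenChain")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("apple", "fruit", "apple1", "apple2"),
      ("beans", "vege",  "beans1", "beans2"),
      ("beef",  "meat",  "beef1",  "beef2"),
      ("kiwi",  "fruit", "kiwi1",  "kiwi2"),
      ("pork",  "meat",  "pork1",  "pork2")
    ).toDF("name", "type", "item1", "item2")

    withPropFlat(df).show()
    withPropSql(df).show()
    spark.stop()
  }
}
```

Either form avoids the filter-and-union approach from the question, since the whole rule set is a single column expression evaluated in one pass over the DataFrame.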