How do I get a specific field value in Spark (Scala)?

dw1jzc5e  posted on 2021-05-27  in Spark

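census(id:string,emptype:string,salary:int) incometax(emptype:string,slab:int)
After joining these two RDDs, how do I filter for rows where salary is greater than 50000, while also keeping the other fields of the joined table? (Spark, Scala)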

uqdfh47h #1

Use filter or where to select rows from a DataFrame; the two methods are interchangeable. See the spark-shell session below.

    scala> case class Census(id:String, emptype:String, salary:Int)
    defined class Census
    scala> case class Incometax(emptype:String,slab:Int)
    defined class Incometax
    scala> val censusDF = Seq(Census("1","a",10000),Census("2","b",20000),Census("3","c",60000)).toDF
    censusDF: org.apache.spark.sql.DataFrame = [id: string, emptype: string ... 1 more field]
    scala> val incometaxDF = Seq(Incometax("a",10),Incometax("b",15),Incometax("c",20)).toDF
    incometaxDF: org.apache.spark.sql.DataFrame = [emptype: string, slab: int]
    scala> censusDF.join(incometaxDF,Seq("emptype"),"left").filter(censusDF("salary") > 50000).show(false)
    +-------+---+------+----+
    |emptype|id |salary|slab|
    +-------+---+------+----+
    |c      |3  |60000 |20  |
    +-------+---+------+----+
    scala> censusDF.join(incometaxDF,Seq("emptype"),"left").where(censusDF("salary") > 50000).show(false)
    +-------+---+------+----+
    |emptype|id |salary|slab|
    +-------+---+------+----+
    |c      |3  |60000 |20  |
    +-------+---+------+----+
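
The question mentions RDDs while the session above works with DataFrames, so here is a rough sketch of the same join and filter done with pair RDDs. It is only one possible way to do it. The object name RddJoinFilter, the helper highEarners, the local master setting and the sample rows are all made up for illustration, and it uses an inner join (not the left join shown above), which gives the same rows for this sample data.

```scala
import org.apache.spark.sql.SparkSession

// Same record shapes as in the question; the data below is illustrative.
case class Census(id: String, emptype: String, salary: Int)
case class Incometax(emptype: String, slab: Int)

// Hypothetical helper object, not part of any library.
object RddJoinFilter {
  // Joins the two datasets on emptype and keeps rows with salary > minSalary.
  // Returns (emptype, id, salary, slab) tuples.
  def highEarners(spark: SparkSession, minSalary: Int): Array[(String, String, Int, Int)] = {
    val sc = spark.sparkContext

    val census = sc.parallelize(Seq(
      Census("1", "a", 10000), Census("2", "b", 20000), Census("3", "c", 60000)))
    val incometax = sc.parallelize(Seq(
      Incometax("a", 10), Incometax("b", 15), Incometax("c", 20)))

    // Pair RDDs keyed by the join column.
    val censusByType = census.map(c => (c.emptype, c))
    val slabByType = incometax.map(t => (t.emptype, t.slab))

    censusByType
      .filter { case (_, c) => c.salary > minSalary } // filter first so less data is shuffled
      .join(slabByType)                               // inner join on emptype
      .map { case (emptype, (c, slab)) => (emptype, c.id, c.salary, slab) }
      .collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumes a local run; adjust the master for a real cluster.
    val spark = SparkSession.builder().master("local[*]").appName("rdd-join-filter").getOrCreate()
    try highEarners(spark, 50000).foreach(println)
    finally spark.stop()
  }
}
```

With the sample rows this should leave only the record for emptype c. Filtering before the join is safe here because the condition only touches the census side; if you need to keep census rows that have no matching slab, leftOuterJoin would be the closer equivalent of the left join above, at the cost of getting the slab back as an Option.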