Join based on column value

falq053o posted on 2021-05-27 in Spark

I am using spark-sql 2.4.1. How can I perform different joins depending on the value of a column?
Sample data:

val data = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate",  "school", 11, 14),
  ("21", "rate",  "school", 13, 12)
)
val df = data.toDF("id", "code", "entity", "value1", "value2")

+---+-----+------+------+------+
| id| code|entity|value1|value2|
+---+-----+------+------+------+
| 20|score|school|    14|    12|
| 21|score|school|    13|    13|
| 22| rate|school|    11|    14|
| 21| rate|school|    13|    12|
+---+-----+------+------+------+

Depending on the value of the "code" column, I need to join with different tables:

val data1 = List(
  ("22", 11, "A"),
  ("22", 14, "B"),
  ("20", 13, "C"),
  ("21", 12, "C"),
  ("21", 13, "D")
)

val rateDs = data1.toDF("id", "map_code", "map_val")

val scoreDs = ???  // scoreTable (not shown in the question)

If the value of the "code" column is "rate", I need to join with rateDs; if it is "score", I need to join with scoreDs.
How can this be handled in Spark? Is there a best way to achieve this?
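One way to express "join with rateDs when code is rate, with scoreDs when code is score" is to split df by code value, join each subset with its own lookup table, and union the pieces. This is a minimal sketch, assuming a SparkSession (as in spark-shell) and assuming scoreTable has the same (id, map_code, map_val) layout as rateDs. The score rows, the helper mapValues and the lookups map are hypothetical, since the question does not show scoreTable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Reuses the active session in spark-shell; creates a local one otherwise.
val spark = SparkSession.builder().master("local[*]").appName("join-by-code").getOrCreate()
import spark.implicits._

val df = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate",  "school", 11, 14),
  ("21", "rate",  "school", 13, 12)
).toDF("id", "code", "entity", "value1", "value2")

val rateDs = List(
  ("22", 11, "A"), ("22", 14, "B"), ("20", 13, "C"), ("21", 12, "C"), ("21", 13, "D")
).toDF("id", "map_code", "map_val")

// HYPOTHETICAL: scoreTable is not shown in the question; this assumes it
// shares rateDs's (id, map_code, map_val) schema and uses made-up rows.
val scoreDs = List(
  ("20", 14, "X"), ("20", 12, "Y"), ("21", 13, "Z")
).toDF("id", "map_code", "map_val")

// Hypothetical helper: translate value1 and value2 through `lookup`
// by joining it once per value column.
def mapValues(rows: DataFrame, lookup: DataFrame): DataFrame =
  rows.as("a")
    .join(lookup.as("b"), col("a.id") === col("b.id") && col("a.value1") === col("b.map_code"))
    .join(lookup.as("c"), col("a.id") === col("c.id") && col("a.value2") === col("c.map_code"))
    .select(col("a.id"), col("a.code"), col("a.entity"),
      col("b.map_val").as("value1"), col("c.map_val").as("value2"))

// Which lookup table serves which value of the "code" column.
val lookups: Map[String, DataFrame] = Map("rate" -> rateDs, "score" -> scoreDs)

// Filter df per code value, map each subset with its own table, then union.
val result = lookups
  .map { case (code, lookup) => mapValues(df.filter(col("code") === code), lookup) }
  .reduce(_ unionByName _)

result.show(false)
```

Because each subset is inner-joined, rows whose codes have no match in their lookup table are dropped; using "left" joins in mapValues keeps them with null values instead. Supporting another code value only means adding an entry to lookups.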
Expected result for the "rate" rows:

+---+-----+------+------+------+
| id| code|entity|value1|value2|
+---+-----+------+------+------+
| 22| rate|school|     A|     B|
| 21| rate|school|     D|     C|
+---+-----+------+------+------+
Answer #1 by 5fjcxozz

For example, you can simply join the lookup table twice, once for value1 and once for value2:

import org.apache.spark.sql.functions.{col, lit}
import spark.implicits._  // already in scope in spark-shell

val data = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate",  "school", 11, 14),
  ("21", "rate",  "school", 13, 12)
)
val df = data.toDF("id", "code", "entity", "value1", "value2")

val data1 = List(
  ("22", 11, "A"),
  ("22", 14, "B"),
  ("20", 13, "C"),
  ("21", 12, "C"),
  ("21", 13, "D")
)
val rateDF = data1.toDF("id", "map_code", "map_val")

// Join the lookup table once per value column:
// alias "b" translates value1, alias "c" translates value2.
df.as("a")
  .join(rateDF.as("b"),
    col("a.code") === lit("rate")
      && col("a.id") === col("b.id")
      && col("a.value1") === col("b.map_code"), "inner")
  .join(rateDF.as("c"),
    col("a.code") === lit("rate")
      && col("a.id") === col("c.id")
      && col("a.value2") === col("c.map_code"), "inner")
  .select(col("a.id"), col("a.code"), col("a.entity"),
    col("b.map_val").as("value1"), col("c.map_val").as("value2"))
  .show(false)

+---+----+------+------+------+
|id |code|entity|value1|value2|
+---+----+------+------+------+
|22 |rate|school|A     |B     |
|21 |rate|school|D     |C     |
+---+----+------+------+------+
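The two joins above differ only in which value column they translate, so the same pattern can be generated for any list of columns with foldLeft. A minimal sketch, assuming the same df and rateDF as in the answer; valueCols is a hypothetical list you would extend with further columns, and each step renames the lookup columns to made-up names (m_id, m_code, m_val) so successive joins do not clash.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Reuses the active session in spark-shell; creates a local one otherwise.
val spark = SparkSession.builder().master("local[*]").appName("fold-joins").getOrCreate()
import spark.implicits._

val df = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate",  "school", 11, 14),
  ("21", "rate",  "school", 13, 12)
).toDF("id", "code", "entity", "value1", "value2")

val rateDF = List(
  ("22", 11, "A"), ("22", 14, "B"), ("20", 13, "C"), ("21", 12, "C"), ("21", 13, "D")
).toDF("id", "map_code", "map_val")

// Hypothetical: the columns whose codes should be replaced by map_val.
val valueCols = Seq("value1", "value2")

// Each fold step inner-joins one freshly renamed copy of the lookup table
// and overwrites one value column with the matched map_val.
val mapped = valueCols.foldLeft(df.filter(col("code") === "rate")) { (acc, c) =>
  val lookup = rateDF.toDF("m_id", "m_code", "m_val")
  acc
    .join(lookup, acc("id") === lookup("m_id") && acc(c) === lookup("m_code"))
    .withColumn(c, col("m_val"))   // replaces column c in place
    .drop("m_id", "m_code", "m_val")
}

mapped.show(false)
```

Like the answer's version, every step is an inner join, so a row disappears as soon as one of its value columns has no match.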

Hmm, this looks a bit dirty, but I don't know a better way to handle multiple columns...
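Another option that needs only a single join is to unpivot the value columns into rows with the Spark SQL stack function, join the lookup table once, and pivot back. This sketch assumes that (id, code, entity) identifies a row, because the pivot regroups on those columns; valueCols, stackExpr, long and the col_name column are hypothetical names.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, first}

// Reuses the active session in spark-shell; creates a local one otherwise.
val spark = SparkSession.builder().master("local[*]").appName("unpivot-join").getOrCreate()
import spark.implicits._

val df = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate",  "school", 11, 14),
  ("21", "rate",  "school", 13, 12)
).toDF("id", "code", "entity", "value1", "value2")

val rateDF = List(
  ("22", 11, "A"), ("22", 14, "B"), ("20", 13, "C"), ("21", 12, "C"), ("21", 13, "D")
).toDF("id", "map_code", "map_val")

val valueCols = Seq("value1", "value2")

// Builds: stack(2, 'value1', value1, 'value2', value2) as (col_name, map_code)
val stackExpr = valueCols.map(c => s"'$c', $c")
  .mkString(s"stack(${valueCols.size}, ", ", ", ") as (col_name, map_code)")

// One row per (original row, value column).
val long = df.filter(col("code") === "rate")
  .selectExpr("id", "code", "entity", stackExpr)

val result = long
  .join(rateDF, Seq("id", "map_code"))   // a single join for all value columns
  .groupBy("id", "code", "entity")       // assumes these columns identify a row
  .pivot("col_name", valueCols)          // back to value1, value2 columns
  .agg(first("map_val"))

result.show(false)
```

Unlike the chained inner joins, a row whose value1 matches but whose value2 does not still appears here, with null in value2; passing the value list to pivot also saves Spark a pass to discover the distinct column names.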
