I am using spark-sql 2.4.1. How can I perform different joins depending on the value of a column?
Sample data:
val data = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate", "school", 11, 14),
  ("21", "rate", "school", 13, 12)
)
val df = data.toDF("id", "code", "entity", "value1", "value2")
+---+-----+------+------+------+
| id| code|entity|value1|value2|
+---+-----+------+------+------+
| 20|score|school|    14|    12|
| 21|score|school|    13|    13|
| 22| rate|school|    11|    14|
| 21| rate|school|    13|    12|
+---+-----+------+------+------+
Based on the value of the "code" column, I need to join with various other tables.
val data1 = List(
  ("22", 11, "A"),
  ("22", 14, "B"),
  ("20", 13, "C"),
  ("21", 12, "C"),
  ("21", 13, "D")
)
val rateDs = data1.toDF("id", "map_code", "map_val")
val scoreDs = // scoreTable
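If the value of the "code" column is "rate", I need to join with rateDs; if it is "score", I need to join with scoreDs.
How can I handle this in Spark? Is there an optimal way to achieve it?
Expected result for the "rate" rows: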
+---+-----+------+------+------+
| id| code|entity|value1|value2|
+---+-----+------+------+------+
| 22| rate|school|     A|     B|
| 21| rate|school|     D|     C|
+---+-----+------+------+------+
1 Answer
For example, you can simply join twice.
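A minimal sketch of the join-twice idea, written for spark-shell or a Scala script and not necessarily the exact code originally posted. It assumes the sample data from the question (including the fourth "rate" row shown in the printed table) and that rateDs has the columns id, map_code and map_val. The aliases d, r1 and r2 and the name rateResult are chosen here for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("join-twice").getOrCreate()
import spark.implicits._

// Sample data from the question
val df = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate", "school", 11, 14),
  ("21", "rate", "school", 13, 12)
).toDF("id", "code", "entity", "value1", "value2")

val rateDs = List(
  ("22", 11, "A"),
  ("22", 14, "B"),
  ("20", 13, "C"),
  ("21", 12, "C"),
  ("21", 13, "D")
).toDF("id", "map_code", "map_val")

// Alias the lookup table twice so each join can be referenced unambiguously
val r1 = rateDs.as("r1")
val r2 = rateDs.as("r2")

val rateResult = df.as("d")
  .filter(col("d.code") === "rate")
  // First join resolves value1, second join resolves value2
  .join(r1, col("d.id") === col("r1.id") && col("d.value1") === col("r1.map_code"), "left")
  .join(r2, col("d.id") === col("r2.id") && col("d.value2") === col("r2.map_code"), "left")
  .select(
    col("d.id"), col("d.code"), col("d.entity"),
    col("r1.map_val").as("value1"),
    col("r2.map_val").as("value2"))

rateResult.show()
```

The left joins keep rows whose value has no match in rateDs (the mapped column is then null). The "score" rows could presumably be handled the same way against scoreDs, assuming it has the same three columns, and the two results combined with union.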
Well, this looks a bit dirty, but I don't have a better idea for handling multiple columns...
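One possible way to tidy up the multi-column case is to fold the same lookup join over a list of column names. The sketch below is only an illustration under the same assumptions as the question's sample data. mapColumns is a hypothetical helper name, not a Spark API, and the temporary column names (value1_id, value1_code, value1_val and so on) are invented here and assumed not to collide with existing columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("fold-joins").getOrCreate()
import spark.implicits._

// Sample data from the question
val df = List(
  ("20", "score", "school", 14, 12),
  ("21", "score", "school", 13, 13),
  ("22", "rate", "school", 11, 14),
  ("21", "rate", "school", 13, 12)
).toDF("id", "code", "entity", "value1", "value2")

val rateDs = List(
  ("22", 11, "A"),
  ("22", 14, "B"),
  ("20", 13, "C"),
  ("21", 12, "C"),
  ("21", 13, "D")
).toDF("id", "map_code", "map_val")

// Hypothetical helper: replaces every column in valueCols with the map_val
// found in `lookup` for the pair (id, map_code).
def mapColumns(input: DataFrame, lookup: DataFrame, valueCols: Seq[String]): DataFrame = {
  val joined = valueCols.foldLeft(input) { (acc, c) =>
    // Rename the lookup columns per iteration so repeated joins do not clash
    val l = lookup.select(
      col("id").as(s"${c}_id"),
      col("map_code").as(s"${c}_code"),
      col("map_val").as(s"${c}_val"))
    acc.join(l, col("id") === col(s"${c}_id") && col(c) === col(s"${c}_code"), "left")
      .drop(c, s"${c}_id", s"${c}_code")
      .withColumnRenamed(s"${c}_val", c)
  }
  // Restore the original column order
  joined.select(input.columns.map(col): _*)
}

val rateMapped = mapColumns(df.filter(col("code") === "rate"), rateDs, Seq("value1", "value2"))
rateMapped.show()
```

Each iteration still performs one join per column, so this does not reduce the work Spark does. It only avoids writing the joins out by hand when there are many value columns.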