// Needed for toDF; assumes a SparkSession named `spark` is in scope (as in spark-shell)
import spark.implicits._

val data = Seq(
  ("India", "Pakistan", "India"),
  ("Australia", "India", "India"),
  ("New Zealand", "Zimbabwe", "New Zealand"),
  ("West Indies", "Bangladesh", "Bangladesh"),
  ("Sri Lanka", "Bangladesh", "Bangladesh"),
  ("Sri Lanka", "Bangladesh", "Bangladesh"),
  ("Sri Lanka", "Bangladesh", "Bangladesh")
)
val df = data.toDF("Team_1", "Team_2", "Winner")
I have this DataFrame. How can I count how many matches each team has played?
3 answers
up9lanfz #1
You can either combine select statements with union, or use array from org.apache.spark.sql.functions.
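The two approaches named above could be sketched roughly as follows. This is not the answerer's original code, which did not survive on this page. The object name `MatchesPerTeam`, the helper names `countViaUnion` and `countViaArray`, the output column name `Team`, the local SparkSession, and the use of `explode` to flatten the array are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode}

object MatchesPerTeam {
  // Approach 1 (assumed shape): select each team column, stack them with union, then count.
  // union keeps duplicates, so every appearance of a team is counted.
  def countViaUnion(df: DataFrame): DataFrame =
    df.select(col("Team_1").as("Team"))
      .union(df.select(col("Team_2").as("Team")))
      .groupBy("Team")
      .count()

  // Approach 2 (assumed shape): pack both columns into an array,
  // explode it into one row per team, then count.
  def countViaArray(df: DataFrame): DataFrame =
    df.select(explode(array(col("Team_1"), col("Team_2"))).as("Team"))
      .groupBy("Team")
      .count()

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("matches-per-team").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("India", "Pakistan", "India"),
      ("Australia", "India", "India"),
      ("New Zealand", "Zimbabwe", "New Zealand"),
      ("West Indies", "Bangladesh", "Bangladesh"),
      ("Sri Lanka", "Bangladesh", "Bangladesh"),
      ("Sri Lanka", "Bangladesh", "Bangladesh"),
      ("Sri Lanka", "Bangladesh", "Bangladesh")
    ).toDF("Team_1", "Team_2", "Winner")

    countViaUnion(df).show()
    countViaArray(df).show()

    spark.stop()
  }
}
```

For the sample data, both versions should report Bangladesh with 4 matches, Sri Lanka with 3, India with 2, and every other team with 1. Row order in the output is not guaranteed.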
Performance-wise, org.apache.spark.sql.functions.array is the better choice.
knsnq2tg #2
The answers above discuss three approaches. I tried to evaluate (purely for education/awareness) how much time each one takes....
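The timing code used for this comparison is not preserved here. A minimal sketch of how such a measurement might be taken with `System.nanoTime` is shown below; the object name `TimeApproaches`, the helper `elapsedNanos`, and the choice of `collect()` as the triggering action are assumptions, not the answerer's setup. Only the union and array variants are timed, since those are the ones named in this thread.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode}

object TimeApproaches {
  // Hypothetical helper: evaluates `body` once and returns elapsed wall-clock nanoseconds.
  def elapsedNanos[A](body: => A): Long = {
    val start = System.nanoTime()
    body
    System.nanoTime() - start
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("time-approaches").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("India", "Pakistan", "India"),
      ("Australia", "India", "India"),
      ("Sri Lanka", "Bangladesh", "Bangladesh")
    ).toDF("Team_1", "Team_2", "Winner")

    // Spark is lazy, so an action (collect) must run inside the timed block.
    val unionNs = elapsedNanos {
      df.select(col("Team_1").as("Team"))
        .union(df.select(col("Team_2").as("Team")))
        .groupBy("Team").count().collect()
    }

    val arrayNs = elapsedNanos {
      df.select(explode(array(col("Team_1"), col("Team_2"))).as("Team"))
        .groupBy("Team").count().collect()
    }

    println(s"union: $unionNs ns, array: $arrayNs ns")
    spark.stop()
  }
}
```

A single run like this is only indicative: the first query also pays for JVM warm-up and Spark initialization, so repeating each measurement and swapping the order gives a fairer comparison.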
Result: I noticed that the org.apache.spark.sql.functions.array approach took less time (646513918 nanoseconds) than the union approach...
gkl3eglg #3