How can I write the following SQL query using the Spark DataFrame API?
```
val dt = spark.sql("select Subject, (count(Student)*100 / (select count(DISTINCT Student) from StudentTable)) as PercentPassed from StudentTable where Marks > 35 GROUP BY Subject").show(false)
```
I want the percentage of students who passed (Marks > 35) in each subject.
Below is the DataFrame registered as StudentTable:
```
import spark.implicits._

val data = Seq(
  ("S1","English",45),
  ("S1","Hindi",78),
  ("S1","Maths",67),
  ("S1","Science",23),
  ("S2","English",34),
  ("S2","Hindi",45),
  ("S2","Maths",88),
  ("S2","Science",65),
  ("S3","English",21),
  ("S3","Hindi",66),
  ("S3","Maths",63),
  ("S3","Science",44)
)
val df = spark.sparkContext.parallelize(data).toDF("Student","Subject","Marks")
df.createTempView("StudentTable")
```
1 Answer
You can use groupBy and then agg with a conditional count:
```
import org.apache.spark.sql.functions.{count, when}

df.groupBy($"Subject")
  .agg(count(when($"Marks" > 35, 1)) / count($"Marks") * 100 as "PercentPassed")
  .show()
```

```
+-------+-----------------+
|Subject|    PercentPassed|
+-------+-----------------+
|Science|66.66666666666666|
|English|33.33333333333333|
|  Hindi|            100.0|
|  Maths|            100.0|
+-------+-----------------+
```
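Note that this divides by the per-subject row count rather than by the total number of distinct students, which is what the subquery in the original SQL does; the two agree here only because every student has a row for every subject. If that is not guaranteed in your data, a minimal sketch that mirrors the SQL subquery (the name `totalStudents` is introduced here just for illustration) could look like this:

```
import org.apache.spark.sql.functions.{count, countDistinct, lit, when}

// Total number of distinct students across the whole table,
// equivalent to the scalar subquery in the original SQL.
val totalStudents = df.select(countDistinct($"Student")).first().getLong(0)

df.groupBy($"Subject")
  .agg((count(when($"Marks" > 35, 1)) * 100.0 / lit(totalStudents)) as "PercentPassed")
  .show()
```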