Spark: group by key and filter records based on a condition

Posted by 3phpmpom on 2021-05-29 in Spark

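I have the Spark Dataset below, and after grouping by the "id" column I want to keep only one row per group. If an id has only one record, keep it as is. If an id has 2 records, pick the one that satisfies the condition below. If there are 2 records and neither satisfies the condition, pick either one.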

(released == true && hold_reason == "") || (released == false && hold_reason != "")
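
As a starting point, the condition above can be sketched as a plain Scala predicate on ordinary collections, without Spark. This is only an illustration: the `Person` field names are assumed from the columns shown in the question, and `PickRule`, `matchesRule`, `pickOne`, and `pickPerId` are hypothetical names.

```scala
// Row shape assumed from the columns shown in the question.
case class Person(id: String, nt_id: String, released: Boolean, hold_reason: String)

object PickRule {
  // True when a record satisfies the selection condition.
  def matchesRule(p: Person): Boolean =
    (p.released && p.hold_reason.isEmpty) || (!p.released && p.hold_reason.nonEmpty)

  // Pick one record from a non-empty group: prefer a record that
  // satisfies the condition, otherwise fall back to the first one.
  def pickOne(group: Seq[Person]): Person =
    group.find(matchesRule).getOrElse(group.head)

  // Apply the rule per id on an in-memory collection, sorted by id.
  def pickPerId(rows: Seq[Person]): Seq[Person] =
    rows.groupBy(_.id).values.map(pickOne).toSeq.sortBy(_.id)
}
```

A group with a single record goes through the same path: `pickOne` returns that record whether or not it satisfies the condition.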

I can't make progress because I'm new to the Spark Dataset API. It would be great if someone could point me in the right direction.
Input data:

+---+------+--------+-----------+
| id| nt_id|released|hold_reason|
+---+------+--------+-----------+
|id1|nt_id1|    true|           |
|id2|nt_id2|   false|    Blocked|
|id3|nt_id3|   false|           |
|id3|nt_id3|    true|  whitelist|
|id5|nt_id4|    true|  whitelist|
|id6|nt_id6|   false|           |
+---+------+--------+-----------+

Output data:

+---+------+--------+-----------+
| id| nt_id|released|hold_reason|
+---+------+--------+-----------+
|id1|nt_id1|    true|           |
|id2|nt_id2|   false|    Blocked|
|id3|nt_id3|    true|           |
|id5|nt_id4|    true|  whitelist|
|id6|nt_id6|   false|           |
+---+------+--------+-----------+
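
// Assumes a SparkSession named `spark` is in scope (e.g. in spark-shell).
import spark.implicits._

// Row type matching the columns shown above.
case class Person(id: String, nt_id: String, released: Boolean, hold_reason: String)

val ds = Seq(
  Person("id1", "nt_id1", true, ""),
  Person("id2", "nt_id2", false, "Blocked"),
  Person("id3", "nt_id3", false, ""),
  Person("id3", "nt_id3", true, "whitelist"),
  Person("id5", "nt_id4", true, "whitelist"),
  Person("id6", "nt_id6", false, "")
).toDS()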
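
One possible direction, offered as a minimal sketch rather than a definitive solution, is to group with `groupByKey` and collapse each group with `reduceGroups`. The object name `PickOnePerId`, the helpers `matchesRule` and `pickOnePerId`, and the local-mode `SparkSession` settings are assumptions made for illustration; only the `Person` data comes from the question.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Row type assumed from the columns shown in the question.
case class Person(id: String, nt_id: String, released: Boolean, hold_reason: String)

object PickOnePerId {
  // True when a record satisfies the selection condition.
  def matchesRule(p: Person): Boolean =
    (p.released && p.hold_reason.isEmpty) || (!p.released && p.hold_reason.nonEmpty)

  // Keep one record per id. Within a group, a record that satisfies the
  // condition wins; if neither of two records does, one of them is kept.
  def pickOnePerId(spark: SparkSession, ds: Dataset[Person]): Dataset[Person] = {
    import spark.implicits._
    ds.groupByKey(_.id)
      .reduceGroups((a: Person, b: Person) =>
        if (matchesRule(a)) a else if (matchesRule(b)) b else a)
      .map(_._2) // drop the grouping key, keep the chosen record
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pick-one-per-id")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(
      Person("id1", "nt_id1", true, ""),
      Person("id2", "nt_id2", false, "Blocked"),
      Person("id3", "nt_id3", false, ""),
      Person("id3", "nt_id3", true, "whitelist"),
      Person("id5", "nt_id4", true, "whitelist"),
      Person("id6", "nt_id6", false, "")
    ).toDS()

    pickOnePerId(spark, ds).orderBy("id").show()
    spark.stop()
  }
}
```

Groups with a single record pass through `reduceGroups` unchanged. For `id3`, neither record satisfies the condition, so which of the two rows is kept is not guaranteed. If the choice needs access to the whole group at once, `mapGroups` on the same grouped Dataset is an alternative worth considering.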
