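I have the Spark dataset below. After grouping by the "id" column I want to keep only one row per id. If an id has only one record, keep it as is. If it has two records, pick the one that satisfies the following condition; if there are two records and neither satisfies it, pick either one.
(released == true && hold_reason == "") || (released == false && hold_reason != "")
I'm stuck because I'm new to the Spark Dataset API. Any pointers on how to proceed would be much appreciated.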
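The selection rule can be checked without Spark first. Below is a minimal sketch in plain Scala. It assumes a `Person` case class whose fields are the four columns, because the question does not show its definition. `PickOneLogic`, `meetsCondition`, `pickOne` and `selectPerId` are hypothetical names used only for illustration:

```scala
// Hypothetical record type: the question never shows Person, so these
// field names are assumed from the column headers of the dataset.
case class Person(id: String, nt_id: String, released: Boolean, hold_reason: String)

object PickOneLogic {
  // The question's condition, reading `hold_reason = ""` as an equality test.
  def meetsCondition(p: Person): Boolean =
    (p.released && p.hold_reason.isEmpty) || (!p.released && p.hold_reason.nonEmpty)

  // Reducer for two records with the same id: switch to `b` only when `b`
  // meets the condition and `a` does not; otherwise keep `a`. When neither
  // record matches, this simply keeps the first one ("pick either one").
  def pickOne(a: Person, b: Person): Person =
    if (!meetsCondition(a) && meetsCondition(b)) b else a

  // Local, Spark-free dry run: group by id and reduce each group to one record.
  // A group with a single record is returned unchanged by `reduce`.
  def selectPerId(rows: Seq[Person]): Seq[Person] =
    rows.groupBy(_.id).values.map(_.reduce((a, b) => pickOne(a, b))).toSeq.sortBy(_.id)
}
```

A binary function like `pickOne` is what Spark's typed API accepts in `ds.groupByKey(_.id).reduceGroups(...)`. For `id3`, neither record meets the condition, so this sketch keeps the first one (`released = false`, empty `hold_reason`). Note that the expected-output row for `id3` (`true` with an empty `hold_reason`) does not match either `id3` input row.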
Input data:
+---+------+--------+-----------+
| id| nt_id|released|hold_reason|
+---+------+--------+-----------+
|id1|nt_id1|    true|           |
|id2|nt_id2|   false|    Blocked|
|id3|nt_id3|   false|           |
|id3|nt_id3|    true|  whitelist|
|id5|nt_id4|    true|  whitelist|
|id6|nt_id6|   false|           |
+---+------+--------+-----------+
Expected output:
+---+------+--------+-----------+
| id| nt_id|released|hold_reason|
+---+------+--------+-----------+
|id1|nt_id1|    true|           |
|id2|nt_id2|   false|    Blocked|
|id3|nt_id3|    true|           |
|id5|nt_id4|    true|  whitelist|
|id6|nt_id6|   false|           |
+---+------+--------+-----------+
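import org.apache.spark.sql.SparkSession
// Person is not shown in the question; these fields are assumed from the column headers.
case class Person(id: String, nt_id: String, released: Boolean, hold_reason: String)
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._ // provides .toDS() on a Seq of case-class instances
val ds = Seq(
  Person("id1", "nt_id1", true, ""),
  Person("id2", "nt_id2", false, "Blocked"),
  Person("id3", "nt_id3", false, ""),
  Person("id3", "nt_id3", true, "whitelist"),
  Person("id5", "nt_id4", true, "whitelist"),
  Person("id6", "nt_id6", false, "")
).toDS()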
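One way to do this with the typed Dataset API is `groupByKey` followed by `reduceGroups`, which folds each id's records into a single record with a binary function. The sketch below assumes Spark 3.x on Scala 2.12/2.13 and a `Person` case class inferred from the column headers. `PickOnePerId`, `pickOne` and `selectPerId` are hypothetical names. Treat it as a starting point rather than the only approach:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumed definition of Person; the question uses it without showing it,
// so the field names here are inferred from the column headers.
case class Person(id: String, nt_id: String, released: Boolean, hold_reason: String)

object PickOnePerId {
  // The question's condition, reading `hold_reason = ""` as an equality test.
  def meetsCondition(p: Person): Boolean =
    (p.released && p.hold_reason.isEmpty) || (!p.released && p.hold_reason.nonEmpty)

  // Binary reducer for one id: keep `b` only if it meets the condition and
  // `a` does not; otherwise keep `a` (covers "neither matches, pick either").
  // Declared as a function value so it matches the Scala overload of reduceGroups.
  val pickOne: (Person, Person) => Person =
    (a, b) => if (!meetsCondition(a) && meetsCondition(b)) b else a

  def selectPerId(spark: SparkSession, ds: Dataset[Person]): Dataset[Person] = {
    import spark.implicits._ // encoders for the String key and for Person
    ds.groupByKey(_.id)      // KeyValueGroupedDataset[String, Person]
      .reduceGroups(pickOne) // Dataset[(String, Person)]; single-record groups pass through
      .map(_._2)             // drop the key, keep the chosen Person
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("pick-one-per-id").getOrCreate()
    import spark.implicits._
    val ds = Seq(
      Person("id1", "nt_id1", true, ""),
      Person("id2", "nt_id2", false, "Blocked"),
      Person("id3", "nt_id3", false, ""),
      Person("id3", "nt_id3", true, "whitelist"),
      Person("id5", "nt_id4", true, "whitelist"),
      Person("id6", "nt_id6", false, "")
    ).toDS()
    selectPerId(spark, ds).orderBy("id").show()
    spark.stop()
  }
}
```

Spark's documentation for `reduceGroups` says the function should be commutative and associative, otherwise the result may be non-deterministic. `pickOne` is associative but not commutative when neither record matches. For an id like `id3`, which of the two rows survives can therefore depend on record order. The question allows either row. A common untyped alternative is a window function: `row_number()` over `Window.partitionBy("id")`, ordered so that rows meeting the condition come first.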