I'm trying to implement the following solution using a window function.
I have the following data:
+------------+----------------------+-------------------+
|increment_id|base_subtotal_incl_tax| eventdate|
+------------+----------------------+-------------------+
| 1086| 14470.0000|2016-06-14 09:54:12|
| 1086| 14470.0000|2016-06-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 14470.0000|2015-07-14 09:54:12|
| 1086| 1570.0000|2015-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
+------------+----------------------+-------------------+
I'm trying to run the window function like this:
WindowSpec window = Window.partitionBy(df.col("increment_id")).orderBy(df.col("eventdate").desc());
df.select(df.col("*"),rank().over(window).alias("rank")) //error for this line
.filter("rank <= 2")
.show();
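What I want is the last two entries for each user (the last being the one with the latest date; since the data is sorted by date descending, these are the first two rows):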
+------------+----------------------+-------------------+
|increment_id|base_subtotal_incl_tax| eventdate|
+------------+----------------------+-------------------+
| 1086| 14470.0000|2016-06-14 09:54:12|
| 1086| 14470.0000|2016-06-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
| 5555| 14470.0000|2014-07-14 09:54:12|
+------------+----------------------+-------------------+
But I get this:
+------------+----------------------+-------------------+----+
|increment_id|base_subtotal_incl_tax| eventdate|rank|
+------------+----------------------+-------------------+----+
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 5555| 14470.0000|2014-07-14 09:54:12| 1|
| 1086| 14470.0000|2016-06-14 09:54:12| 1|
| 1086| 14470.0000|2016-06-14 09:54:12| 1|
+------------+----------------------+-------------------+----+
What am I missing?
1 Answer
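Rows whose eventdate values are equal get the same rank, so the filter rank <= 2 keeps every tied row. Try row_number, which numbers the rows within each partition 1, 2, 3, ... without ties.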
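To make the difference concrete, here is a minimal plain-Java sketch that emulates the two numbering schemes over a single partition already sorted by eventdate descending. It is not Spark code: the class WindowNumbering and its methods rank, rowNumber, and keptRows are hypothetical names made up for this illustration, and the input is just the eventdate column from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java emulation of two window numbering schemes over one partition
// that is already sorted by eventdate descending. Illustration only.
final class WindowNumbering {

    // rank(): tied values share a rank; the next distinct value gets its
    // 1-based position, so gaps appear after ties (1, 1, 3, ...).
    static List<Integer> rank(List<String> sorted) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            if (i > 0 && sorted.get(i).equals(sorted.get(i - 1))) {
                out.add(out.get(i - 1)); // same value as previous row: same rank
            } else {
                out.add(i + 1);
            }
        }
        return out;
    }

    // row_number(): every row gets a unique consecutive number, ties or not.
    static List<Integer> rowNumber(List<String> sorted) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            out.add(i + 1);
        }
        return out;
    }

    // Counts the rows that a "number <= limit" filter would keep.
    static long keptRows(List<Integer> numbers, int limit) {
        return numbers.stream().filter(n -> n <= limit).count();
    }

    public static void main(String[] args) {
        // eventdate values of increment_id 5555 from the question.
        List<String> dates = Arrays.asList(
                "2014-07-14 09:54:12", "2014-07-14 09:54:12",
                "2014-07-14 09:54:12", "2014-07-14 09:54:12");
        System.out.println("rank:       " + rank(dates));
        System.out.println("row_number: " + rowNumber(dates));
        System.out.println("kept by rank <= 2:       " + keptRows(rank(dates), 2));
        System.out.println("kept by row_number <= 2: " + keptRows(rowNumber(dates), 2));
    }
}
```

In the question's Spark code, the corresponding change would presumably be to replace rank() with row_number() from org.apache.spark.sql.functions (the function is named rowNumber() in Spark 1.4 and 1.5). Note that when rows are tied on eventdate, which of them receive numbers 1 and 2 is arbitrary unless a further column is added to the orderBy.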