spark hive-udfargumenttypeexception with window函数？

c90pui9n 于 2021-06-28 发布在 Hive

关注(0)|答案(1)|浏览(501)

我有以下资料：

+------------+----------------------+-------------------+                                 
|increment_id|base_subtotal_incl_tax|          eventdate|                                 
+------------+----------------------+-------------------+                                 
|        1086|            14470.0000|2016-06-14 09:54:12|                                 
|        1086|            14470.0000|2016-06-14 09:54:12|                                 
|        1086|            14470.0000|2015-07-14 09:54:12|                                 
|        1086|            14470.0000|2015-07-14 09:54:12|                                 
|        1086|            14470.0000|2015-07-14 09:54:12|                                 
|        1086|            14470.0000|2015-07-14 09:54:12|                                 
|        1086|             1570.0000|2015-07-14 09:54:12|                                 
|        5555|            14470.0000|2014-07-14 09:54:12|                                 
|        5555|            14470.0000|2014-07-14 09:54:12|                                 
|        5555|            14470.0000|2014-07-14 09:54:12|                                 
|        5555|            14470.0000|2014-07-14 09:54:12|                                 
+------------+----------------------+-------------------+

我正在尝试以如下方式运行窗口函数：

WindowSpec window = Window.partitionBy(df.col("id")).orderBy(df.col("eventdate").desc());
df.select(df.col("*"),rank().over(window).alias("rank")) //error for this line
         .filter("rank <= 2")
         .show();

我想要得到的是每个用户的最后两个条目（最后一个是最新日期，但因为它是按降序排列的，所以是前两行）：

+------------+----------------------+-------------------+                                 
|increment_id|base_subtotal_incl_tax|          eventdate|                                 
+------------+----------------------+-------------------+                                 
|        1086|            14470.0000|2016-06-14 09:54:12|                                 
|        1086|            14470.0000|2016-06-14 09:54:12|   
|        5555|            14470.0000|2014-07-14 09:54:12|                                 
|        5555|            14470.0000|2014-07-14 09:54:12|                                     
+------------+----------------------+-------------------+

但我明白了：

+------------+----------------------+-------------------+----+
|increment_id|base_subtotal_incl_tax|          eventdate|rank|                            
+------------+----------------------+-------------------+----+                            
|        5555|            14470.0000|2014-07-14 09:54:12|   1|                            
|        5555|            14470.0000|2014-07-14 09:54:12|   1|                            
|        5555|            14470.0000|2014-07-14 09:54:12|   1|                            
|        5555|            14470.0000|2014-07-14 09:54:12|   1|                            
|        1086|            14470.0000|2016-06-14 09:54:12|   1|                            
|        1086|            14470.0000|2016-06-14 09:54:12|   1|                            
+------------+----------------------+-------------------+----+

我错过了什么？
[旧]-最初，我有一个错误，现在解决了：

WindowSpec window = Window.partitionBy(df.col("id"));
df.select(df.col("*"),rank().over(window).alias("rank")) //error for this line
         .filter("rank <= 2")
         .show();

但是，这将返回一个错误 Exception in thread "main" org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected. 对于上面标有注解的行。我错过了什么？这个错误是什么意思？谢谢您！

Java Hive apache-spark apache-spark-sql window-functions

来源：https://stackoverflow.com/questions/38475706/spark-hive-udfargumenttypeexception-with-window-function

1条答案

按热度按时间

q3qa4bjr1#

rank window函数需要具有 orderBy ，例如：

WindowSpec window = Window.partitionBy(df.col("id")).orderBy(df.col("payment"));

如果没有命令，它是毫无意义的，因此是错误的。

赞(0）回复(0）举报 2021-06-28

我来回答

spark hive-udfargumenttypeexception with window函数？

1条答案

相关问题

热门标签

最新问答