我有一张table,每一排都有一笔交易:
sale_date salesman sale_item_id
20170102 JohnSmith 309
20170102 JohnSmith 292
20170103 AlexHam 93
我每天都想找到前20名推销员,我想到了这个:
SELECT sale_date, salesman, sale_count, row_num
FROM (
SELECT sale_date, salesman,
count(*) as sale_count,
rank() over (partition by sale_date order by sale_count desc) as row_num
from salesforce.sales_data
) T
WHERE sale_date between '20170101' and '20170110'
and row_num <= 20
但我得到:
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 5:35 Expression not in GROUP BY key 'sale_date'
不过,我不确定分组会在什么时候生效。有人能帮忙吗?德克萨斯州!
2条答案
按热度按时间jogvjijk1#
试试这个
编辑和测试。您的问题本质上是,您试图在计算over子句之前使用count,如果您在子查询pariting by sallers中计算count,它将解决问题。您不能在销售查询中执行分组依据,如果执行,您将无权访问销售日期。
1tuwyuhd2#
你错过了一个
group by
在子查询中:我认为hive将接受
order by
,order by sale_count desc
.另外请注意,如果有关系,则可以获得多于或少于20行的数据。你可能想要
row_number()
如果你正好需要20排。