我正在使用hiveql,我需要选择每个类别中最常购买的前10个项目。我想同样的问题可以很容易地用常规sql解决。。
有什么方法比下面的片段快吗?我只是不明白我们怎么能在这里使用所谓的窗口函数。。。
SELECT item,
COUNT(item) AS freq FROM mytable WHERE category='category1' GROUP BY item ORDER BY freq DESC LIMIT 1
union all SELECT item, COUNT(item) AS freq FROM mytable WHERE category='category2' GROUP BY product ORDER BY freq DESC LIMIT 1
union all SELECT item, COUNT(item) AS freq FROM mytable WHERE category='category3' GROUP BY item ORDER BY freq DESC LIMIT 1
union all SELECT item, COUNT(item) AS freq FROM mytable WHERE category='category4' GROUP BY item ORDER BY freq DESC LIMIT 1
...
表数据结构:
item1 category1
item2 category1
item2 category1
item5 category2
item5 category2
item4 category3
item2 category4
结果应为:
item2 category1
item5 category2
item4 category3
item2 category4
1条答案
按热度按时间rhfm7lfc1#
使用
row_number()
以及group by
:这将为每个类别返回一行,即使最常见的类别有关联也是如此。如果你想要所有的可能性
ties
,使用rank()
而不是row_number()
.