选择每个类别中前n个最常购买的项目

bxfogqkk  于 2021-06-27  发布在  Hive
关注(0)|答案(1)|浏览(282)

我正在使用hiveql,我需要选择每个类别中最常购买的前10个项目。我想同样的问题可以很容易地用常规sql解决。。
有什么方法比下面的片段快吗?我只是不明白我们怎么能在这里使用所谓的窗口函数。。。

SELECT item, 
COUNT(item) AS freq FROM mytable WHERE category='category1' GROUP BY item ORDER BY freq DESC LIMIT 1
union all SELECT item, COUNT(item) AS freq FROM mytable WHERE category='category2' GROUP BY product ORDER BY freq DESC LIMIT 1
union all SELECT item, COUNT(item) AS freq FROM mytable WHERE category='category3' GROUP BY item ORDER BY freq DESC LIMIT 1
union all SELECT item, COUNT(item) AS freq FROM mytable WHERE category='category4' GROUP BY item ORDER BY freq DESC LIMIT 1
...

表数据结构:

item1 category1
item2 category1
item2 category1
item5 category2
item5 category2
item4 category3
item2 category4

结果应为:

item2 category1
item5 category2
item4 category3
item2 category4
rhfm7lfc

rhfm7lfc1#

使用 row_number() 以及 group by :

SELECT category, item, freq
FROM (SELECT category, item, COUNT(*) AS freq,
             ROW_NUMBER() OVER (PARTITION BY category ORDER BY COUNT(*) DESC) as seqnum
      FROM mytable 
      GROUP BY category, item
     ) ci
WHERE seqnum = 1;

这将为每个类别返回一行,即使最常见的类别有关联也是如此。如果你想要所有的可能性 ties ,使用 rank() 而不是 row_number() .

相关问题