查找组、配置单元中的前n个示例

e5nszbig  于 2021-06-24  发布在  Hive
关注(0)|答案(1)|浏览(380)

我有一个表,其中每条记录都有列:title和category。我想找到两个标题与大多数发生在他们的类别。有些标题列在这两个类别中。如何在Hive中实现这一点?
下面是一个表创建查询:

create table book(category String, title String) row format delimited fields terminated by '\t' stored as textfile;

示例数据:

fiction book1
fiction book2
fiction book3
fiction book4
fiction book5
fiction book6
fiction book7
fiction book8
fiction book8
fiction book8
psychology  book1
psychology  book2
psychology  book2
psychology  book2
psychology  book2
psychology  book7
psychology  book7
psychology  book7

预期结果:

fiction book8
fiction any other
psychology  book2
psychology  book7

目前我已成功编写了以下查询:

SELECT * FROM  
(SELECT category, title,
             count(*) as sale_count
             from book
             Group BY category, title) a 
order by category, sale_count DESC;

这使计数为一个标题在每个类别,但我找不到方法返回只有2个顶级记录从每个类别

xmd2e60i

xmd2e60i1#

对于只有两个最上面的记录,请使用行号()

select category, title, sale_count
from
(
SELECT a.*,
row_number() over(partition by category order by sale_count desc) rn
 FROM  
(SELECT category, title,
             count(*) as sale_count
             from book
             Group BY category, title) a 
)s where rn <=2

order by category, sale_count DESC;

如果有多行具有相同的top sales,并且需要返回两个top Count的所有top sales行,请使用 DENSE_RANK 而不是 row_number ,如果有具有相同销售数量的标题,它将分配相同的排名。

相关问题