hive：查找前20%的记录

v8wbuo2f 于 2021-05-27 发布在 Hadoop

关注(0)|答案(3)|浏览(999)

我有一些数据like:-

我需要找到最高20%的价格。
预期output:-

ID  PRICE
5   320
6   300

hadoop Hive hiveql bigdata

来源：https://stackoverflow.com/questions/54619056/hive-find-top-20-percent-records

3条答案

按热度按时间

tf7tbtn21#

这里有一个方法，你可以做到这一点，而不必使用 join .

Select id,price from (select id,price, row_number() over(order by price desc) r1,count(*) over()*(20/100) ct from table_name)final where r1<=ct ;

赞(0）回复(0）举报 2021-05-27

t3irkdon2#

下面是问题-

with top_20 as (
  select 
    max(price)*0.8 as price1 
  from 
    <tableName>
)
select * from <tableName> t1 , top_20 t2 where t1.price > t2.price1;
select 
 name, 
 price 
from 
 (select 
    name, 
    price, 
    max(price)*0.8 over (order by price) as top_20 
 from <tableName>
 ) t1 
where 
 t1.price > t1.top_20;

下面的查询在配置单元中不起作用-

select * from <tableName> where price > (select max(salary)*0.8 from <tableName>)
select * from <tableName> t1 where exists (select salary from <tablename> t2 where t1.salary > t2.salary*0.8)

原因-配置单元不支持where子句中具有相等条件的子查询，它支持with only in、not in、exists和not exists。
即使有exists和not exists，它也只支持equijoin，请参阅https://cwiki.apache.org/confluence/display/hive/languagemanual+subqueries#languagemanualsubqueries-有关详细信息，请参阅where子句中的子查询
希望这有帮助。

展开查看全部

赞(0）回复(0）举报 2021-05-27

s8vozzvw3#

你不需要连接就可以做到。用解析函数计算 max(price) ，取80%，然后使用过滤器价格>80%：

with your_data as ( --this is your data
select stack(10,
1 ,  100,
2 ,  200,
3 ,  120,
4 ,  130,
5 ,  320,
6 ,  300,
7 ,  200,
8 ,  100,
9 ,  120,
10,  250) as (ID,  PRICE)
)
select id, price 
from
(
select d.*, max(price) over()*0.8 as pct_80 from your_data d
)s where price>pct_80

结果：

OK
id      price
6       300
5       320

用你的table代替 WITH 子查询，必要时按id添加订单。

展开查看全部

赞(0）回复(0）举报 2021-05-27

我来回答

hive：查找前20%的记录

3条答案

相关问题

热门标签

最新问答