hive:查找前20%的记录

v8wbuo2f  于 2021-05-27  发布在  Hadoop
关注(0)|答案(3)|浏览(999)

我有一些数据like:-

  1. ID PRICE
  2. 1 100
  3. 2 200
  4. 3 120
  5. 4 130
  6. 5 320
  7. 6 300
  8. 7 200
  9. 8 100
  10. 9 120
  11. 10 250

我需要找到最高20%的价格。
预期output:-

  1. ID PRICE
  2. 5 320
  3. 6 300
tf7tbtn2

tf7tbtn21#

这里有一个方法,你可以做到这一点,而不必使用 join .

  1. Select id,price from (select id,price, row_number() over(order by price desc) r1,count(*) over()*(20/100) ct from table_name)final where r1<=ct ;
t3irkdon

t3irkdon2#

下面是问题-

  1. with top_20 as (
  2. select
  3. max(price)*0.8 as price1
  4. from
  5. <tableName>
  6. )
  7. select * from <tableName> t1 , top_20 t2 where t1.price > t2.price1;
  8. select
  9. name,
  10. price
  11. from
  12. (select
  13. name,
  14. price,
  15. max(price)*0.8 over (order by price) as top_20
  16. from <tableName>
  17. ) t1
  18. where
  19. t1.price > t1.top_20;

下面的查询在配置单元中不起作用-

  1. select * from <tableName> where price > (select max(salary)*0.8 from <tableName>)
  2. select * from <tableName> t1 where exists (select salary from <tablename> t2 where t1.salary > t2.salary*0.8)

原因-配置单元不支持where子句中具有相等条件的子查询,它支持with only in、not in、exists和not exists。
即使有exists和not exists,它也只支持equijoin,请参阅https://cwiki.apache.org/confluence/display/hive/languagemanual+subqueries#languagemanualsubqueries-有关详细信息,请参阅where子句中的子查询
希望这有帮助。

展开查看全部
s8vozzvw

s8vozzvw3#

你不需要连接就可以做到。用解析函数计算 max(price) ,取80%,然后使用过滤器价格>80%:

  1. with your_data as ( --this is your data
  2. select stack(10,
  3. 1 , 100,
  4. 2 , 200,
  5. 3 , 120,
  6. 4 , 130,
  7. 5 , 320,
  8. 6 , 300,
  9. 7 , 200,
  10. 8 , 100,
  11. 9 , 120,
  12. 10, 250) as (ID, PRICE)
  13. )
  14. select id, price
  15. from
  16. (
  17. select d.*, max(price) over()*0.8 as pct_80 from your_data d
  18. )s where price>pct_80

结果:

  1. OK
  2. id price
  3. 6 300
  4. 5 320

用你的table代替 WITH 子查询,必要时按id添加订单。

展开查看全部

相关问题