hive/sql绑定了一些列,其余的列是基于拉的其他列中最低/最高的

4szc88ey  于 2021-05-29  发布在  Hadoop
关注(0)|答案(2)|浏览(474)

我有一个Hive表如下5列

  1. name orderno productcategory amount description
  2. KJFSFKS 1 1 40 D1
  3. KJFSFKS 2 2 50 D2
  4. KJFSFKS 3 2 67 D3
  5. KJFSFKS 4 2 10 D4
  6. KJFSFKS 5 3 2 D5
  7. KJFSFKS 6 3 5 D6
  8. KJFSFKS 7 3 6 D7
  9. KJFSFKS 8 4 8 D8
  10. KJFSFKS 9 5 8 D9
  11. KJFSFKS 10 5 10 D10

基于相同产品类别代码的所需输出,如果productcategory代码跨多行添加金额字段相同,则根据最高订单号选择描述,订单号始终选择最低,输出如下

  1. name orderno productcategory amount description
  2. KJFSFKS 1 1 40 D1
  3. KJFSFKS 2 2 127 D4
  4. KJFSFKS 5 3 13 D7
  5. KJFSFKS 8 4 8 D8
  6. KJFSFKS 9 5 18 D10

如上所述,有些字段的顺序不同,有些字段的顺序不同
我用了groupby,但是sum(amount)很好,那么description字段呢,它是基于orderno列的,在我的需求中还有其他列,我应该根据订单号来选择

w41d8nur

w41d8nur1#

  1. select name, orderno, productcategory, amount, description
  2. from
  3. (
  4. select name, orderno, productcategory,
  5. sum(amount) over(partition by name, productcategory) amount,
  6. first_value(description) over(partition by name, productcategory order by orderno desc) description,
  7. row_number() over (partition by name, productcategory order by orderno) rn
  8. from your_table
  9. )s where rn=1; --pick lowest orderno
  1. OK
  2. KJFSFKS 1 1 40 D1
  3. KJFSFKS 2 2 127 D4
  4. KJFSFKS 5 3 13 D7
  5. KJFSFKS 8 4 8 D8
  6. KJFSFKS 9 5 18 D10
  7. Time taken: 12.492 seconds, Fetched: 5 row(s)
7xzttuei

7xzttuei2#

  1. select name
  2. ,min(orderno) as orderno
  3. ,productcategory
  4. ,sum(amount) as amount
  5. ,max(named_struct('orderno',orderno,'description',description)).description
  6. from mytable
  7. group by name
  8. ,productcategory
  9. ;

相关问题