hive,使用变量延迟排序行

ttcibm8c  于 2021-06-29  发布在  Hive
关注(0)|答案(1)|浏览(337)

我有下面的Hive表:

product |  price
   A    |   100
   B    |   102
   C    |   220
   D    |   240
   E    |   242
   F    |   410

对于每一行,我想用当前价格除以较低的价格,如果结果大于0.9,我想增加一个行号。如果结果小于0.9,则此行的行号应为1,当前价格变为较低价格,然后迭代。
结果应如下所示:

product |  price | row_number
   A    |   100  |     1
   B    |   102  |     2
   C    |   220  |     1
   D    |   240  |     2
   E    |   242  |     3
   F    |   410  |     1

因为:

lower price = 100: product A get 1 as row_number
100/102 >= 0.9: product B get 2 as row_number
100/220 < 0.9: product C get 1 as row_number, lower price = 220
220/240 >= 0.9: product D get 2 as row_number
220/242 >= 0.9: product E get 3 as row_number
220/410 < 0.9: product F get 1 as row_number, lower price = 410

我在考虑创建一个临时的行号,按价格排序:

product |  price | temp_row_number
   A    |   100  |     1
   B    |   102  |     2
   C    |   220  |     3
   D    |   240  |     4
   E    |   242  |     5
   F    |   410  |     6

然后:

Select
   product,
   price,
   case
     when lag(price,temp_row_number-1,0)/price over() >= 0.9 then lag(price,temp_row_number-1,0)
     else price
   end as test
from my_table

这将检索:

product |  price | test
   A    |   100  | 100
   B    |   102  | 100
   C    |   220  | 220
   D    |   240  | 240
   E    |   242  | 242
   F    |   410  | 410

但理想情况下我想找回

product |  price | test
   A    |   100  | 100
   B    |   102  | 100
   C    |   220  | 220
   D    |   240  | 220
   E    |   242  | 220
   F    |   410  | 410

所以我可以使用row\u number()函数order by product and price来计算row\u number row,并得到预期的结果。

kninwzqo

kninwzqo1#

派生表
(选择产品,价格,(价格介于100和200之间时为案例,价格介于200和300之间时为1,价格介于300和400之间时为2,然后为3结束)作为rn

FROM #test)

从cte order by product中选择product,price,row\ U number()over(partition by rn order by rn)

相关问题