spark、hive

xbp102n0  于 2021-06-25  发布在  Hive
关注(0)|答案(2)|浏览(471)

我需要使用派生列中的前一个值来获得同一列的下一个结果。
默认情况下,每个名称的第一个匹配项为1
派生列=num(i)+派生列(i-1)

Name, Num, derived
A       0       1
A       1       2
A       0       2      
B       0       1
B       0       1
B       1       2
C       0       1
C       0       1
D       1       1
D       1       2
D       1       3

下面是我的半段代码——使用了窗口滞后函数但无法实现

spark.sql("""
select

z.name as name,

z.num as num,

case

  when z.row_num='1' then '1'

  else (lag(num) over(partition by z.name order by name))

  end as derived

from

(select name,num,row_number() over(partition by name order by name) as row_num

from result)z""").show()
bqujaahr

bqujaahr1#

为了解决这个问题,您需要一个列,可以用来对记录进行排序。我假设这样一个列和 id .
在 hive 里,你可以用 row_number() 做个Windows sum() 具体如下:

select
    name,
    num,
    sum(case when rn = 1 then 1 else num) over(partition by name order by id) derived
from (
    select
        t.*,
        row_number() over(partition by name order by id) rn
    from mytable t
) t
q9rjltbz

q9rjltbz2#

可以使用会话变量

select
  z.name as name,
  z.num as num,       
  @derived := case
  when z.row_num='1' then '1'
  else num + @derived
  end as derived
from
  (select name,num,row_number() over(partition by name order by name) as row_num
  from result)z,(SELECT @derived:= 0) d1

这会给你

name | num | derived
:--- | --: | :------
A    |   0 | 1      
A    |   1 | 2      
A    |   0 | 2      
B    |   0 | 1      
B    |   0 | 1      
B    |   1 | 2      
C    |   0 | 1      
C    |   0 | 1      
D    |   1 | 1      
D    |   1 | 2      
D    |   1 | 3

db<>在这里摆弄

相关问题