使用最新的数据而不是相等的数据连接sql

xdnvmnnf  于 2021-08-09  发布在  Java
关注(0)|答案(1)|浏览(221)

我需要加入2个Dataframe使用月份和名称。但是,在其中一个Dataframe中,我没有所有的月度结果,所以我想重复最近的一个。例如,

Dataframe A
name score  month
Alex   20   2020/01
Alex   30   2020/03

Dataframe B
name   month   tenure
Alex  2020/01     1
Alex  2020/02     2
Alex  2020/03     3

Join A+B using name and month - expected result
name   month  score  tenure
Alex  2020/01   20     1
Alex  2020/02   20     2 --> repeat the score from the most recent date
Alex  2020/03   30     3

有人知道我怎么做吗?

brccelvz

brccelvz1#

可以使用相关子查询:

select b.*,
       (select a.score
        from a
        where a.name = b.name and a.month <= b.month
        order by a.month desc
        limit 1
       ) as score
from b;

或者,可以使用窗口函数和 join :

select b.*, a.score
from b left join
     (select a.*,
             lead(month) over (partition by name order by month) as next_month
      from a
     ) a
     on b.name = a.name and
        b.month >= a.month and
        (b.month < a.next_month or a.next_month is null);

如果要从中获取多个列,则此方法很方便 a .

相关问题