Generating value rows in between dates

Posted by gab6jxml on 2021-06-24 in Hive


I have a data table that lists id changes on given dates. The structure is as follows (table a):

+----------------------------------------------------------+
| person current_id previous_id action          date       |
+----------------------------------------------------------+
| A      1          0           'id assignment' 2019-01-01 |
| B      2          1           'id change'     2019-01-03 |
| A      2          1           'id change'     2019-01-02 |
| C      4          2           'id change'     2019-01-03 |
| ...    ...        ...         ...             ...        |
+----------------------------------------------------------+

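However, table a only has rows for the dates on which a change occurred.
For a traceability study, I am trying to use table a to build a data table (table b below) that contains, for every day, the corresponding id of each person present in table a (using Hive).
Something like this (table b):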

+---------------------------+
| date          person  id  |
+---------------------------+
| 2019-01-01    A       1   |
| 2019-01-01    B       1   |
| 2019-01-01    C       2   |
| 2019-01-02    A       2   |
| 2019-01-02    B       1   |
| 2019-01-02    C       2   |
| 2019-01-03    A       2   |
| 2019-01-03    B       2   |
| 2019-01-03    C       4   |
| ...           ...     ... |
+---------------------------+

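All I have managed so far is to get the current id of each person, regardless of time. I do not know where to start in order to generate the output table, and I could not work out the logic.
Thanks in advance for your help!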

sql Hive

Source: https://stackoverflow.com/questions/57357016/generating-value-rows-in-between-dates

1 answer

iih3973s1#

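First, you need to generate the rows. Assuming that every day has at least one change somewhere in the table, you can use a cross join of the distinct persons and the distinct dates.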
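The cross join over the distinct dates of tablea only produces days that already appear in the table, which is why the assumption above matters. If the real data had days without any change, the date side would need a full calendar instead. The following Python sketch illustrates what such a calendar contains. The helper name `calendar_days` is hypothetical, and in Hive the same list would have to come from a calendar table or a generated series.

```python
from datetime import date, timedelta


def calendar_days(change_dates):
    """Return every day from the earliest to the latest change date, inclusive."""
    parsed = sorted(date.fromisoformat(d) for d in change_dates)
    if not parsed:
        return []
    first, last = parsed[0], parsed[-1]
    # One entry per day, including days on which nothing changed.
    return [(first + timedelta(days=n)).isoformat()
            for n in range((last - first).days + 1)]


if __name__ == "__main__":
    # Only two change dates are given, but the days in between are generated too.
    for day in calendar_days(["2019-01-01", "2019-01-04"]):
        print(day)
```

With a calendar like this in place of the distinct dates, the rest of the approach stays the same.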
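Then you need to impute the value for each day. The simplest method would be lag() with the ignore nulls option, but I don't think Hive supports that.
Instead, you can use two levels of window functions. The inner level finds, for each person and day, the most recent date that has a change (id_date). The outer level then spreads the current_id of that change over every row that shares the same id_date: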

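-- Outer level: spread the id of the latest change over the days that follow it
select person, date,
       coalesce(current_id,
                max(current_id) over (partition by person, id_date)
               ) as id
from (select p.person, d.date, a.current_id,
             -- Inner level: most recent date up to this day that has a change
             max(case when a.current_id is not null then d.date end)
                 over (partition by p.person order by d.date) as id_date
      -- Scaffold: every person paired with every date that appears in tablea
      from (select distinct person from tablea a) p cross join
           (select distinct date from tablea a) d left join
           tablea a
           on p.person = a.person and d.date = a.date
     ) pd;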
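To see what the query returns on the sample rows of table a, it can be run against an in-memory database. This is only a sketch: it uses Python's sqlite3 module (assuming a bundled SQLite of 3.25 or newer, which is needed for window functions) rather than Hive. The explicit column aliases, the `order by`, and the helper name `run_fill` are additions for this demonstration.

```python
import sqlite3

# Sample rows copied from table a in the question.
ROWS = [
    ("A", 1, 0, "id assignment", "2019-01-01"),
    ("B", 2, 1, "id change", "2019-01-03"),
    ("A", 2, 1, "id change", "2019-01-02"),
    ("C", 4, 2, "id change", "2019-01-03"),
]

# The two-level window query, with explicit aliases and an ORDER BY added
# so that the result order is deterministic.
FILL_QUERY = """
select person, date,
       coalesce(current_id,
                max(current_id) over (partition by person, id_date)
               ) as id
from (select p.person as person, d.date as date, a.current_id as current_id,
             max(case when a.current_id is not null then d.date end)
                 over (partition by p.person order by d.date) as id_date
      from (select distinct person from tablea a) p cross join
           (select distinct date from tablea a) d left join
           tablea a
           on p.person = a.person and d.date = a.date
     ) pd
order by person, date
"""


def run_fill(rows):
    """Load the rows into an in-memory tablea and return the filled result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tablea (person text, current_id integer, "
        "previous_id integer, action text, date text)"
    )
    conn.executemany("insert into tablea values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(FILL_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_fill(ROWS):
        print(row)
```

On these rows, A is carried forward with id 2 on 2019-01-03. B and C come back as NULL (None in Python) on 2019-01-01 and 2019-01-02, because neither has a change before 2019-01-03. Table b in the question shows the previous_id on those days, so matching it exactly would need an additional fallback that this query does not include.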

If you can't use cross join, perhaps this will work as a replacement for the from clause of the subquery:

      from (select distinct person, 1 as joinkey from tablea a) p join
           (select distinct date, 1 as joinkey from tablea a) d
           on p.joinkey = d.joinkey left join
           tablea a
           on p.person = a.person and d.date = a.date
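The constant-key join works because every row on both sides carries the same joinkey value, so the equality condition matches every pair. A small check of that equivalence, again as a sketch in Python with sqlite3 rather than Hive, using only the person and date columns of the sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tablea (person text, date text)")
conn.executemany(
    "insert into tablea values (?, ?)",
    [("A", "2019-01-01"), ("B", "2019-01-03"),
     ("A", "2019-01-02"), ("C", "2019-01-03")],
)

# Scaffold built with an explicit cross join.
cross_rows = conn.execute("""
select p.person, d.date
from (select distinct person from tablea) p cross join
     (select distinct date from tablea) d
order by 1, 2
""").fetchall()

# Same scaffold built with an inner join on a constant key.
keyed_rows = conn.execute("""
select p.person, d.date
from (select distinct person, 1 as joinkey from tablea) p join
     (select distinct date, 1 as joinkey from tablea) d
     on p.joinkey = d.joinkey
order by 1, 2
""").fetchall()
conn.close()

if __name__ == "__main__":
    print(cross_rows == keyed_rows, len(keyed_rows))
```

Both forms produce the same 3 persons by 3 dates scaffold here, so the left join to tablea that follows behaves the same in either case.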

Answered 2021-06-24
