Replace value with last value where flag was set to 1

Posted by pb3s4cty on 2021-08-13 in Java


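I have a table in which each row contains all the fields that changed during some event, together with one flag per field that marks whether that field was updated. For simplicity I show only the "status" field here, but there are several other fields as well. When an event does not modify a given field, both the field and its flag are set to null.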

+----+---------------------+--------+---------------------+
| id |        date         | status | flag_changed_status |
+----+---------------------+--------+---------------------+
|  1 | 2020-01-03 19:32:17 | TODO   |                   1 |
|  1 | 2020-01-08 15:46:07 | WIP    |                   1 |
|  1 | 2020-01-08 15:53:53 |        |                     | //this line was generated because another field changed
|  1 | 2020-01-08 15:56:53 |        |                     | //this line was generated because another field changed
|  1 | 2020-01-08 16:02:31 | Done   |                   1 |
+----+---------------------+--------+---------------------+

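My goal is to replace the field value, in the rows where the field did not change, with the last value for which the flag was 1, so as to get: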

+----+---------------------+--------+---------------------+
| id |        date         | status | flag_changed_status |
+----+---------------------+--------+---------------------+
|  1 | 2020-01-03 19:32:17 | TODO   |                   1 |
|  1 | 2020-01-08 15:46:07 | WIP    |                   1 |
|  1 | 2020-01-08 15:53:53 | WIP    |                     |
|  1 | 2020-01-08 15:56:53 | WIP    |                     |
|  1 | 2020-01-08 16:02:31 | Done   |                   1 |
+----+---------------------+--------+---------------------+

I know I want to use the last_value analytic function in BigQuery, and I tried:

SELECT id, date, status,
  last_value(status) OVER (ORDER BY flag_changed_status, date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS current_status,
  flag_changed_status
FROM table ORDER BY id, date

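The idea was that by using the flag in the ORDER BY, the rows with the flag set to null would be placed first, and last_value(status) would then be the last value with flag_changed_status set to 1.
But this could only work with BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, because the ORDER BY clause is processed before the window frame clause (ROWS BETWEEN ...). For rows where flag_changed_status is null, the current row number after the ORDER BY is 0, so the last value between unbounded preceding and the current row is always null.
Is there a way to apply the ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame first and the ORDER BY afterwards, so that last_value(status) returns the last value before the current row that had the flag set to 1? Or is there a simpler way, still using analytic functions, to do this for all the different fields in one query?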
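
To make the failure mode above concrete, here is a small Python simulation of the attempted window. It is only a sketch of the semantics, not BigQuery itself: the tuple layout and the `current_status` dictionary are illustrative names, and the sort key mimics BigQuery placing NULLs first in ascending order.

```python
# Rows from the first sample table: (date, status, flag_changed_status).
rows = [
    ("2020-01-03 19:32:17", "TODO", 1),
    ("2020-01-08 15:46:07", "WIP", 1),
    ("2020-01-08 15:53:53", None, None),
    ("2020-01-08 15:56:53", None, None),
    ("2020-01-08 16:02:31", "Done", 1),
]

# ORDER BY flag_changed_status, date: NULL flags sort before flag = 1.
ordered = sorted(rows, key=lambda r: (r[2] is not None, r[2] or 0, r[0]))

# last_value(status) over ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
# the frame ends at the current row, so the last value in the frame is
# simply the current row's own status.
current_status = {}
for i, (date, status, flag) in enumerate(ordered):
    frame = ordered[: i + 1]
    current_status[date] = frame[-1][1]

for date, status, flag in rows:
    print(date, status, current_status[date])
```

In this simulation every unflagged row keeps a null status, because its frame never extends past itself, which matches the behaviour described in the question.
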
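EDIT: I really do want to copy the status that was set the last time the flag was set, even if that status is null, which is why I tried to use the flag in the ORDER BY. That is, if the initial table is: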

+----+---------------------+--------+---------------------+
| id |        date         | status | flag_changed_status |
+----+---------------------+--------+---------------------+
|  1 | 2020-01-03 19:32:17 | TODO   |                   1 |
|  1 | 2020-01-08 15:46:07 | null   |                   1 |
|  1 | 2020-01-08 15:53:53 | null   |                null |
|  1 | 2020-01-08 15:56:53 | null   |                null |
|  1 | 2020-01-08 15:57:53 | WIP    |                   1 |
|  1 | 2020-01-08 15:58:53 | null   |                null |
|  1 | 2020-01-08 16:02:31 | Done   |                   1 |
+----+---------------------+--------+---------------------+

I need:

+----+---------------------+--------+---------------------+
| id |        date         | status | flag_changed_status |
+----+---------------------+--------+---------------------+
|  1 | 2020-01-03 19:32:17 | TODO   |                   1 |
|  1 | 2020-01-08 15:46:07 | null   |                   1 |
|  1 | 2020-01-08 15:53:53 | null   |                null | // we copy the last status where the flag was 1, and it is null
|  1 | 2020-01-08 15:56:53 | null   |                null |
|  1 | 2020-01-08 15:57:53 | WIP    |                   1 |
|  1 | 2020-01-08 15:58:53 | WIP    |                null | //only this line changes
|  1 | 2020-01-08 16:02:31 | Done   |                   1 |
+----+---------------------+--------+---------------------+

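But this seems too complicated, so I will replace every null whose flag is set to 1 with a custom status; then, as @Gordon Linoff suggested, a simple last_value(status ignore nulls) will give almost the desired result.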
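
The workaround described above can be sketched in Python as follows. This is an illustration of the idea rather than the asker's actual query: the `SENTINEL` value and the function name `fill_with_sentinel` are hypothetical, and the final step that maps the sentinel back to null is an added assumption about what "almost the desired result" would still need.

```python
SENTINEL = "__NULL__"  # hypothetical custom status standing in for a flagged null

# Rows from the edited sample table: (date, status, flag_changed_status).
rows = [
    ("2020-01-03 19:32:17", "TODO", 1),
    ("2020-01-08 15:46:07", None, 1),
    ("2020-01-08 15:53:53", None, None),
    ("2020-01-08 15:56:53", None, None),
    ("2020-01-08 15:57:53", "WIP", 1),
    ("2020-01-08 15:58:53", None, None),
    ("2020-01-08 16:02:31", "Done", 1),
]

def fill_with_sentinel(rows):
    """Forward-fill status by date, treating a flagged null as a real value."""
    filled = []
    last_seen = None
    for date, status, flag in sorted(rows, key=lambda r: r[0]):
        # Step 1: a null that was explicitly set (flag = 1) becomes the sentinel.
        marked = SENTINEL if (flag == 1 and status is None) else status
        # Step 2: last_value(... ignore nulls) keeps the latest non-null value.
        if marked is not None:
            last_seen = marked
        # Step 3 (assumption): turn the sentinel back into null for the output.
        filled.append(None if last_seen == SENTINEL else last_seen)
    return filled

print(fill_with_sentinel(rows))
```
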

sql google-bigquery

Source: https://stackoverflow.com/questions/61956703/replace-value-with-last-value-where-flag-was-set-to-1

2 answers


t8e9dugd1#

Below is for BigQuery Standard SQL:

#standardSQL
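SELECT * EXCEPT(grp),
  -- Within each group, take the latest non-null status (null if the group has none)
  LAST_VALUE(status IGNORE NULLS) OVER (PARTITION BY grp ORDER BY `date`) AS updated_status
FROM (
  SELECT *,
    -- Running count of flagged rows: a new group starts at every row with flag_changed_status = 1
    COUNTIF(flag_changed_status = 1) OVER (ORDER BY `date`) AS grp
  FROM `project.dataset.table`
)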

Applied to the sample data from the question, the result is:

Row id  date                status  flag_changed_status updated_status   
1   1   2020-01-03 19:32:17 TODO    1                   TODO     
2   1   2020-01-08 15:46:07 null    1                   null     
3   1   2020-01-08 15:53:53 null    null                null     
4   1   2020-01-08 15:56:53 null    null                null     
5   1   2020-01-08 15:57:53 WIP     1                   WIP  
6   1   2020-01-08 15:58:53 null    null                WIP  
7   1   2020-01-08 16:02:31 Done    1                   Done
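
The grouping trick in this answer can be traced step by step with a short Python sketch. It is a simulation under the assumption that rows are unique per date and belong to a single id; the function name `fill_by_group` and the tuple layout are illustrative only.

```python
# Rows from the edited sample table: (date, status, flag_changed_status).
rows = [
    ("2020-01-03 19:32:17", "TODO", 1),
    ("2020-01-08 15:46:07", None, 1),
    ("2020-01-08 15:53:53", None, None),
    ("2020-01-08 15:56:53", None, None),
    ("2020-01-08 15:57:53", "WIP", 1),
    ("2020-01-08 15:58:53", None, None),
    ("2020-01-08 16:02:31", "Done", 1),
]

def fill_by_group(rows):
    """Return (grp, updated_status) per row, ordered by date."""
    grp = 0
    last_in_group = {}  # grp -> latest non-null status seen in that group
    result = []
    for date, status, flag in sorted(rows, key=lambda r: r[0]):
        # COUNTIF(flag_changed_status = 1) OVER (ORDER BY date): running count.
        if flag == 1:
            grp += 1
        # LAST_VALUE(status IGNORE NULLS) OVER (PARTITION BY grp ORDER BY date).
        if status is not None:
            last_in_group[grp] = status
        result.append((grp, last_in_group.get(grp)))
    return result

for (date, status, flag), (grp, updated) in zip(rows, fill_by_group(rows)):
    print(date, grp, updated)
```

Because each group begins at a flagged row, a flagged null starts a group that contains no non-null status, so the rows after it stay null instead of inheriting an older value.
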


4jb9z9bj2#

I would prefer lag(ignore nulls), but BigQuery doesn't support that. Instead, use first_value() / last_value():

with t as (
      select 1 as id, '2020-01-03 19:32:17' as date, 'TODO' as status, 1 as flag_changed_status union all
      select 1 as id, '2020-01-08 15:46:07' as date, 'WIP' as status, 1 as flag_changed_status union all
      select 1 as id, '2020-01-08 15:53:53' as date, null as status, null as flag_changed_status union all
      select 1 as id, '2020-01-08 15:56:53' as date, null as status, null as flag_changed_status union all
      select 1 as id, '2020-01-08 16:02:31' as date, 'Done' as status, 1 as flag_changed_status
     )
-- carry the latest non-null status forward in date order
select t.*,
       last_value(status ignore nulls) over (order by date) as imputed_status
from t;
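
As a rough illustration of what last_value(status ignore nulls) over (order by date) computes, the following Python sketch forward-fills a list of statuses that is already in date order. The function name `forward_fill` is illustrative, and the second list is the edited sample from the question, included to show where a plain forward fill differs from the requested output.

```python
def forward_fill(statuses):
    """Replace each None with the most recent non-None value before it."""
    filled = []
    last_seen = None
    for status in statuses:
        if status is not None:
            last_seen = status
        filled.append(last_seen)
    return filled

# Status column of the first sample, in date order.
first_sample = ["TODO", "WIP", None, None, "Done"]
# Status column of the edited sample, where a flagged row sets status to null.
edited_sample = ["TODO", None, None, None, "WIP", None, "Done"]

print(forward_fill(first_sample))
print(forward_fill(edited_sample))
```

On the first sample this gives the requested result. On the edited sample the explicitly set null is skipped, so the three null rows after TODO are filled with TODO rather than staying null, which is the gap the question's edit refers to.
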
