sql如果缺少日期,如何填写时间序列的最后一个值

r55awzrz  于 2021-07-26  发布在  Java
关注(0)|答案(1)|浏览(275)

我得到了下表。

WITH
-- your input ....
input(t,grp,value) AS (
          SELECT TIMESTAMP '2020-05-28 00:00:00','A',55
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','B',1.09
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','C',1.8
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','A',68
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','B',1.9
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','C',1.19
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','C',0.88
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','C',13
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','C',99
)

如您所见,本表中缺少2020-30-05和2020-31-05之间的日期。因此,有必要在这些日期填写2020-29-05信息分组。此外,今天的日期比数据中的日期(06-03 vs 06-08)要大(因此,在本月,这些观察值丢失了)。因此,最终输出应如下所示:

date2                     Group    number
2020-28-05 00:00:00         A        55
2020-28-05 00:00:00         B        1.09
2020-28-05 00:00:00         C        1.8
2020-29-05 00:00:00         A        68
2020-29-05 00:00:00         B        1.9
2020-29-05 00:00:00         C        1.19
2020-30-05 00:00:00         A        68
2020-30-05 00:00:00         B        1.9
2020-30-05 00:00:00         C        1.19
2020-31-05 00:00:00         A        68
2020-31-05 00:00:00         B        1.9
2020-31-05 00:00:00         C        1.19
2020-01-06 00:00:00         A        10
2020-01-06 00:00:00         B        15
2020-01-06 00:00:00         C        0.88
2020-02-06 00:00:00         A        22
2020-02-06 00:00:00         B        15
2020-02-06 00:00:00         C        13
2020-03-06 00:00:00         A        66
2020-03-06 00:00:00         B        88
2020-03-06 00:00:00         C        99
And for periods 03-06 till 08-06 the same values

2020-08-06 00:00:00         A        66
2020-08-06 00:00:00         B        88
2020-08-06 00:00:00         C        99

下面的代码有助于查找日期中缺少的值,但是这些间隙不会在今天的日期中被填补。如何修复?

SELECT ts AS t, grp, TS_FIRST_VALUE(value,'const') AS value
FROM input
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY t)
ORDER BY 1,2
kqhtkvqz

kqhtkvqz1#

它叫 INTERPOLATE 而不是外推,这就是挑战。
您需要将每组的最后一行添加到输入表中,但要使用今天的日期而不是实际/原始日期。注意 padding 以及 padded 我在下面使用的常用表表达式。vertica有我在这里使用的分析极限子句: LIMIT 1 OVER(PARTITION BY grp ORDER BY tmstmp DESC) ..

WITH
input(tmstmp,grp,nbr) AS (
          SELECT TIMESTAMP '2020-05-28 00:00:00','A',55
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','B',1.09
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','C',1.8
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','A',68
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','B',1.9
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','C',1.19
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','C',0.88
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','C',13
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','C',99
)
,
padding AS (
  SELECT
    CURRENT_DATE::timestamp
  , grp
  , nbr
  FROM input
  LIMIT 1 OVER(PARTITION BY grp ORDER BY tmstmp DESC)
)
,
padded AS (
  SELECT * FROM input
  UNION ALL
  SELECT * FROM padding
)
SELECT
  ts AS tmstmp
, grp
, TS_FIRST_VALUE(nbr,'const') AS nbr
FROM padded
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY tmstmp)
ORDER BY 1,2
;
-- out        tmstmp        | grp |  nbr  
-- out ---------------------+-----+-------
-- out  2020-05-28 00:00:00 | A   | 55.00
-- out  2020-05-28 00:00:00 | B   |  1.09
-- out  2020-05-28 00:00:00 | C   |  1.80
-- out  2020-05-29 00:00:00 | A   | 68.00
-- out  2020-05-29 00:00:00 | B   |  1.90
-- out  2020-05-29 00:00:00 | C   |  1.19
-- out  2020-05-30 00:00:00 | A   | 68.00
-- out  2020-05-30 00:00:00 | B   |  1.90
-- out  2020-05-30 00:00:00 | C   |  1.19
-- out  2020-05-31 00:00:00 | A   | 68.00
-- out  2020-05-31 00:00:00 | B   |  1.90
-- out  2020-05-31 00:00:00 | C   |  1.19
-- out  2020-06-01 00:00:00 | A   | 10.00
-- out  2020-06-01 00:00:00 | B   | 15.00
-- out  2020-06-01 00:00:00 | C   |  0.88
-- out  2020-06-02 00:00:00 | A   | 22.00
-- out  2020-06-02 00:00:00 | B   | 15.00
-- out  2020-06-02 00:00:00 | C   | 13.00
-- out  2020-06-03 00:00:00 | A   | 66.00
-- out  2020-06-03 00:00:00 | B   | 88.00
-- out  2020-06-03 00:00:00 | C   | 99.00
-- out  2020-06-04 00:00:00 | A   | 66.00
-- out  2020-06-04 00:00:00 | B   | 88.00
-- out  2020-06-04 00:00:00 | C   | 99.00
-- out  2020-06-05 00:00:00 | A   | 66.00
-- out  2020-06-05 00:00:00 | B   | 88.00
-- out  2020-06-05 00:00:00 | C   | 99.00
-- out  2020-06-06 00:00:00 | A   | 66.00
-- out  2020-06-06 00:00:00 | B   | 88.00
-- out  2020-06-06 00:00:00 | C   | 99.00
-- out  2020-06-07 00:00:00 | A   | 66.00
-- out  2020-06-07 00:00:00 | B   | 88.00
-- out  2020-06-07 00:00:00 | C   | 99.00
-- out  2020-06-08 00:00:00 | A   | 66.00
-- out  2020-06-08 00:00:00 | B   | 88.00
-- out  2020-06-08 00:00:00 | C   | 99.00
-- out  2020-06-09 00:00:00 | A   | 66.00
-- out  2020-06-09 00:00:00 | B   | 88.00
-- out  2020-06-09 00:00:00 | C   | 99.00

相关问题