从每个时间窗口中选取第一行

qfe3c7zg  于 2021-08-09  发布在  Java
关注(0)|答案(2)|浏览(343)

我有两列:一列是时间列,另一列是某种布尔类型的列:

GMT  VAL
2010-08-01 10:59:32   1
2010-08-01 10:59:33   0
2010-08-01 10:59:34   1
2010-08-01 10:59:36   0
2010-08-01 10:59:38   1
2010-08-01 10:59:41   1
2010-08-01 10:59:43   0
2010-08-01 10:59:45   1
2010-08-01 10:59:47   0
2010-08-01 10:59:53   1

我想从每10秒一个窗口取第一排。

GMT  VAL
2010-08-01 10:59:32   1
2010-08-01 10:59:43   0

我该怎么做?

nwwlzxa7

nwwlzxa71#

你可以用 row_number() :

select t.*
from (select t.*,
             row_nubmer() over (partition by date_trunc('minute', gmt), floor(extract(seconds from gmt) / 6)
                                order by gmt) as seqnum
      from t
     ) t
where seqnum = 1;

也可以转换为字符串:

select t.*
from (select t.*,
             row_nubmer() over (partition by left(to_char(gmt, 'YYYYMMDDHH24MMSS'), 13)
                                order by gmt) as seqnum
      from t
     ) t
where seqnum = 1;

或使用 epoch :

select t.*
from (select t.*,
             row_nubmer() over (partition by floor(extract(epoch from gmt) / 10)                                   order by gmt) as seqnum
      from t
     ) t
where seqnum = 1;
qvsjd97n

qvsjd97n2#

vertica处于最佳状态—尽管您确实会得到一个时间序列,它会被捕捉到下一个10秒的边界以从中开始,因此如果您不修复它,它会返回不同的行。
如果您确实需要精确的原始时间戳,那么将最小时间序列时间戳与下一个最小实际时间戳之间的差异添加到10秒的快照时间片中—在这种特定情况下为2秒—检查 tb 以及 ts 下面是常用的表表达式。

WITH
-- your input ...
input(gmt,val) AS (
          SELECT TIMESTAMP '2010-08-01 10:59:32',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:33',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:34',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:36',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:38',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:41',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:43',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:45',1
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:47',0
UNION ALL SELECT TIMESTAMP '2010-08-01 10:59:53',1
)
,
-- create the timeseries - I decide to snap it to exact 10-second time slices
-- use the Vertica TIME SLICE function to create the limits of the time series
tm(tm) AS (
            SELECT MIN(TIME_SLICE(gmt,10,'SECOND','START')) AS tm FROM input
  UNION ALL SELECT MAX(TIME_SLICE(gmt,10,'SECOND','START')) AS tm FROM input
)
,
-- use Vertica's TIMESERIES clause to actually create the time series
-- which will be snapped to 10-second borders
tb(tb) AS (
  SELECT tb 
  FROM tm
  TIMESERIES tb AS '10 SECONDS' OVER(ORDER BY tm)
)
,
 -- add the difference between timeseries timestamp and actual timestamp
ts(ts) AS (
  SELECT 
    tb +( (SELECT MIN(gmt) FROM INPUT) - (SELECT MIN(tb) FROM tb) )
  FROM tb
)
-- finally, use the "Event Series Join"
-- - That's the INTERPOLATE PREVIOUS VALUE predicate - 
-- to apply an outer join
SELECT
  gmt
, ts AS control_ts
, val
FROM input
LEFT
JOIN ts
  ON gmt INTERPOLATE PREVIOUS VALUE ts
WHERE gmt IS NOT NULL
-- Vertica's Analytic Limit Clause
LIMIT 1 OVER(PARTITION BY ts ORDER BY gmt)
;

退货:

gmt         |     control_ts      | val 
---------------------+---------------------+-----
 2010-08-01 10:59:32 | 2010-08-01 10:59:32 |   1
 2010-08-01 10:59:43 | 2010-08-01 10:59:42 |   0
 2010-08-01 10:59:53 | 2010-08-01 10:59:52 |   1

相关问题