如何对两列使用hivesql的lag函数?

m1m5dgzv  于 2021-06-26  发布在  Hive
关注(0)|答案(2)|浏览(596)

我觉得我有一个相当简单的sql问题要解决,只是不知道如何正确地搜索它。
假设我有一个基于时间更新值的表:

|timestamp|value|session|
|---------|-----|-------|
| ts1     | v1  |  s1   |
| ts2     | v2  |  s1   |
| ts3     | v3  |  s1   |
| ...     | ..  |  s2   |

我想得到当前值和前一个值以及相关的时间戳。
所以结果应该是:

|timestamp_current|value_current|timestamp_prev|value_prev|
|-----------------|-------------|--------------|----------|
|      ts2        |      v2     |    ts1       |    v1    |
|      ts3        |      v3     |    ts2       |    v2    |
|      ...        |      ..     |    ...       |    ..    |

如果我只想获取上一个值,而不是上一个时间戳,那么我认为下面的查询是正确的: select timestamp, value, lag(value,1) over (partition by (session) order by timestamp) from mytable 但是,从上一行中添加两个值的正确方法是什么?我是添加两个lag子句还是有更好的方法?

tquggr8v

tquggr8v1#

你可以用 lag() 两次;一次 prev_timestamp 就一次 prev_val 如下所示。

select * from
(
select timestamp, 
       value, 
       lag(timestamp) over(partition by session order by timestamp) as prev_timestamp, 
       lag(value) over(partition by session order by timestamp) as prev_value
from table1
) t
where prev_timestamp is not null
``` `where` 子句用于排除具有 `prev_timestamp` 作为 `NULL` 结果:

+-----------+-------+----------------+------------+
| timestamp | value | prev_timestamp | prev_value |
+-----------+-------+----------------+------------+
| ts2 | v2 | ts1 | v1 |
| ts3 | v3 | ts2 | v2 |
+-----------+-------+----------------+------------+

演示
py49o6xq

py49o6xq2#

实现这一点的另一种方法是使用row\u number函数并连接如下所示的记录。。但是在同一个查询中使用两个lag方法很可能比row\ u number和left join方法性能更好。

WITH dt AS (SELECT  timestamp, value, ROW_NUMBER() OVER(PARTITION BY session ORDER BY timestamp) as row_num FROM table1)
SELECT 
  t0.timestamp, 
  t0.value, 
  t1.timestamp as prev_timestamp,
  t1.value as prev_value
FROM dt t0 
LEFT OUTER JOIN dt t1
ON t0.row_num = t1.row_num - 1

相关问题