postgresql 添加一列并基于其他列值进行填充

nuypyhwy  于 2023-05-28  发布在  PostgreSQL
关注(0)|答案(3)|浏览(269)

我有下表:

CREATE TABLE trajectory(
    user_id int, 
    session_id int, 
    timestamp timestamp with time zone,
    lat double precision, 
    lon double precision
);

INSERT INTO trajectory(user_id, session_id, timestamp, lat, lon) VALUES 
(1, 25304,'2008-10-23 02:53:04+01', 39.984702, 116.318417),
(1, 25304, '2008-10-23 02:53:10+01', 39.984683, 116.31845), 
(1, 25304, '2008-10-23 02:53:15+01', 39.984686, 116.318417),
(1, 25304, '2008-10-23 02:53:20+01', 39.984688, 116.318385), 
(1, 20959,'2008-10-24 02:09:59+01', 40.008304, 116.319876),
(1, 20959,'2008-10-24 02:10:04+01', 40.008413, 116.319962), 
(1, 20959,'2008-10-24 02:10:14+01', 40.007171, 116.319458),
(2, 55305, '2008-10-23 05:53:05+01', 39.984094, 116.319236), 
(2, 55305, '2008-10-23 05:53:11+01', 39.984198, 116.319322), 
(2, 55305, '2008-10-23 05:53:21+01', 39.984224, 116.319402), 
(2, 34104, '2008-10-23 23:41:04+01', 40.013867, 116.306473),
(2, 34104, '2008-10-23 23:41:16+01', 40.013907, 116.306488);

问题:

我想在user_idsession_id列的基础上向该表添加一列trip_id,这样当用户的会话id更改时,我知道用户正在进行新的旅行,因此我将该id添加到新的trip列。
所需输出:

user_id |session_id  |timestamp              |    lat       |   lon     | trip_id
--------|------------|-----------------------|--------------|-----------|-----------
  1     |     25304  |2008-10-23 02:53:04+01 | 39.984702    |116.318417 |       1
  1     |     25304  |2008-10-23 02:53:10+01 | 39.984683    |116.31845  |       1
  1     |     25304  |2008-10-23 02:53:15+01 | 39.984686    |116.318417 |       1
  1     |     25304  |2008-10-23 02:53:20+01 | 39.984688    |116.318385 |       1
  1     |     20959  |2008-10-24 02:09:59+01 |40.008304     |116.319876 |       2
  1     |     20959  |2008-10-24 02:10:04+01 |40.008413     |116.319962 |       2
  1     |     20959  |2008-10-24 02:10:14+01 |40.007171     |116.319458 |       2
  2     |     55305  |2008-10-23 05:53:05+01 |39.984094     |116.319236 |       1
  2     |     55305  |2008-10-23 05:53:11+01 |39.984198     |116.319322 |       1
  2     |     55305  |2008-10-23 05:53:21+01 |39.984224     |116.319402 |       1
  2     |     34104  |2008-10-23 23:41:04+01 |40.013867     |116.306473 |       2
  2     |     34104  |2008-10-23 23:41:16+01 |40.013907     |116.306488 |       2

How can I do this?

编辑

感谢这些伟大的答案,但所有收到的答案都是检索表值,它们不会修改表。另外,我添加了最初忽略的timestamp列,以为这样可以简化我的问题。

bkhjykvo

bkhjykvo1#

我们可以用缺口和孤岛技术来解决这个问题。假设你有一个列来排序你的数据集,比如ordering_id

select t.*, 
    count(*) 
        filter(where session_id is distinct from lag_session_id) 
        over(partition by user_id order by ordering_id) trip_id
from (
    select t.*, 
        lag(session_id) over(partition by user_id order by ordering_id) lag_session_id
    from trajectory t
) t

其思想是将同一用户的“前一个”会话ID与lag一起带入,然后将其与当前行上的值进行比较;然后,我们可以计算它改变了多少次,这定义了行程ID。
| 用户id|会话ID|后来|隆|订购ID|滞后会话ID|跳闸标识|
| - -----|- -----|- -----|- -----|- -----|- -----|- -----|
| 一个|25304| 39.984702| 116.318417|一个|联系我们|一个|
| 一个|25304| 39.984683| 116.31845| 2| 25304|一个|
| 一个|25304| 39.984686| 116.318417| 3| 25304|一个|
| 一个|25304| 39.984688| 116.318385| 4| 25304|一个|
| 一个|20959| 40.008304| 116.319876| 5个|25304| 2|
| 一个|20959| 40.008413| 116.319962|六|20959| 2|
| 一个|20959| 40.007171| 116.319458|七个|20959| 2|
| 2|五五三零五|39.984094| 116.319236|八|联系我们|一个|
| 2|五五三零五|39.984198| 116.319322|九个|五五三零五|一个|
| 2|五五三零五|39.984224| 116.319402|十个|五五三零五|一个|
| 2| 34104| 40.013867| 116.306473|十一|五五三零五|2|
| 2| 34104| 40.013907| 116.306488|十二岁|34104| 2|
fiddle

kdfy810k

kdfy810k2#

还有一种办法是:

  • 选择标识符的最小记录
  • 通过在其上排序来应用DENSE RANK

如果您没有任何排序字段,您可以使用“ctid”来实现此目的:它是一个存储表中每行的物理位置的字段。否则使用timestamp字段代替“ctid"。

ALTER TABLE trajectory ADD COLUMN trip_id INT;

WITH cte AS (
    SELECT ctid, 
           user_id, 
           MIN(ctid) OVER(PARTITION BY user_id, session_id) AS parts 
    FROM trajectory
), cte2 AS (
    SELECT ctid,
           DENSE_RANK() OVER(PARTITION BY user_id ORDER BY parts) AS trip_id 
    FROM cte
)
UPDATE trajectory 
SET trip_id = cte2.trip_id
FROM cte2
WHERE trajectory.ctid = cte2.ctid;

输出

| 用户id|会话ID|后来|隆|跳闸标识|
| - -----|- -----|- -----|- -----|- -----|
| 一个|25304| 39.984702| 116.318417|一个|
| 一个|25304| 39.984683| 116.31845|一个|
| 一个|25304| 39.984686| 116.318417|一个|
| 一个|25304| 39.984688| 116.318385|一个|
| 一个|20959| 40.008304| 116.319876| 2|
| 一个|20959| 40.008413| 116.319962| 2|
| 一个|20959| 40.007171| 116.319458| 2|
| 2|五五三零五|39.984094| 116.319236|一个|
| 2|五五三零五|39.984198| 116.319322|一个|
| 2|五五三零五|39.984224| 116.319402|一个|
| 2| 34104| 40.013867| 116.306473| 2|
| 2| 34104| 40.013907| 116.306488| 2|
查看演示here

vhipe2zx

vhipe2zx3#

这可以通过使用窗口函数lag()检索前一行并确定它是否已更改,然后使用窗口函数sum()检索trip_id来实现:

with cte as (
  select *, case when 
                 session_id - lag(session_id, 1, session_id) 
                              over (partition by user_id order by timestamp) = 0 
                 then 0 else 1 end as diff
  from trajectory
)
select user_id, session_id, timestamp,  lat, lon, 1 + sum(diff) over (partition by user_id order by timestamp) as trip_id
from cte
order by user_id, timestamp

假设user_id/timestamp是唯一的,则更新可以是:

ALTER TABLE trajectory ADD COLUMN trip_id int;

with cte as (
  select *, case when session_id - lag(session_id, 1, session_id) over (partition by user_id order by timestamp) = 0 then 0 else 1 end as diff
  from trajectory
),
cte2 as (
  select user_id,   session_id, timestamp,  lat, lon, 1 + sum(diff) over (partition by user_id order by timestamp) as trip_id
  from cte
  order by user_id, timestamp
)
UPDATE trajectory 
SET trip_id = cte2.trip_id
FROM cte2
WHERE trajectory.timestamp = cte2.timestamp and trajectory.user_id = cte2.user_id

结果:
| 用户id|会话ID|时间戳|后来|隆|跳闸标识|
| - -----|- -----|- -----|- -----|- -----|- -----|
| 一个|25304| 2008-10-23 02:53:04+01| 39.984702| 116.318417|一个|
| 一个|25304| 2008-10-23 02:53:10+01| 39.984683| 116.31845|一个|
| 一个|25304| 2008-10-23 02:53:15+01| 39.984686| 116.318417|一个|
| 一个|25304| 2008-10-23 02:53:20+01| 39.984688| 116.318385|一个|
| 一个|20959| 2008-10-24 02:09:59+01| 40.008304| 116.319876| 2|
| 一个|20959| 2008-10-24 02:10:04+01| 40.008413| 116.319962| 2|
| 一个|20959| 2008-10-24 02:10:14+01| 40.007171| 116.319458| 2|
| 2|五五三零五|2008-10-23 05:53:05+01| 39.984094| 116.319236|一个|
| 2|五五三零五|2008-10-23 05:53:11+01| 39.984198| 116.319322|一个|
| 2|五五三零五|2008-10-23 05:53:21+01| 39.984224| 116.319402|一个|
| 2| 34104| 2008-10-23 23:41:04+01| 40.013867| 116.306473| 2|
| 2| 34104| 2008-10-23 23:41:16+01| 40.013907| 116.306488| 2|
Demo here

相关问题