How to get the latest non-null values and timestamp-ordered aggregated values from an event-sourced JSONB column in PostgreSQL?

Posted by hujrc8aj on 2023-08-04 in PostgreSQL

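I'm working with event-sourced data in which all the important fields are combined into a JSONB column, and many keys are missing from most of the rows.
I want to get:

  1. the aggregated, combined values of the arrays contained in the JSONB field (see `ingredients` in the example), and
  2. the latest non-null value of each other field, according to the timestamp.

I tried this myself and managed to write a query that produces exactly the result I want, but it looks rather ugly, and I'd like to know how to improve it.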

Schema (PostgreSQL v15)

CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    identifier VARCHAR(255),
    timestamp TIMESTAMP WITH TIME ZONE,
    event_data JSONB
);
INSERT INTO events (identifier, timestamp, event_data)
VALUES
    ('12345', '2019-01-01T00:00:00.000Z', '{"target": "99999999"}'),
    ('12345', '2019-01-01T12:00:00.000Z', '{"ingredients": ["Banana", "Strawberry"]}'),
    ('12345', '2019-01-03T00:00:00.000Z', '{"target": "12345678", "user": "peterpan"}'),
    ('12345', '2019-01-04T00:00:00.000Z', '{"ingredients": ["Melon"], "user": "robinhood"}'),
    ('67890', '2019-01-03T00:00:00.000Z', '{"target": "0000", "user": "mickeymouse"}'),
    ('67890', '2019-01-04T00:00:00.000Z', '{"ingredients": ["Potato"]}');


Query #1

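WITH events_flattened AS (
    -- One row per ingredient; events without ingredients keep one row with part = NULL
    SELECT
        identifier,
        timestamp,
        event_data->>'target' AS target,
        event_data->>'user' AS user,
        elem.part
    FROM events
    LEFT JOIN LATERAL jsonb_array_elements(event_data->'ingredients') elem(part) ON true
)

SELECT
    identifier,
    -- An ORDER BY inside the CTE does not guarantee aggregate input order,
    -- so the newest-first ordering is requested inside each aggregate instead.
    -- ARRAY_REMOVE drops NULLs, so element [1] is the latest non-null value.
    (ARRAY_REMOVE(ARRAY_AGG(e.target ORDER BY e.timestamp DESC), NULL))[1] AS target,
    (ARRAY_REMOVE(ARRAY_AGG(e.user ORDER BY e.timestamp DESC), NULL))[1] AS user,
    ARRAY_REMOVE(ARRAY_AGG(e.part ORDER BY e.timestamp DESC), NULL) AS ingredients,
    MAX(e.timestamp) AS latest_update
FROM events_flattened e
GROUP BY identifier;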
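A single-pass version of the query above, offered as a hedged sketch rather than the asker's method, would likely drop the CTE and combine an `ORDER BY` inside each aggregate with a `FILTER` clause. The `FILTER` clause discards NULLs before aggregation, so `ARRAY_REMOVE` is no longer needed. The sketch assumes the `events` table and rows from the schema above. It uses `jsonb_array_elements_text ... WITH ORDINALITY`, which returns plain `text` items along with each item's position in its array. Sorting newest event first, and within an event last element first, gives `Melon, Strawberry, Banana` for `12345`, which matches the expected output. The lateral join repeats an event once per ingredient, but that does not change which value lands at position `[1]` for `target` or `user`.

```sql
-- Assumes the events table and sample rows from the schema above.
SELECT
    e.identifier,
    -- Newest-first array of non-null targets; element [1] is the latest one
    (array_agg(e.event_data->>'target' ORDER BY e.timestamp DESC)
        FILTER (WHERE e.event_data->>'target' IS NOT NULL))[1] AS target,
    (array_agg(e.event_data->>'user' ORDER BY e.timestamp DESC)
        FILTER (WHERE e.event_data->>'user' IS NOT NULL))[1] AS "user",
    -- Newest event first; within one event, last array element first
    array_agg(i.item ORDER BY e.timestamp DESC, i.pos DESC)
        FILTER (WHERE i.item IS NOT NULL) AS ingredients,
    max(e.timestamp) AS latest_update
FROM events e
-- One row per ingredient; events without ingredients keep one row with i.item = NULL
LEFT JOIN LATERAL jsonb_array_elements_text(e.event_data->'ingredients')
    WITH ORDINALITY AS i(item, pos) ON true
GROUP BY e.identifier
ORDER BY e.identifier;
```

If two events for the same identifier can share a timestamp, adding a tiebreaker such as `e.id DESC` to each `ORDER BY` would make the choice of "latest" deterministic.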


To be helpful, an answer should produce exactly the result shown in this table:
| target | user | ingredients | latest_update |
| -- | -- | -- | -- |
| 12345678 | robinhood | Melon, Strawberry, Banana | 2019-01-04T00:00:00.000Z |
| 0000 | mickeymouse | Potato | 2019-01-04T00:00:00.000Z |
Which query and which indexes would work best for getting exactly this data from this table?
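One possible starting point for the indexing part of the question is a composite B-tree index on `(identifier, timestamp DESC)`. This is an assumption, not a measured recommendation. Such an index can let per-identifier "newest row first" lookups, like correlated `ORDER BY timestamp DESC LIMIT 1` subqueries, read rows in index order instead of sorting them. A query that aggregates over every row of the table will probably still get a sequential scan, so it is worth checking with `EXPLAIN ANALYZE` on realistic data. The index name below is hypothetical.

```sql
-- Hypothetical index name; B-tree on identifier, newest timestamp first.
CREATE INDEX events_identifier_timestamp_idx
    ON events (identifier, "timestamp" DESC);
```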

Answer #1 (8yoxcaq7)

CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    identifier VARCHAR(255),
    timestamp TIMESTAMP WITH TIME ZONE,
    event_data JSONB
);
INSERT INTO events (identifier, timestamp, event_data)
VALUES
    ('12345', '2019-01-01T00:00:00.000Z', '{"target": "99999999"}'),
    ('12345', '2019-01-01T12:00:00.000Z', '{"ingredients": ["Banana", "Strawberry"]}'),
    ('12345', '2019-01-03T00:00:00.000Z', '{"target": "12345678", "us": "peterpan"}'),
    ('12345', '2019-01-04T00:00:00.000Z', '{"ingredients": ["Melon"], "us": "robinhood"}'),
    ('67890', '2019-01-03T00:00:00.000Z', '{"target": "0000", "us": "mickeymouse"}'),
    ('67890', '2019-01-04T00:00:00.000Z', '{"ingredients": ["Potato"]}');

| target | us | ingredients | latest_update |
| -- | -- | -- | -- |
| 12345678 | robinhood | {"\"Banana\"","\"Strawberry\"","\"Melon\""} | 2019-01-04 00:00:00+00 |
| 0000 | mickeymouse | {"\"Potato\""} | 2019-01-04 00:00:00+00 |
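The escaped quotes in the `ingredients` column above most likely come from aggregating `jsonb` values, each of which still carries its JSON quotes. The minimal comparison below uses a literal array instead of the `events` table. `jsonb_array_elements` yields `jsonb` elements, while `jsonb_array_elements_text` yields plain `text`.

```sql
-- jsonb elements keep their JSON quotes; the _text variant unwraps them.
SELECT
    ARRAY(SELECT jsonb_array_elements('["Banana", "Strawberry"]'::jsonb))      AS as_jsonb,
    ARRAY(SELECT jsonb_array_elements_text('["Banana", "Strawberry"]'::jsonb)) AS as_text;
```

In psql, `as_jsonb` prints as `{"\"Banana\"","\"Strawberry\""}` and `as_text` as `{Banana,Strawberry}`.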

Answer #2 (eufgjt7s)

Here is the query I ended up using. An acquaintance of mine, Tommi Vainikainen, suggested the approach of using subqueries and jsonb_array_elements:

SELECT
  e.identifier,
  -- Latest non-null target for this identifier
  (
    SELECT e2.event_data->>'target'
    FROM events e2
    WHERE e2.identifier=e.identifier
    AND e2.event_data->>'target' IS NOT NULL
    ORDER BY timestamp DESC LIMIT 1
  ) AS target,
  -- Latest non-null user for this identifier
  (
    SELECT e2.event_data->>'user'
    FROM events e2
    WHERE e2.identifier=e.identifier
    AND e2.event_data->>'user' IS NOT NULL
    ORDER BY timestamp DESC LIMIT 1
  ) AS user,
  -- Every ingredient from this identifier's events, as jsonb elements;
  -- without an ORDER BY the element order is not guaranteed
  ARRAY(
    SELECT jsonb_array_elements(e2.event_data->'ingredients')
    FROM events e2
    WHERE e2.identifier=e.identifier
    AND e2.event_data->>'ingredients' IS NOT NULL
  ) AS ingredients,
  MAX(timestamp) AS latest_update
FROM events e
GROUP BY identifier;
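The `ARRAY(...)` subquery for `ingredients` in this answer has no `ORDER BY`, so PostgreSQL does not guarantee the element order. Here is a hedged variant of just that column, assuming the same `events` table. It unnests with `jsonb_array_elements_text ... WITH ORDINALITY`, then sorts by event timestamp and element position. For `12345` it returns `Melon, Strawberry, Banana`, matching the expected output, as plain `text` rather than `jsonb`.

```sql
-- Assumes the events table and sample rows from the question's schema.
SELECT
  e.identifier,
  ARRAY(
    SELECT i.item
    FROM events e2
    -- One row per array element, with its 1-based position in pos;
    -- events without an ingredients array contribute no rows
    CROSS JOIN LATERAL jsonb_array_elements_text(e2.event_data->'ingredients')
         WITH ORDINALITY AS i(item, pos)
    WHERE e2.identifier = e.identifier
    -- Newest event first; within an event, last element first
    ORDER BY e2.timestamp DESC, i.pos DESC
  ) AS ingredients
FROM events e
GROUP BY e.identifier
ORDER BY e.identifier;
```

The other columns of the answer's query can stay as they are. Only the `ingredients` expression changes.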

