postgresql UNNEST红移中的超级数据

hi3rlvi2  于 2023-06-22  发布在  PostgreSQL
关注(0)|答案(1)|浏览(183)

我在Redshift中有一个包含SUPER类型列的表。我的表中的列是:luid =>一个唯一的用户ID,event_nameevent_dataevent_time例如2023 - 06 - 12 00:00:00。
我在Product上有3种事件,event_name值是:viewedcartedordered
如果event_name被查看或 * carted**,则event_data列中的数据如下:

{
  "price": 17.85,
  "product_id": "135663",
  "name": "Cute Heart Design Non-Slip Soft Sole Women Slippers",
  "options": {
    "Size ": "36-37EU",
    "Color ": "Black"
  }
}

如果event_name为ordered,则event_data列中数据如下所示:

{
  "shipping_charges": 0,
  "order_products": [
    {
      "price": 17.85,
      "product_id": "135663",
      "name": "Cute Heart Design Non-Slip Soft Sole Women Slippers",
      "options": {
        "Size ": "36-37EU",
        "Color ": "Black"
      }
    },
    {
      "price": 14.49,
      "product_id": "125530",
      "name": "Faux Diamond Luxurious Valentine Women Ring - Silver",
      "options": {
        "Color ": "Silver"
      }
    }
  ],
  "order_id": "6517223"
}

注意:event_data的DataType为Objectorder_products的DataType为Array
我想知道一个产品是viewedcarted & ordered的次数。

SELECT event_data.product_id
   COUNT(CASE WHEN event_name = 'viewed' THEN 1 END) AS viewed,
   COUNT(CASE WHEN event_name = 'carted' THEN 1 END) AS carted,
FROM events
WHERE event_time >= '2023-06-12 00:00:00'
AND   event_time <= '2023-06-12 23:59:59'
AND event_name IN ('viewed', 'carted')
GROUP BY event_data.product_id
ORDER BY viewed DESC, carted DESC;

上面的查询工作正常,返回我VIEWED和CARTED的计数。
我想从每条记录中提取order_products,并按product_id对它们进行分组,并返回每个product_id的计数。我正在尝试[UNNEST][1]命令,但它不工作。我想我的问题中缺少了一些东西。
如果我运行以下查询,我会得到一个Dict列表,[{...}, {...}, {...}, ...]

select event_data.order_products from events
WHERE event_name = 'ordered'
AND   event_time >= '2023-06-12 00:00:00'
AND   event_time <= '2023-06-12 23:59:59';

但如果我应用UNNET,则不会返回任何结果。

SELECT
product_id,
COUNT(CASE WHEN event_name = 'ordered' THEN 1 END) AS ordered
FROM (
  SELECT
  event_data.product_id,
  FROM events,
   LATERAL (
     SELECT product_id
     FROM UNNEST(event_data.order_products) AS product
   ) AS event_data
WHERE ...

尝试了来自Internet资源的多个选项,但不起作用。
PS:我在MongoDB中也是这样做的,我使用UNWIND stage来扁平化order_products并获得所有3种类型的事件计数。

a7qyws3x

a7qyws3x1#

这看起来像是在RS查询中混合了其他数据库的语法。LATERAL和UNNEST不是RS运算符。你的SQL也有语法错误,所以我尽了最大努力来解码你想做的事情。让我知道如果我把这个放在一起不正确,并澄清。
你需要的是一个简单的数组的非嵌套。这在此处的RS文档中进行了描述-https://docs.aws.amazon.com/redshift/latest/dg/query-super.html
你只需要像这样交叉连接基表和数组:

SELECT op.product_id,
COUNT(CASE WHEN e.event_name = 'ordered' THEN 1 END) AS ordered
FROM events e, e.event_data.order_products op
WHERE event_name = 'ordered'
AND   event_time >= '2023-06-12 00:00:00'
AND   event_time <= '2023-06-12 23:59:59'
GROUP BY 1;

“FROM events e,e.event_data.order_products op”是关键部分。
这是我的测试缓冲区,所以你可以看看我是如何设置的,我做了什么。

create table events as select 'viewed' as event_name, '2023-06-12 04:00:00'::timestamp as event_time,
json_parse('{
  "price": 17.85,
  "product_id": "135663",
  "name": "Cute Heart Design Non-Slip Soft Sole Women Slippers",
  "options": {
    "Size ": "36-37EU",
    "Color ": "Black"
  }
}') as event_data
union all
select 'ordered' as event_name, '2023-06-12 05:00:00'::timestamp as event_time,
json_parse('{
  "shipping_charges": 0,
  "order_products": [
    {
      "price": 17.85,
      "product_id": "135663",
      "name": "Cute Heart Design Non-Slip Soft Sole Women Slippers",
      "options": {
        "Size ": "36-37EU",
        "Color ": "Black"
      }
    },
    {
      "price": 14.49,
      "product_id": "125530",
      "name": "Faux Diamond Luxurious Valentine Women Ring - Silver",
      "options": {
        "Color ": "Silver"
      }
    }
  ],
  "order_id": "6517223"
}') as event_data ;

SELECT event_data.product_id,
   COUNT(CASE WHEN event_name = 'viewed' THEN 1 END) AS viewed,
   COUNT(CASE WHEN event_name = 'carted' THEN 1 END) AS carted
FROM events
WHERE event_time >= '2023-06-12 00:00:00'
AND   event_time <= '2023-06-12 23:59:59'
AND event_name IN ('viewed', 'carted')
GROUP BY event_data.product_id
ORDER BY viewed DESC, carted DESC;

select event_data.order_products 
from events
WHERE event_name = 'ordered'
AND   event_time >= '2023-06-12 00:00:00'
AND   event_time <= '2023-06-12 23:59:59';

/*SELECT
product_id,
COUNT(CASE WHEN event_name = 'ordered' THEN 1 END) AS ordered
FROM (
  SELECT
  event_data.product_id
  FROM events,
   LATERAL (
     SELECT product_id
     FROM UNNEST(event_data.order_products) AS product
   ) AS event_data
WHERE event_name = 'ordered'
AND   event_time >= '2023-06-12 00:00:00'
AND   event_time <= '2023-06-12 23:59:59';*/

SELECT op.product_id,
COUNT(CASE WHEN e.event_name = 'ordered' THEN 1 END) AS ordered
FROM events e, e.event_data.order_products op
WHERE event_name = 'ordered'
AND   event_time >= '2023-06-12 00:00:00'
AND   event_time <= '2023-06-12 23:59:59'
GROUP BY 1;

相关问题