oracle 连续几天购买的客户

bksxznpy  于 2023-05-06  发布在  Oracle
关注(0)|答案(4)|浏览(143)

下面是我尝试列出的连续购买10天或更长时间的客户。
我正在尝试获得输出(见下文),并可能需要一些帮助。我知道这可能可以用MATCH_RECOGNIZE来完成,但我对它不太熟悉,所以我更喜欢增强我当前的尝试或确定性,以接受任何其他建议,以实现我想要的输出。
下面是我的测试案例。提前感谢所有回答的人。

ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';

CREATE TABLE customers 
(CUSTOMER_ID, FIRST_NAME, LAST_NAME) AS
SELECT 1, 'Faith', 'Mazzarone' FROM DUAL UNION ALL
SELECT 2, 'Lisa', 'Saladino' FROM DUAL UNION ALL
SELECT 3, 'Micheal', 'Palmice' FROM DUAL UNION ALL
SELECT 4, 'Joseph', 'Zaza' FROM DUAL UNION ALL
SELECT 5, 'Jerry', 'Torchiano' FROM DUAL;

ALTER TABLE customers 
ADD CONSTRAINT customers_pk PRIMARY KEY (customer_id);

CREATE TABLE items 
(PRODUCT_ID, PRODUCT_NAME, PRICE) AS
SELECT 100, 'Black Shoes', 79.99 FROM DUAL UNION ALL
SELECT 101, 'Brown Pants', 111.99 FROM DUAL UNION ALL
SELECT 102, 'White Shirt', 10.99 FROM DUAL;

ALTER TABLE items 
ADD CONSTRAINT items_pk PRIMARY KEY (product_id);

create table purchases(
  ORDER_ID NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,
  customer_id   number, 
  PRODUCT_ID NUMBER, 
  QUANTITY NUMBER, 
  purchase_date timestamp
);

ALTER TABLE purchases 
ADD CONSTRAINT order_pk PRIMARY KEY (order_id);

ALTER TABLE purchases ADD CONSTRAINT customers_fk FOREIGN KEY (customer_id) REFERENCES customers(customer_id);

ALTER TABLE purchases ADD CONSTRAINT items_fk FOREIGN KEY (PRODUCT_ID) REFERENCES items(product_id);

insert  into purchases (customer_id, product_id, quantity, purchase_date) 
SELECT 3, 102, 4,TIMESTAMP '2022-12-22 21:44:35' + NUMTODSINTERVAL ( LEVEL * 2, 'DAY') FROM    dual
CONNECT BY  LEVEL <= 15 UNION ALL 
select 1, 101,3, date '2023-03-29' + level * interval '2' day from dual
          connect by level <= 12
union all
select 2, 101,2, date '2023-01-15' + level * interval '8' hour from dual
          connect by level <= 15
union all
select 2, 102,2,date '2023-04-13' + level * interval '1 1' day to hour from dual
          connect by level <= 11
union all
select 3, 101,2, date '2023-02-01' + level * interval '1 05:03' day to minute from dual
          connect by level <= 10
union all
select 3, 101,1, date '2023-04-22' + level * interval '23' hour from dual
          connect by level <= 23
union all
select 3, 100,1,  date '2022-03-01' + level * interval '1 00:23:05' day to second from dual
          connect by level <= 15
union all
select 4, 102,1, date '2023-01-01' + level * interval '5' hour from dual
          connect by level <= 60;

WITH t as (
     select distinct CUSTOMER_ID,
    trunc(PURCHASE_DATE) dat
       from purchases
    )
    ,tt as (
        select t.*
        ,row_number() over (partition by CUSTOMER_ID order by dat) rn
        from t
    )
   ,ttt as (
   select CUSTOMER_ID, 
         min(dat) start_date,
         max(dat) end_date,
         count(*) day_count
     from tt
     group by 
      CUSTOMER_ID, dat-rn
      having count(*) >= 10
   )
   select 
           c.customer_id,
           c.first_name,
           c.last_name,  
           ttt.start_date,
           ttt.end_date,
           ttt.day_count
     from   customers c, ttt
    where  
    c.customer_id = ttt.customer_id;

/*desired output */

CUSTOMER_ID FIRST_NAME  LAST_NAME   FIRST_DATE  LAST_DATE   DAY_COUNT   PURCHASE_COUNT
2   Lisa    Saladino    14-APR-2023 00:00:00    24-APR-2023 00:00:00    11  11
3   Micheal Palmice 02-MAR-2022 00:00:00    16-MAR-2022 00:00:00    15  15
3   Micheal Palmice 22-APR-2023 00:00:00    14-MAY-2023 00:00:00    23  23
4   Joseph  Zaza    01-JAN-2023 00:00:00    13-JAN-2023 00:00:00    13  60
icomxhvb

icomxhvb1#

我更愿意加强我目前的尝试
您可以通过更改第一个CTE 't'查询来获得所需的结果:

select distinct CUSTOMER_ID,
    trunc(PURCHASE_DATE) dat
       from purchases

select CUSTOMER_ID,
    trunc(PURCHASE_DATE) dat,
    count(*) cnt
       from purchases
   group by CUSTOMER_ID, trunc(PURCHASE_DATE)

然后将其包括在‘ttt’查询中:

sum(cnt) purchase_count

和最终选择列表。总而言之(不需要固定缩进或切换到现代连接语法):

WITH t as (
     select CUSTOMER_ID,
    trunc(PURCHASE_DATE) dat,
    count(*) cnt
       from purchases
   group by CUSTOMER_ID, trunc(PURCHASE_DATE)
    )
    ,tt as (
        select t.*
        ,row_number() over (partition by CUSTOMER_ID order by dat) rn
        from t
    )
   ,ttt as (
   select CUSTOMER_ID, 
         min(dat) start_date,
         max(dat) end_date,
         count(*) day_count,
         sum(cnt) purchase_count
     from tt
     group by 
      CUSTOMER_ID, dat-rn
      having count(*) >= 10
   )
   select 
           c.customer_id,
           c.first_name,
           c.last_name,  
           ttt.start_date,
           ttt.end_date,
           ttt.day_count,
           ttt.purchase_count
     from   customers c, ttt
    where  
    c.customer_id = ttt.customer_id;
客户ID联系我们公司简介开始日期结束日期天数采购_计数
丽莎萨拉迪诺2023年4月14日00:00:002023 - 4 - 24 00:00:00十一十一
迈克尔帕尔梅什2022年3月2日00时00分2022年3月16日00时00分十五岁十五岁
迈克尔帕尔梅什2023年4月22日00:00:002023年5月14日00:00:00二十三二十三
四个约瑟夫扎扎2023年1月1日00:00:002023年1月13日00:00:00十三六十

fiddle
我建议使用更简单的方法,如@astentx或@MTO的答案所示。

k97glaaz

k97glaaz2#

可以使用MATCH_RECOGNIZE

SELECT p.customer_id,
       c.first_name,
       c.last_name,
       TRUNC(p.first_purchase) AS first_day,
       TRUNC(p.last_purchase) - TRUNC(p.first_purchase) + 1 AS day_count,
       p.purchase_count
FROM   purchases
       MATCH_RECOGNIZE(
         PARTITION BY customer_id
         ORDER BY purchase_date
         MEASURES
           FIRST(purchase_date) AS first_purchase,
           LAST(purchase_date) AS last_purchase,
           COUNT(*) AS purchase_count
         PATTERN (consecutive_days+ final_day)
         DEFINE consecutive_days AS NEXT(purchase_date) < TRUNC(purchase_date) + INTERVAL '2' DAY
       ) p
       INNER JOIN customers c
       ON c.customer_id = p.customer_id
WHERE  p.last_purchase >= TRUNC(p.first_purchase) + 9

对于样本数据,输出:
| 客户ID|联系我们|公司简介|第一天|天数|采购_计数|
| --------------|--------------|--------------|--------------|--------------|--------------|
| 二|丽莎|萨拉迪诺|2019 -04-14 00:00:00|十一|十一|
| 三|迈克尔|帕尔梅什|2019 -03- 22 00:00:00|十五岁|十五岁|
| 三|迈克尔|帕尔梅什|2019 -04-22 00:00:00|二十三|二十三|
| 四个|约瑟夫|扎扎|2019 -01- 21 00:00:00|十三|六十|
如果你真的想使用分析函数和聚合来完成它,那么上面的查询可以转换为:

WITH continuous as (
  SELECT customer_id,
         purchase_date,
         CASE
         WHEN purchase_date < TRUNC(LAG(purchase_date) OVER (PARTITION BY customer_id ORDER BY purchase_date)) + 2
         THEN 0
         ELSE 1
         END AS is_not_continuous
  FROM   purchases p
),
groups as (
  SELECT customer_id,
         purchase_date,
         SUM(is_not_continuous) OVER (PARTITION BY customer_id ORDER BY purchase_date)
          AS grp
 FROM   continuous c
),
totals AS (
  SELECT customer_id,
         TRUNC(MIN(purchase_date)) AS first_day,
         1 + TRUNC(MAX(purchase_date)) - TRUNC(MIN(purchase_date)) AS day_count,
         COUNT(*) AS purchase_count
  FROM   groups
  GROUP BY
         customer_id,
         grp
  HAVING 1 + TRUNC(MAX(purchase_date)) - TRUNC(MIN(purchase_date)) >= 10
)
SELECT c.customer_id,
       c.first_name,
       c.last_name,  
       t.first_day,
       t.day_count,
       t.purchase_count
FROM   customers c
       INNER JOIN totals t
       ON c.customer_id = t.customer_id;

具有相同的输出。
fiddle

k0pti3hp

k0pti3hp3#

您可以使用上述match_recognize并定义匹配模式为 * 在当前行之后的一天内有下一次购买 * 重复9次或更多:第一行应该匹配没有条件(因为我们没有规则的开始系列),然后我们需要有至少9行1天的差距。
下面是purchases表上的代码。该结果可以被联接到其他表以向标识符添加标签等。

with distinct_days as (
  /*Group by day first to count entire days*/
  select
    customer_id,
    product_id,
    trunc(purchase_date) as purchase_date,
    sum(quantity) as quantity
  from purchases p
  group by 
    customer_id,
    product_id,
    trunc(purchase_date)
)
select *
from distinct_days
match_recognize (
  partition by customer_id
  order by purchase_Date asc
  measures
    first(purchase_Date) as first_date,
    last(purchase_Date) as last_Date,
    match_number() as grp,
    count(purchase_Date) as days_,
    sum(quantity) as quantity
--  all rows per match
  pattern(any_ next_day{9,})
  define
    next_day as trunc(purchase_Date) <= trunc(prev(purchase_Date)) + 1
)
order by customer_id, grp
客户ID首个日期最后日期玻璃钢天_数量
2023-04-14 2023-04-14 2023-04-142023-04-24 2023-04-241十一二十二
2022-03-02 2022-03-022022年3月16日1十五岁十五岁
2023-04-22 2023-04-222023-05-14 2023-05-14 2023-05-14二十三二十三
四个2023-01-01 2023-01-012023-01-13 2023-01-13 2023-01-131十三六十

fiddle

holgip5t

holgip5t4#

这是另一种受DENSE_RANK()启发的方法
dbFiddle Link

-- working 2
        
    with prank as
    (
      select dense_Rank() over (partition by customer_id 
      order by to_number(to_char(purchase_date, 'J'))) - 
      to_number(to_char(purchase_date, 'J')) as recRank, 
      purchase_date, customer_id 
      FROM purchases
    ), 
      prankdays as
    (select customer_id, trunc(max(purchase_date)) - 
               trunc(min(purchase_date)) + 1 as numRunningDays
      min(purchase_date), 
      max(purchase_date)        
    from prank
    group by customer_id, recRank
    having (extract(day from max(purchase_date) - min(purchase_date)) + 1) >=10
    )
    select c.first_name, c.last_name, p.*  
    from prankdays p, customers c 
    where p.customer_id = c.customer_id 
     
    
    -- notes:
    -- 1 extract(day from max(purchase_date) - min(purchase_date)) + 1
    --     doesn't include, therefore trunc is used where both start, end dates inclusive
    -- 2 count(*) records is not applied
    -- 3 dense rank works great for identifying gaps
    -- 4 there are two ways to get number of days between two dates here
    -- 5 a way to convert date to number is shown here
    -- I used the same dbfiddle table schema, data by @AlexPoole (cheers)
    ;

输出:

FIRST_NAME  LAST_NAME   CUSTOMER_ID NUMRUNNINGDAYS  MIN(PURCHASE_DATE)          MAX(PURCHASE_DATE)
Lisa        Saladino    2           11              14-APR-23 01.00.00.000000   24-APR-23 11.00.00.000000
Micheal     Palmice     3           15              02-MAR-22 00.23.05.000000   16-MAR-22 05.46.15.000000
Micheal     Palmice     3           23              22-APR-23 23.00.00.000000   14-MAY-23 01.00.00.000000
Joseph      Zaza        4           13              01-JAN-23 05.00.00.000000   13-JAN-23 12.00.00.000000

一些思考的食物:
Oracle documentation
DENSE_RANK()是Oracle中的一个分析函数,用于计算一组有序行中的一行的排名。返回的秩是从1开始的整数。
与RANK()函数不同,DENSE_RANK()函数将秩值作为连续整数返回。在平局的情况下,它不会跳过排名。具有相同等级标准值的行将收到相同的等级值。

相关问题