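In the dataset, there are users along with the products they bought from different companies and the purchase dates. (Other columns are irrelevant for now.) I am trying to identify the users, and their purchase dates, who got into the habit of ordering on consecutive days instead of ordering everything at once to save on shipping costs etc. Ordering from one company on multiple days is bad; ordering from different companies on different days is fine.
For example, they order on 2023.08.23, 2023.08.24 and 2023.08.25 instead of ordering everything at once on 2023.08.23.
I tried to run this code on my dataset, but it shows a different output from the one I want to see. Could you modify my code? Is it possible to somehow flag those days, per user, in the original table? This is my plan with the flag, based on the code I posted previously (for further investigation); picture attached.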
WITH t2 AS (
SELECT USERT, CREATIONDATE, COMPANY ,
LAG(creationdate) OVER(PARTITION BY USERT ORDER BY USERT) AS prev_diff,
LEAD(creationdate) OVER(PARTITION BY USERT ORDER BY USERT) AS next_diff,
LAG(USERT) OVER(ORDER BY USERT) AS prev_USERT,
LEAD(USERT) OVER(ORDER BY USERT) AS next_USERT
FROM VEKPOLFA1P72_TESZT_P
)
SELECT USERT, CREATIONDATE, COMPANY,
CASE
WHEN ((prev_diff IS NULL OR creationdate - prev_diff <= 1) AND USERT = prev_USERT) OR
((next_diff IS NULL OR next_diff - creationdate <= 1) AND USERT = next_USERT) THEN 'Consecutive'
ELSE 'Non-Consecutive'
END AS consecutive_marker
FROM t2
WHERE creationdate >= '20230601' AND creationdate <= '20230630'
Sample dataset:
USERT;CREATIONDATE;COMPANY
43014502;20230605;SCHKFT
43014503;20230605;EURFT.
43014509;20230606;HORANS
43014516;20230607;EURFT.
43014516;20230607;EURFT.
43014522;20230620;HORANS
43014523;20230623;GHII K
43014524;20230624;EURFT.
43014533;20230603;HORANS
43014534;20230629;GHII K
45921390;20230629;NANREC
45921390;20230628;NANREC
45921390;20230630;NANREC
45931996;20230630;BEYECT
49117108;20230613;BEYECT
49148157;20230612;D E BT
49148163;20230612;STAFT.
49148165;20230615;MENFT.
49148165;20230615;MENFT.
49148167;20230604;INGMBH
49148167;20230605;INGMBH
49148167;20230606;INGMBH
49148168;20230601;GUT KG
49148174;20230620;PAPRT.
49148174;20230620;FRT.
49148174;20230620;PAPRT.
49148175;20230601;PANOPE
49148175;20230602;FAE
49148175;20230605;PANOPE
49148175;20230605;PANOPE
49148175;20230605;PANOPE
49148179;20230621;GK LGA
49148179;20230622;GK LGA
49148179;20230623;GK LGA
49148179;20230624;GK LGA
49148183;20230601;SCHMBH
49148183;20230601;SCHMBH
49148183;20230630;SCHMBH
Expected output:
USERT;CREATIONDATE;COMPANY
45921390;20230629;NANREC
45921390;20230628;NANREC
45921390;20230630;NANREC
49148167;20230604;INGMBH
49148167;20230605;INGMBH
49148167;20230606;INGMBH
49148179;20230621;GK LGA
49148179;20230622;GK LGA
49148179;20230623;GK LGA
49148179;20230624;GK LGA
...
1 Answer
From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row pattern matching. In earlier versions, the same rows can be flagged without it.
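A minimal sketch of what a MATCH_RECOGNIZE query for this could look like, not necessarily the exact query intended here. It assumes CREATIONDATE is a DATE with no time-of-day part, reuses the table name from the question, and treats two orders of the same user as consecutive when they are at most one day apart (so same-day repeats count too). The pattern variable names run_start, run_next and single are made up for illustration.

```sql
-- Sketch only: assumes CREATIONDATE is a DATE at midnight.
SELECT usert,
       creationdate,
       CASE cls
         WHEN 'SINGLE' THEN 'Non-consecutive'
         ELSE 'Consecutive'
       END AS consecutive_marker,
       company
FROM   vekpolfa1p72_teszt_p
MATCH_RECOGNIZE (
  PARTITION BY usert
  ORDER BY creationdate
  -- CLASSIFIER() reports which pattern variable each row matched
  MEASURES CLASSIFIER() AS cls
  ALL ROWS PER MATCH
  -- either a run of two or more close-together rows, or a lone row
  PATTERN ( run_start run_next+ | single )
  DEFINE
    -- a row continues the run if it is at most one day after the previous row
    run_next AS creationdate <= PREV(creationdate) + 1
)
-- the filter runs after matching, so neighbours outside June still count
WHERE  creationdate >= DATE '2023-06-01'
AND    creationdate <  DATE '2023-07-01'
ORDER BY usert, creationdate;
```

Pattern variables without a DEFINE entry match any row, so a row is classified as single only when the next row of the same user is more than one day later.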
For the sample data, both approaches output:
| USERT | CREATIONDATE | CONSECUTIVE_MARKER | COMPANY |
| ------------ | ------------ | ------------ | ------------ |
| 43014502 | 2023-06-05 00:00:00 | Non-consecutive | SCHKFT |
| 43014503 | 2023-06-05 00:00:00 | Non-consecutive | EURFT. |
| 43014509 | 2023-06-06 00:00:00 | Non-consecutive | HORANS |
| 43014516 | 2023-06-07 00:00:00 | Consecutive | EURFT. |
| 43014516 | 2023-06-07 00:00:00 | Consecutive | EURFT. |
| 43014522 | 2023-06-20 00:00:00 | Non-consecutive | HORANS |
| 43014523 | 2023-06-23 00:00:00 | Non-consecutive | GHII K |
| 43014524 | 2023-06-24 00:00:00 | Non-consecutive | EURFT. |
| 43014533 | 2023-06-03 00:00:00 | Non-consecutive | HORANS |
| 43014534 | 2023-06-29 00:00:00 | Non-consecutive | GHII K |
| 45921390 | 2023-06-28 00:00:00 | Consecutive | NANREC |
| 45921390 | 2023-06-29 00:00:00 | Consecutive | NANREC |
| 45921390 | 2023-06-30 00:00:00 | Consecutive | NANREC |
| 45931996 | 2023-06-30 00:00:00 | Non-consecutive | BEYECT |
| 49117108 | 2023-06-13 00:00:00 | Non-consecutive | BEYECT |
| 49148157 | 2023-06-12 00:00:00 | Non-consecutive | D E BT |
| 49148163 | 2023-06-12 00:00:00 | Non-consecutive | STAFT. |
| 49148165 | 2023-06-15 00:00:00 | Consecutive | MENFT. |
| 49148165 | 2023-06-15 00:00:00 | Consecutive | MENFT. |
| 49148167 | 2023-06-04 00:00:00 | Consecutive | INGMBH |
| 49148167 | 2023-06-05 00:00:00 | Consecutive | INGMBH |
| 49148167 | 2023-06-06 00:00:00 | Consecutive | INGMBH |
| 49148168 | 2023-06-01 00:00:00 | Non-consecutive | GUT KG |
| 49148174 | 2023-06-20 00:00:00 | Consecutive | PAPRT. |
| 49148174 | 2023-06-20 00:00:00 | Consecutive | PAPRT. |
| 49148174 | 2023-06-20 00:00:00 | Consecutive | FRT. |
| 49148175 | 2023-06-01 00:00:00 | Consecutive | PANOPE |
| 49148175 | 2023-06-02 00:00:00 | Consecutive | FAE |
| 49148175 | 2023-06-05 00:00:00 | Consecutive | PANOPE |
| 49148175 | 2023-06-05 00:00:00 | Consecutive | PANOPE |
| 49148175 | 2023-06-05 00:00:00 | Consecutive | PANOPE |
| 49148179 | 2023-06-21 00:00:00 | Consecutive | GK LGA |
| 49148179 | 2023-06-22 00:00:00 | Consecutive | GK LGA |
| 49148179 | 2023-06-23 00:00:00 | Consecutive | GK LGA |
| 49148179 | 2023-06-24 00:00:00 | Consecutive | GK LGA |
| 49148183 | 2023-06-01 00:00:00 | Consecutive | SCHMBH |
| 49148183 | 2023-06-01 00:00:00 | Consecutive | SCHMBH |
| 49148183 | 2023-06-30 00:00:00 | Non-consecutive | SCHMBH |
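
For versions before Oracle 12, one possible sketch uses the LAG and LEAD analytic functions. It is an illustration under stated assumptions (CREATIONDATE is a DATE at midnight, table name taken from the question), and the aliases prev_date and next_date are made up. Unlike the query in the question, the windows are ordered by CREATIONDATE rather than USERT, so the previous and next rows are the neighbouring dates of the same user.

```sql
-- Sketch only: assumes CREATIONDATE is a DATE at midnight.
SELECT usert,
       creationdate,
       CASE
         -- a NULL neighbour makes the comparison unknown, so it falls through
         WHEN creationdate - prev_date <= 1
           OR next_date - creationdate <= 1
         THEN 'Consecutive'
         ELSE 'Non-consecutive'
       END AS consecutive_marker,
       company
FROM (
  SELECT usert,
         creationdate,
         company,
         LAG(creationdate)  OVER (PARTITION BY usert ORDER BY creationdate) AS prev_date,
         LEAD(creationdate) OVER (PARTITION BY usert ORDER BY creationdate) AS next_date
  FROM   vekpolfa1p72_teszt_p
)
-- filter in the outer query so neighbours outside June are still seen
WHERE  creationdate >= DATE '2023-06-01'
AND    creationdate <  DATE '2023-07-01'
ORDER BY usert, creationdate;
```

Because the window is partitioned by USERT, there is no need to compare against the previous or next USERT as the question's query does.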
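
The marker above also counts repeated orders on the same day, whereas the expected output in the question lists only runs of different, adjacent days from one company. A hedged sketch of one way to get that stricter result is the classic "gaps and islands" trick: subtracting a DENSE_RANK from the date gives the same value for every row in a run of adjacent days. It assumes CREATIONDATE is a DATE at midnight; the aliases grp and days_in_run are made up for illustration.

```sql
-- Sketch only: assumes CREATIONDATE is a DATE at midnight.
SELECT usert, creationdate, company
FROM (
  SELECT usert,
         creationdate,
         company,
         -- number of different days in the run this row belongs to
         COUNT(DISTINCT creationdate)
           OVER (PARTITION BY usert, company, grp) AS days_in_run
  FROM (
    SELECT usert,
           creationdate,
           company,
           -- same-day rows share a rank; adjacent days share the same grp
           creationdate
             - DENSE_RANK() OVER (PARTITION BY usert, company
                                  ORDER BY creationdate) AS grp
    FROM   vekpolfa1p72_teszt_p
  )
)
WHERE  days_in_run >= 2
ORDER BY usert, creationdate;
```

Under those assumptions, this returns only the rows of users 45921390, 49148167 and 49148179 for the sample data. Removing the WHERE clause and turning days_in_run into a CASE expression would give a flag column instead of a filter.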