SQL Server Partitioning and grouping

fcwjkofz  于 2023-04-28  发布在  其他
关注(0)|答案(3)|浏览(159)

I am trying to calculate something in Power BI or SQL, as this is conserning many many millions of rows, performance leads me to doing this in SQL, But i've been trying to get this work for quite a while already!

I am trying to make a calculate column that shows the current state of an account
| AccountId | PartnerId | Transactions | FirstTransaction | row_num | Industry | Private/Public | Result |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 6906 | 19 | 557 | 01-Jan-22 | 1 | Financial services | Private | Private Only |
| 6906 | 20 | 557 | 02-Jan-22 | 2 | Financial services | Private | Private Only |
| 6906 | 32 | 30 | 05-Jan-22 | 3 | Government | Public | Public + Private |
| 6906 | 738 | 2 | 12-Jan-22 | 4 | Government | Public | Public + Private |
| 6906 | 82 | 5 | 13-Jan-22 | 5 | | Private | Public + Private |
| 6906 | 74 | 1 | 11-Jun-22 | 6 | Financial services | Private | Public + Private |
| 6906 | 56 | 1 | 27-Jun-22 | 7 | Insurance | Private | Public + Private |

The result should check if an account has Public or Private transactions, in order of occurrence. I added a rank, partitioned by accountid and ordered by date ( = row_num) But next I would want a column that can crosscheck all the values within a partition.

In this example above, This account has the label "Private Only" on 01-01 as it only has done transactions with a private partner. the second line also says Private only, as it still has only done transactions with a private partner. But as of the third line it changes to Public + Private as it also used public transactions.

Anyone an idea how I can calcualte that Result column?

Edit: My Original Query:

SELECT ua.[AccountId],
      ua.[PartnerId],
      i.industry,
      i.[Public/Private],
      COUNT(*) AS Transactions,
      MIN(ua.[FilepathDate]) AS FirstTransaction,
      ROW_NUMBER() OVER (PARTITION BY accountid
                         ORDER BY MIN(ua.[FilepathDate] )) 
Partner_row_num

 FROM [dbo].[incremental_UserActions] ua
 INNER JOIN tmp_Industry i ON ua.partnerid = i.partnerid
 WHERE YEAR(ua.[FilepathDate]) = 2022
   AND ua.accountid = 6906
 GROUP BY ua.[AccountId],
        ua.[PartnerId],
        i.industry,
        i.[Public/Private]

Edit 2 : the eventual output of the above looks like this :

| AccountId | FirstTransaction | MonthAgg   | Result            |
| --------- | ---------------- | ---------- | ----------------- |
| 7015      | 05/01/2022       | 01/01/2022 | FAS Only          |
| 7015      | 06/01/2022       | 01/01/2022 | FAS + Other Public|
| 7015      | 19/01/2022       | 01/01/2022 | Public + Private  |
| 7015      | 19/01/2022       | 01/01/2022 | Public + Private  |
| 7015      | 24/01/2022       | 01/01/2022 | Public + Private  |
| 7015      | 08/04/2022       | 01/04/2022 | Public + Private  |
| 7015      | 08/06/2022       | 01/06/2022 | Public + Private  |
| 7799      | 04/01/2022       | 01/01/2022 | Private Only      |
| 7799      | 10/01/2022       | 01/01/2022 | Public + Private  |
| 7799      | 08/03/2022       | 01/03/2022 | Public + Private  |
| 7799      | 08/04/2022       | 01/04/2022 | Public + Private  |
| 7799      | 17/12/2022       | 01/12/2022 | Public + Private  |

The final result should have consecutive months per accountid. so 1 line per account per month

| AccountId | FirstTransaction | MonthAgg   | Result           |
| --------- | ---------------- | ---------- | ---------------- |
| 7015      | 24/01/2022       | 01/01/2022 | Public + Private |
| 7015      | 24/01/2022       | 01/02/2022 | Public + Private |
| 7015      | 24/01/2022       | 01/03/2022 | Public + Private |
| 7015      | 08/04/2022       | 01/04/2022 | Public + Private |
| 7015      | 08/04/2022       | 01/05/2022 | Public + Private |
| 7015      | 08/06/2022       | 01/06/2022 | Public + Private |
| 7015      | 08/06/2022       | 01/07/2022 | Public + Private |
| 7015      | 08/06/2022       | 01/08/2022 | Public + Private |
| 7015      | 08/06/2022       | 01/09/2022 | Public + Private |
| 7015      | 08/06/2022       | 01/10/2022 | Public + Private |
| 7015      | 08/06/2022       | 01/11/2022 | Public + Private |
| 7015      | 08/06/2022       | 01/12/2022 | Public + Private |
| 7799      | 10/01/2022       | 01/01/2022 | Public + Private |
| 7799      | 10/01/2022       | 01/02/2022 | Public + Private |
| 7799      | 08/03/2022       | 01/03/2022 | Public + Private |
| 7799      | 08/04/2022       | 01/04/2022 | Public + Private |
| 7799      | 08/04/2022       | 01/05/2022 | Public + Private |
| 7799      | 08/04/2022       | 01/06/2022 | Public + Private |
| 7799      | 08/04/2022       | 01/07/2022 | Public + Private |
| 7799      | 08/04/2022       | 01/08/2022 | Public + Private |
| 7799      | 08/04/2022       | 01/09/2022 | Public + Private |
| 7799      | 08/04/2022       | 01/10/2022 | Public + Private |
| 7799      | 08/04/2022       | 01/11/2022 | Public + Private |
| 7799      | 17/12/2022       | 01/12/2022 | Public + Private |
jv4diomz

jv4diomz1#

You can use window functions. If there are only two possible values in the [Public/Private] column, then, starting from your existing query:

SELECT ua.[AccountId],
    ua.[PartnerId],
    i.industry,
    i.[Public/Private],
    COUNT(*) AS Transactions,
    MIN(ua.[FilepathDate]) AS FirstTransaction,
    ROW_NUMBER() OVER (PARTITION BY accountid ORDER BY MIN(ua.[FilepathDate] )) Partner_row_numn,
    CASE WHEN MIN(i.[Public/Private]) OVER((PARTITION BY accountid ORDER BY MIN(ua.[FilepathDate]))
            = MAX(i.[Public/Private]) OVER((PARTITION BY accountid ORDER BY MIN(ua.[FilepathDate]))
        THEN concat( MIN(i.[Public/Private]) OVER((PARTITION BY accountid ORDER BY MIN(ua.[FilepathDate])), ' Only')
        ELSE 'Public + Private'
    END AS Result
FROM [dbo].[incremental_UserActions] ua
INNER JOIN tmp_Industry i ON ua.partnerid = i.partnerid
WHERE YEAR(ua.[FilepathDate]) = 2022 AND ua.accountid = 6906
GROUP BY ua.[AccountId], ua.[PartnerId], i.industry, i.[Public/Private]

The idea is to compare the MIN and MAX values; if they are the same, then it means there is only one distinct value in the window - else there are two different values, so we just use the mixed tag.

bjg7j2ky

bjg7j2ky2#

Query:

with cte as
( 
  select *,
  (case when lag(private_public)over(partition by accountid order by row_num)<>private_public then 1 else 0 end)status
  from accounts
)
select AccountId, PartnerId, Transactions, FirstTransaction, row_num, Industry, Private_Public,(case when sum(status)over(partition by accountid order by row_num)>0 then 'Private + Public' else Private_Public +' only' end) Result from cte

Output:

AccountIdPartnerIdTransactionsFirstTransactionrow_numIndustryPrivate_PublicResult
6906195572022-01-011Financial servicesPrivatePrivate only
6906205572022-01-022Financial servicesPrivatePrivate only
690632302022-01-053GovernmentPublicPrivate + Public
690673822022-01-124GovernmentPublicPrivate + Public
69068252022-01-135nullPrivatePrivate + Public
69067412022-06-116Financial servicesPrivatePrivate + Public
69065612022-06-277InsurancePrivatePrivate + Public

fiddle

5m1hhzi4

5m1hhzi43#

I have created a months table and inserted 12 months in it. Then I have joined it with previous result. No recursion is needed. Please check.

CREATE TABLE accounts (
    AccountId INT NOT NULL,
    PartnerId INT NOT NULL,
    Transactions INT NOT NULL,
    FirstTransaction DATE NOT NULL,
    row_num INT NOT NULL,
    Industry VARCHAR(50),
    Private_Public VARCHAR(10) NOT NULL    
);
INSERT INTO accounts (AccountId, PartnerId, Transactions, FirstTransaction, row_num, Industry, Private_Public)
VALUES 
    (6905, 19, 557, '2022-01-01', 1, 'Financial services', 'Private'),
    (6906, 20, 557, '2022-01-02', 2, 'Financial services', 'Private'),
    (6906, 32, 30, '2022-01-05', 3, 'Government', 'Public'),
    (6906, 738, 2, '2022-01-12', 4, 'Government', 'Public'),
    (6906, 82, 5, '2022-01-13', 5, NULL, 'Private'),
    (6905, 74, 1, '2022-06-11', 6, 'Financial services', 'public'),
    (6906, 56, 1, '2022-06-27', 7, 'Insurance', 'Private');
create table months(monthId int,name varchar(50),first_day date);
INSERT INTO months (monthId, name, first_day) VALUES
(1, 'January', '2023-01-01'),
(2, 'February', '2023-02-01'),
(3, 'March', '2023-03-01'),
(4, 'April', '2023-04-01'),
(5, 'May', '2023-05-01'),
(6, 'June', '2023-06-01'),
(7, 'July', '2023-07-01'),
(8, 'August', '2023-08-01'),
(9, 'September', '2023-09-01'),
(10, 'October', '2023-10-01'),
(11, 'November', '2023-11-01'),
(12, 'December', '2023-12-01');
Query:

with cte as
( 
  select *,
  (case when lag(private_public)over(partition by accountid order by row_num)<>private_public then 1 else 0 end)status,
  row_number()over(partition by accountId,month(FirstTransaction) order by FirstTransaction desc)ReverseTranSeq
  from accounts
)
,finalCte as
(
  select  AccountId, PartnerId, Transactions, FirstTransaction, row_num, Industry, Private_Public,(case when sum(status)over(partition by accountid order by row_num)>0 then 'Private + Public' else Private_Public +' only' end) Result 
  from cte 
  where ReverseTranSeq=1
)
  select m.AccountId, 
  max(FirstTransaction) over(partition by m.AccountId order by first_day  rows between unbounded preceding and current row)FirstTransaction,
  first_day MonthAgg,
  last_value(Result)ignore nulls over(partition by m.accountId order by first_day rows between unbounded preceding and current row)Result
from months cross join (select distinct AccountId from cte  )m
left join finalCte 
  on finalCte.AccountId=m.AccountId and months.monthId=month(finalCte.FirstTransaction)  
order by m.AccountId,months.monthId

OUtput:

AccountIdFirstTransactionMonthAggResult
69052022-01-012023-01-01Private only
69052022-01-012023-02-01Private only
69052022-01-012023-03-01Private only
69052022-01-012023-04-01Private only
69052022-01-012023-05-01Private only
69052022-06-112023-06-01Private + Public
69052022-06-112023-07-01Private + Public
69052022-06-112023-08-01Private + Public
69052022-06-112023-09-01Private + Public
69052022-06-112023-10-01Private + Public
69052022-06-112023-11-01Private + Public
69052022-06-112023-12-01Private + Public
69062022-01-132023-01-01Private + Public
69062022-01-132023-02-01Private + Public
69062022-01-132023-03-01Private + Public
69062022-01-132023-04-01Private + Public
69062022-01-132023-05-01Private + Public
69062022-06-272023-06-01Private + Public
69062022-06-272023-07-01Private + Public
69062022-06-272023-08-01Private + Public
69062022-06-272023-09-01Private + Public
69062022-06-272023-10-01Private + Public
69062022-06-272023-11-01Private + Public
69062022-06-272023-12-01Private + Public
[fiddle](https://dbfiddle.uk/aUd6k79t)

相关问题