SQL Server Query to group by month BUT count only 1 per day

5jdjgkvh  于 2024-01-05  发布在  其他
关注(0)|答案(4)|浏览(137)

I have a log table over all kind of events have been logged during the past years. Is it possible to create a sql query which tells me how many events have been logged per month, BUT only counting maximum one event per day?

Example for a few dates in October:

2022-10-12 12:11:32.570
2022-10-12 12:11:44.050
2022-10-12 14:19:55.027
2022-10-12 14:20:07.763
2022-10-26 09:54:39.020
2022-10-26 09:54:49.333
2022-10-26 09:54:54.353
2022-10-26 09:55:53.030
2022-10-26 09:55:59.363
2022-10-26 09:56:28.783
2022-10-26 09:56:36.797
2022-10-26 09:57:01.200
2022-10-26 10:00:21.663
2022-10-26 10:00:26.820
2022-10-26 10:00:42.777
2022-10-26 10:00:50.520
2022-10-26 10:01:20.777
2022-10-26 10:02:14.833
2022-10-26 10:02:23.460
2022-10-27 08:57:02.830
2022-10-27 08:57:33.820
2022-10-27 08:59:02.333

These should be 3 events (because of 3 different days). And I would like to have an overall result like:

monthevents-count
January15
February2
March...
  1. There are indeed values for several different years and I need the values for each month for the past 3 years.
  2. Is it possible to have the results as COLUMNS? Like:
    | December 2022 | January 2023 | February 2023 | ... |
    | ------------ | ------------ | ------------ | ------------ |
    | 13 | 7 | 18 | ... |
8hhllhi2

8hhllhi21#

You can group dates by month and then count distinct days. This should work.

SELECT MONTH([Column]) as 'Month', COUNT(DISTINCT DAY([Column])) as 'Events count' 
FROM [Table]
GROUP BY MONTH([Column])

UPDATE: Updated version with different years:

SELECT DATENAME(MONTH, [Column]) + ' ' + CAST(YEAR([Column]) AS nvarchar) as 'Date', COUNT(DISTINCT DAY([Column])) AS 'Events count'
FROM [Table]
GROUP BY DATENAME(MONTH, [Column]), YEAR([Column])

As a result you get:

DateEvents count
October 20223
November 20232
October 20234
avkwfej4

avkwfej42#

I would just partition it by Month/Day, do a row count for each combination and then count the first row of each group.

Something like this:

CREATE TABLE #tmp(DT datetime)
    INSERT INTO #tmp VALUES
    ('2022-10-12 12:11:32.570'),
    ('2022-10-12 12:11:44.050'),
    ('2022-10-12 14:19:55.027'),
    ('2022-10-12 14:20:07.763'),
    ('2022-10-26 09:54:39.020'),
    ('2022-10-26 09:54:49.333'),
    ('2022-10-26 09:54:54.353'),
    ('2022-10-26 09:55:53.030'),
    ('2022-10-26 09:55:59.363'),
    ('2022-10-26 09:56:28.783'),
    ('2022-10-26 09:56:36.797'),
    ('2022-10-26 09:57:01.200'),
    ('2022-10-26 10:00:21.663'),
    ('2022-10-26 10:00:26.820'),
    ('2022-10-26 10:00:42.777'),
    ('2022-10-26 10:00:50.520'),
    ('2022-10-26 10:01:20.777'),
    ('2022-10-26 10:02:14.833'),
    ('2022-10-26 10:02:23.460'),
    ('2022-10-27 08:57:02.830'),
    ('2022-10-27 08:57:33.820'),
    ('2022-10-27 08:59:02.333')
    
SELECT YEAR(EOM) YR ,DATENAME(Month,EOM) Month_Name, COUNT(*) Event_Cnt
FROM
(
    select EOMONTH(dt) EOM, ROW_NUMBER() OVER (PARTITION BY CAST(dt as date) ORDER BY dt) RN
    from #tmp
) DList WHERE DList.RN = 1
GROUP BY Year(EOM),DATENAME(Month,EOM)
mwkjh3gx

mwkjh3gx3#

A possible approach for the initial question is a statement, using COUNT with DISTINCT and an appropriate GROUP BY clause:

SELECT
   CONVERT(varchar(6), DateTimeColumn, 112) AS [Period],
   COUNT(DISTINCT DAY(DateTimeColumn)) AS [EventCount]
FROM Data
GROUP BY CONVERT(varchar(6), DateTimeColumn, 112)

A possible approach for your final question (... to get the count of the events for each month for the past 3 years and rotate them as columns ...) is to PIVOT the values using COUNT as aggregation function to get the DISTINCT days for each period (a combination of year and month). You have at least these options:

  • A static PIVOT , if you want to count the events for known periods (202312, 202311, 202310, ....).
  • A static PIVOT , if you want to count the events for relative periods (current period is 0, previous period is 1, ...).
  • A dynamic PIVOT , if you want to include a variable number of periods, starting from current period. The approach uses a numbers table, build with recursive query in this specific case, but you may always choose a different technique:

Sample data:

SELECT *
INTO Data  
FROM (VALUES
   (CONVERT(datetime, '2023-12-21T12:11:32.570')),
   (CONVERT(datetime, '2023-11-12T12:11:32.570')),
   (CONVERT(datetime, '2023-11-12T12:11:33.560')),
   (CONVERT(datetime, '2023-10-12T12:11:32.570')),
   (CONVERT(datetime, '2023-10-12T12:11:44.050')),
   (CONVERT(datetime, '2023-10-26T09:57:01.200')),
   (CONVERT(datetime, '2023-10-26T10:00:21.663')),
   (CONVERT(datetime, '2023-10-26T10:00:26.820')),
   (CONVERT(datetime, '2023-10-27T08:57:33.820')),
   (CONVERT(datetime, '2023-10-27T08:59:02.333'))
) v (DateTimeColumn)

Static PIVOT for known periods:

SELECT *
FROM (  
   SELECT DISTINCT 
      CONVERT(varchar(6), DateTimeColumn, 112) AS [Period],
      DAY(DateTimeColumn) AS [Day]
   FROM Data  
) t
PIVOT (
   COUNT([Day]) FOR [Period] IN ([202307], [202308], [202309], [202310], [202311], [202312])
) p

Static PIVOT for relative periods:

SELECT *
FROM (  
   SELECT DISTINCT 
      DATEDIFF(month, EOMONTH(DateTimeColumn), EOMONTH(GETDATE())) AS [RelativePeriod],
      DAY(DateTimeColumn) AS [Day]
   FROM Data  
) t
PIVOT (
   COUNT([Day]) FOR [RelativePeriod] IN ([0], [1], [2], [3], [4], [5])
) p

Dymanic PIVOT for variable number of years:

DECLARE @years int = 3
DECLARE @periods nvarchar(max)
  
; WITH PeriodsCTE AS (
   SELECT 1 AS N, CONVERT(varchar(6), DATEADD(month, 0, GETDATE()), 112) AS Period
   UNION ALL
   SELECT N + 1, CONVERT(varchar(6), DATEADD(month, -N, GETDATE()), 112)
   FROM PeriodsCTE 
   WHERE N < @years * 12
)
SELECT @periods = STRING_AGG(N'[' + Period + N']', N',') WITHIN GROUP (ORDER BY Period DESC)
FROM PeriodsCTE
OPTION (MAXRECURSION 0);

DECLARE @stmt nvarchar(max) =
   N'SELECT * ' +
   N'FROM ( ' +  
      N'SELECT DISTINCT ' + 
         N'CONVERT(varchar(6), DateTimeColumn, 112) AS [Period], ' +
         N'DAY(DateTimeColumn) AS [Day] ' +
      N'FROM Data ' +  
   N') t ' +
   N'PIVOT ( ' +
      N'COUNT([Day]) FOR [Period] IN (' + @periods + N') ' +
   N') p '

DECLARE @err int
EXEC @err = sp_executesql @stmt
IF @err <> 0 PRINT 'Error'

Results:

202312202311202310202309202308202307
113000
012345
113000
202312202311202310202309202308202307...202101
------------------------------------------------------------------------------------------------
113000...0
oprakyz7

oprakyz74#

You have answers, but if you wanted to return the data in columns (by years) instead of rows either of these could work. Includes columns for 0 values too, so that may not be desired.

Note: For dynamic changing headers, I believe you would need to write some dynamic SQl.

Pivot Example

SELECT
    pvt.[Year],
    pvt.[1] AS [January],
    pvt.[2] AS [February],
    pvt.[3] AS [March],
    pvt.[4] AS [April],
    pvt.[5] AS [May],
    pvt.[6] AS [June],
    pvt.[7] AS [July],
    pvt.[8] AS [August],
    pvt.[9] AS [September],
    pvt.[10] AS [October],
    pvt.[11] AS [November],
    pvt.[12] AS [December]
FROM (
    select distinct
        DATEPART(YEAR, event_date) [Year],
        DATEPART(MONTH, event_date) [Month],
        CAST(CAST(event_date AS DATE) AS NVARCHAR) [YearMonthDay]
    from #tmp_events
) AS sub
PIVOT (
    COUNT([YearMonthDay]) for [Month] in([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])
) pvt

Cross Apply Example (could use outer too)

SELECT
    sub.[year],
    pvt.*
FROM (
    SELECT
        DATEPART(YEAR, event_date) [Year]
    FROM #tmp_events
    GROUP BY DATEPART(YEAR, event_date)
) AS sub
CROSS APPLY (
    SELECT 
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '01' THEN DATEPART(DAY, t.event_date) END) [January],
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '02' THEN DATEPART(DAY, t.event_date) END) [February],
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '03' THEN DATEPART(DAY, t.event_date) END) [March],
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '04' THEN DATEPART(DAY, t.event_date) END) [April],
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '05' THEN DATEPART(DAY, t.event_date) END) [May],
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '06' THEN DATEPART(DAY, t.event_date) END) [June],
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '07' THEN DATEPART(DAY, t.event_date) END) [July],
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '08' THEN DATEPART(DAY, t.event_date) END) [August],
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '09' THEN DATEPART(DAY, t.event_date) END) [September],
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '10' THEN DATEPART(DAY, t.event_date) END) [October],
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '11' THEN DATEPART(DAY, t.event_date) END) [November],
        COUNT(DISTINCT CASE WHEN DATEPART(Month, event_date) = '12' THEN DATEPART(DAY, t.event_date) END) [December]
    FROM #tmp_events t
    where DATEPART(YEAR, t.event_date) = sub.Year
) pvt

Sample Data

CREATE TABLE #tmp_events(event_date datetime)
INSERT INTO #tmp_events VALUES
--one day feb 2022
('2022-02-12 12:11:32.570'),
('2022-02-12 12:11:44.050'),
--three days oct 2022
('2022-10-12 12:11:32.570'),
('2022-10-12 12:11:44.050'),
('2022-10-12 14:19:55.027'),
('2022-10-12 14:20:07.763'),
('2022-10-26 09:54:39.020'),
('2022-10-26 09:54:49.333'),
('2022-10-26 09:54:54.353'),
('2022-10-26 09:55:53.030'),
('2022-10-26 09:55:59.363'),
('2022-10-26 09:56:28.783'),
('2022-10-26 09:56:36.797'),
('2022-10-26 09:57:01.200'),
('2022-10-26 10:00:21.663'),
('2022-10-26 10:00:26.820'),
('2022-10-26 10:00:42.777'),
('2022-10-26 10:00:50.520'),
('2022-10-26 10:01:20.777'),
('2022-10-26 10:02:14.833'),
('2022-10-26 10:02:23.460'),
('2022-10-27 08:57:02.830'),
('2022-10-27 08:57:33.820'),
('2022-10-27 08:59:02.333'),
--one days Nov 2022
('2022-11-27 08:57:02.830'),
--one days Dec 2023
('2023-11-27 08:57:02.830')

Results

=== Update ===

I wanted to give an example dynamic query, @Zhorov already beat me to this, so consider this an extension of what Zhorov already said.
| February 2022 | October 2022 | November 2022 | November 2023 |
| ------------ | ------------ | ------------ | ------------ |
| 1 | 3 | 1 | 1 |

Uses the monthname/year column naming and only lists results for data that you have. So, yes, it is possible to do. It adds complexity and maintenance weight.

I have commented the code a bit for better understanding.

/*
Limit of 4096 to keep it from breaking on a very wide year and event 
counts. Provides the base grouping, count, and key values for later 
manipulation. Stores the results in a tmp_table.
*/
SELECT DISTINCT TOP 4096 -- max columns for SELECT (https://learn.microsoft.com/en-us/sql/sql-server/maximum-capacity-specifications-for-sql-server?view=sql-server-ver16)
    CAST(DATENAME(MONTH, DATEADD(MONTH, DATEPART(MONTH, event_date), -1)) as nvarchar) + ' ' + CAST(DATEPART(YEAR, event_date) as nvarchar) MonthYearCode
    ,CAST(DATEPART(YEAR, event_date) AS nvarchar) [Year]
    ,CAST(DATEPART(MONTH, event_date) AS nvarchar) [Month]
    ,DATEPART(YEAR, event_date) [YearNumber]
    ,DATEPART(MONTH, event_date) [MonthNumber]
    ,COUNT(DISTINCT CAST(DATEPART(YEAR, event_date) AS nvarchar) 
        + CAST(DATEPART(MONTH, event_date) AS nvarchar) 
        + CAST(DATEPART(DAY, event_date) AS nvarchar)) [Count]
INTO #tmp_data
FROM #tmp_events e
GROUP BY DATEPART(YEAR, event_date), DATEPART(MONTH, event_date)
ORDER BY CAST(DATEPART(YEAR, event_date) AS nvarchar), CAST(DATEPART(MONTH, event_date) AS nvarchar)

--Variables for builing the select and in columns
DECLARE @select_columns AS NVARCHAR(MAX);
DECLARE @in_columns AS NVARCHAR(MAX);

/*
You probably should leave the "pvt" alias off, so that
the alais can be changed without needing an update here. I
have left to be explicit for the example.

This assigns the known values to the desired naming and
builds both the select columns and the in columns.
For older versions of SQL Server you can use stuff()
Instead of STRING_AGG()
*/
SELECT 
    @select_columns = STRING_AGG('pvt.' + QUOTENAME([Year] + [Month]) + ' AS ' + QUOTENAME([MonthYearCode]), N',') WITHIN GROUP (ORDER BY [YearNumber], [MonthNumber])
    ,@in_columns = STRING_AGG(QUOTENAME([Year] + [Month]), N',') WITHIN GROUP (ORDER BY [YearNumber], [MonthNumber])
FROM #tmp_data

DECLARE @sql_string AS NVARCHAR(MAX);
/*
Simple plug and play query. Uses the tmp table created from you source 
datetimes. This pivots on the counts you already have and spreads them 
acrossed the created column header.

With both the in and select built above, you have reasonable flexability
with updating as needed. Also, this only displays data you have, so it
will match you final request.
*/
SET @sql_string = '
SELECT ' + @select_columns + '
FROM (
    SELECT DISTINCT
        CAST([Year] AS NVARCHAR) + CAST([Month] AS NVARCHAR) [YearMonth],  
        [COUNT]     
    from #tmp_data
) AS sub
PIVOT (
    SUM([COUNT]) for [YearMonth] in(' + @in_columns + ')
) pvt';

exec(@sql_string);

相关问题