SQL Server How do I get the sum of a count and group by a column

rqcrx0a6  于 2023-10-15  发布在  其他
关注(0)|答案(2)|浏览(119)

Getting stuck with a query getting the sum of counts.

I have a situation where I have to get every department and sum up all employee tasks

Currently my output is as follows
| Department | Total Tasks | Num Completed | Num Overdue |
| ------------ | ------------ | ------------ | ------------ |
| Asset Management | 0 | 0 | 0 |
| Asset Management | 40 | 0 | 40 |
| Budget & Treasury | 1 | 0 | 1 |
| Budget & Teasury | 13 | 8 | 6 |

Need this below output

DepartmentTotal TasksNum CompletedNum Overdue
Asset Management40040
Budge6 & Teasury1487
DECLARE @DateFrom datetime = '2023-08-01'
DECLARE @DateTo datetime = '2023-09-27'

SELECT DISTINCT(u.Department),
    (SELECT COUNT(pt.to_resource) 
     FROM pcs_task pt WITH (nolock)  
     WHERE p.pcs_user_id = pt.to_resource 
       AND DATEADD(DAY, -30, @DateTo) > pt.date_closed 
       AND pt.due_date <= (@DateTo) 
       AND pt.date_alloc >= (@DateFrom) 
       AND date_closed <= @DateTo 
       AND pt.tsk_status = 'COMPLETE') +
    (SELECT COUNT(pt.to_resource) 
     FROM pcs_task pt WITH (nolock) 
     WHERE p.pcs_user_id = pt.to_resource 
       AND DATEADD(DAY, -60, @DateTo) >  pt.date_closed 
       AND pt.due_date <= (@DateTo) 
       AND pt.date_alloc >= (@DateFrom) 
       AND date_closed <= @DateTo 
       AND pt.tsk_status = 'COMPLETE') +
    (SELECT COUNT(pt.to_resource) 
     FROM pcs_task pt WITH (nolock) 
     WHERE p.pcs_user_id = pt.to_resource 
       AND DATEADD(DAY, 30, @DateTo) >= pt.date_alloc 
       AND pt.due_date <= (@DateTo) 
       AND pt.date_alloc >= (@DateFrom) 
       AND pt.tsk_status <> 'COMPLETE') + 
    (SELECT COUNT(pt.to_resource) 
     FROM pcs_task pt WITH (nolock) 
     WHERE p.pcs_user_id = pt.to_resource 
       AND DATEADD(DAY, 60, pt.due_date) >= pt.date_alloc 
       AND pt.due_date <= (@DateTo) 
       AND pt.date_alloc >= (@DateFrom) 
       AND pt.tsk_status <> 'COMPLETE') AS [Total Tasks],
    (SELECT COUNT(pt.to_resource) 
     FROM pcs_task pt WITH (nolock) 
     WHERE p.pcs_user_id = pt.to_resource 
       AND DATEADD(DAY, -30, @DateTo) > pt.date_closed 
       AND pt.due_date <= (@DateTo) 
       AND pt.date_alloc >= (@DateFrom) 
       AND date_closed <= @DateTo 
       AND pt.tsk_status = 'COMPLETE') +
    (SELECT COUNT(pt.to_resource) 
     FROM pcs_task pt WITH (nolock) 
     WHERE p.pcs_user_id = pt.to_resource 
       AND DATEADD(DAY, -60, @DateTo) > pt.date_closed 
       AND pt.due_date <= (@DateTo) 
       AND pt.date_alloc >= (@DateFrom) 
       AND date_closed <= @DateTo 
       AND pt.tsk_status = 'COMPLETE') AS [Num Completed],
    (SELECT COUNT(pt.to_resource) 
     FROM pcs_task pt WITH (nolock) 
     WHERE p.pcs_user_id = pt.to_resource 
       AND DATEADD(DAY, 30, @DateTo) >= pt.date_alloc 
       AND pt.due_date <= (@DateTo) 
       AND pt.date_alloc >= (@DateFrom) 
       AND pt.tsk_status <> 'COMPLETE') + 
    (SELECT COUNT(pt.to_resource) 
     FROM pcs_task pt WITH (nolock) 
     WHERE p.pcs_user_id = pt.to_resource 
       AND DATEADD(DAY, 60, pt.due_date) >= pt.date_alloc 
       AND pt.due_date <= (@DateTo) 
       AND pt.date_alloc >= (@DateFrom) 
       AND pt.tsk_status <> 'COMPLETE') AS [Num OVERDUE]
FROM         
    pcs_user p
INNER JOIN 
    Users u WITH (nolock) ON u.[Reference Number] = p.obj_id
WHERE     
    (p.user_status = 'Active') 
    AND p.pcs_user_id <> 'ADMIN'
    -- AND  (u.Department in (@Department))
ORDER BY 
    u.Department    --, p.pcs_user_id

I basically have to group all departments then get SUM of Total Tasks, Num COMPLETED and num OVERDUE for the users

WHEN I add SUM function on the count, I get this error:
Sum function - cannot perform an aggregate function on an expression containing an aggregate or a subquery

k5ifujac

k5ifujac1#

DISTINCT is not a function. It works on the whole resultset, so it's not grouping correctly.

The easiest way to group this kind of data is inside an APPLY . You can use conditional aggregation to avoid all those subqueries.

Also nolock is a bad idea, it has serious data integrity implications, and it's not a go-faster switch.

DECLARE @DateFrom datetime = '2023-08-01'
DECLARE @DateTo datetime = '2023-09-27'

SELECT
  u.Department,
  pt.*
FROM         
    pcs_user p
INNER JOIN 
    Users u ON u.[Reference Number] = p.obj_id
OUTER APPLY (
    SELECT
      COUNT(pt.to_resource) AS [Total Tasks],
      COUNT(CASE WHEN pt.tsk_status =  'COMPLETE' THEN 1 END) AS [Num Completed],
      COUNT(CASE WHEN pt.tsk_status <> 'COMPLETE' THEN 1 END) AS [Num OVERDUE]
    FROM pcs_task pt
    WHERE p.pcs_user_id = pt.to_resource 
      AND DATEADD(DAY, 30, @DateTo) >= pt.date_alloc 
      AND pt.due_date <= @DateTo
      AND pt.date_alloc >= @DateFrom
) pt
WHERE p.user_status = 'Active'
  AND p.pcs_user_id <> 'ADMIN'
-- AND u.Department in (@Department)  -- why use IN??
ORDER BY 
  u.Department    --, p.pcs_user_id
;

Alternatively just use a GROUP BY on the outside

DECLARE @DateFrom datetime = '2023-08-01'
DECLARE @DateTo datetime = '2023-09-27'

SELECT
  u.Department,
  COUNT(pt.to_resource) AS [Total Tasks],
  COUNT(CASE WHEN pt.tsk_status =  'COMPLETE' THEN 1 END) AS [Num Completed],
  COUNT(CASE WHEN pt.tsk_status <> 'COMPLETE' THEN 1 END) AS [Num OVERDUE]
FROM         
    pcs_user p
INNER JOIN 
    Users u ON u.[Reference Number] = p.obj_id
LEFT JOIN pcs_task pt
  ON p.pcs_user_id = pt.to_resource 
 AND DATEADD(DAY, 30, @DateTo) >= pt.date_alloc 
 AND pt.due_date <= @DateTo
 AND pt.date_alloc >= @DateFrom
WHERE p.user_status = 'Active'
  AND p.pcs_user_id <> 'ADMIN'
-- AND u.Department in (@Department)  -- why use IN??
GROUP BY
  u.Department
ORDER BY 
  u.Department    --, p.pcs_user_id
;
weylhg0b

weylhg0b2#

Hard to say without example data, but the naive approach would be the following

WITH intermediate(department, total, completed, overdue) 
AS (SELECT ... )  --your current query as query for the CTE
SELECT department, SUM(total), SUM(completed), SUM(overdue)
FROM intermediate
GROUP BY department

Furthermore, the use of DISTINCTmay indicate some problems with your data model or your query which leads to duplicate results. But again hard to say without seeing your model or example data ...

相关问题