Closed. This question needs to be more focused. It is not currently accepting answers. Closed 3 days ago.
We have a table with student information. We want to obtain the total number of students as well as the total number of active students. We have several ways of calculating this, but we want the fastest method in terms of server cost and execution plan.
Here is some information for your consideration.
```sql
DECLARE @dataTable AS TABLE
(
    ColID INT,
    ColName VARCHAR(200),
    Active INT
);

INSERT INTO @dataTable (ColID, ColName, Active)
VALUES (1, 'Student ID', 0), (28, 'Student Name', 1),
       (11, 'Student RollNo', 0), (31, 'Student City', 0),
       (24, 'Student State', 1), (41, 'Student PhoneNo', 0),
       (37, 'Student Surname', 0), (45, 'Student Division', 1);

SELECT *
FROM @dataTable;

WITH StudentCountCalculation AS
(
    SELECT
        ColID, ColName, Active, StudCount.TotalStudCount
    FROM
        @dataTable
    OUTER APPLY
        (SELECT COUNT(ColID) AS TotalStudCount
         FROM @dataTable) StudCount
)
SELECT
    ColID, ColName, Active, TotalStudCount,
    (SELECT COUNT(ColID)
     FROM StudentCountCalculation
     WHERE Active = 1) AS TotalActiveStudents
FROM
    StudentCountCalculation;
```
Expected result is shown here:
| ColID | ColName | Active | TotalStudCount | TotalActiveStudents |
| --- | --- | --- | --- | --- |
| 1 | Student ID | 0 | 8 | 3 |
| 28 | Student Name | 1 | 8 | 3 |
| 11 | Student RollNo | 0 | 8 | 3 |
| 31 | Student City | 0 | 8 | 3 |
| 24 | Student State | 1 | 8 | 3 |
| 41 | Student PhoneNo | 0 | 8 | 3 |
| 37 | Student Surname | 0 | 8 | 3 |
| 45 | Student Division | 1 | 8 | 3 |
1 Answer
This is something that you will need to answer yourself. Your version, settings, data, schema, hardware, etc. will affect the results. Also, as the left two columns look like joined schema data, I am unsure of your intent.
Disclaimer: This answer assumes you are trying to count student rows while returning data. The information is still usable for general tuning.
Useful Tools: Statistics (time, io, profile), Client statistics, Execution plan
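For reference, the measurement switches mentioned above can be enabled per session with standard SQL Server commands (the execution plan is toggled separately in SSMS via "Include Actual Execution Plan"):

```sql
-- Enable per-statement measurements for the current session;
-- output appears on the SSMS "Messages" tab.
SET STATISTICS TIME ON;  -- CPU time and elapsed time per statement
SET STATISTICS IO ON;    -- scan counts and logical/physical reads per table

-- ... run the queries being compared here ...

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```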
Results (on my machine, with my fake data): the order of performance was 2, 3, 1.
Generally, I found 2 and 3 very close, with 2 using less CPU time (STATISTICS TIME, on the Messages tab). Interestingly, 2 and 3 had the same query plan at 2,000,000 rows, but the Gather Streams operator (visible in the execution plan) was much cheaper in query 2.
The results bounced around from run to run, and the order was not always the same. Query 3 at 100,000 rows was using a poor query plan.
When asking for only a few thousand rows or fewer, there was no significant difference for me to choose one over the other.
Time/IO: Here you can see why query 1 is generally slower: many more scans and reads. Its query plans were also more complex.
| 2,000,000 Rows | Time | IO |
| ------------ | ------------ | ------------ |
| Query 1 | CPU time = 2906 ms, elapsed time = 7612 ms. | Scan count 26, logical reads 71430 |
| Query 2 | CPU time = 265 ms, elapsed time = 71 ms. | Scan count 9, logical reads 7143 |
| Query 2 | CPU time = 31 ms, elapsed time = 6515 ms. | Scan count 1, logical reads 7143 |
| Query 3 | CPU time = 627 ms, elapsed time = 6795 ms. | Scan count 9, logical reads 7143 |
SQL Queries: (using statistics time, io)
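The answer's original query listings did not survive extraction. As a sketch only -- these are my guesses at three common ways to return both counts alongside every row, not necessarily the author's exact queries -- the approaches being compared likely resembled:

```sql
-- Query 1 (sketch): correlated scalar subqueries -- re-scans the table per count.
SELECT d.ColID, d.ColName, d.Active,
       (SELECT COUNT(*) FROM @dataTable) AS TotalStudCount,
       (SELECT COUNT(*) FROM @dataTable WHERE Active = 1) AS TotalActiveStudents
FROM @dataTable AS d;

-- Query 2 (sketch): window aggregates -- a single scan can serve both counts.
SELECT d.ColID, d.ColName, d.Active,
       COUNT(*) OVER () AS TotalStudCount,
       SUM(CASE WHEN d.Active = 1 THEN 1 ELSE 0 END) OVER () AS TotalActiveStudents
FROM @dataTable AS d;

-- Query 3 (sketch): one aggregating pass joined back onto the detail rows.
SELECT d.ColID, d.ColName, d.Active,
       c.TotalStudCount, c.TotalActiveStudents
FROM @dataTable AS d
CROSS JOIN (SELECT COUNT(*) AS TotalStudCount,
                   SUM(CASE WHEN Active = 1 THEN 1 ELSE 0 END) AS TotalActiveStudents
            FROM @dataTable) AS c;
```

The window-aggregate form (query 2 here) matches the "Scan count 1" pattern in the table above, which is consistent with a single pass over the data.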
Fake Data: (Based on student rows; the active is random)
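The fake-data script is likewise missing. A minimal sketch that builds 2,000,000 student-like rows with a random Active flag (the table and column names here are assumptions, chosen to mirror the question's schema) might look like:

```sql
-- Sketch only: a permanent test table with 2,000,000 rows and a random 0/1 Active flag.
CREATE TABLE dbo.StudentTest
(
    ColID   INT IDENTITY(1, 1) PRIMARY KEY,
    ColName VARCHAR(200) NOT NULL,
    Active  INT NOT NULL
);

INSERT INTO dbo.StudentTest (ColName, Active)
SELECT TOP (2000000)
       'Student ' + CONVERT(VARCHAR(20), ROW_NUMBER() OVER (ORDER BY (SELECT NULL))),
       ABS(CHECKSUM(NEWID())) % 2        -- roughly 50/50 random 0/1
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;         -- cheap row generator via a cross join
```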
Closing Thoughts: Test your queries and try different combinations. If you are comfortable with indexes, adjust the current ones or add new ones. If you only need a small number of rows or count results, pick a query for its clarity and ease of maintenance.
Lastly, if you have a totally different intent, please edit your question. Regardless, the above -- along with the comments that you have already received -- should offer insight.