SQL Server GROUP BY with multiple GROUPING SETS, CUBE, and ROLLUP clauses

eqfvzcg8  于 12个月前  发布在  其他
关注(0)|答案(1)|浏览(166)

Today I've read a tip about using multiple GROUPING SETS, CUBE, and ROLLUP clauses in the GROUP BY clause.
You can specify multiple GROUPING SETS, CUBE, and ROLLUP clauses in the GROUP BY clause separated by commas. By doing so, you achieve a multiplication effect. For example the clause CUBE(a, b, c) defines eight grouping sets and the clause ROLLUP(x, y, z) defines four grouping sets. By specifying a comma between the two, as in CUBE(a, b, c), ROLLUP(x, y, z), you multiply them and get 32 grouping sets.

Here is an example.

SELECT
    shipcountry, GROUPING(shipcountry) AS grpcountry,
    shipcity , GROUPING(shipcity) AS shipcity,
    GROUPING_ID( shipcountry, shipcity ) AS grp_id,
    COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY CUBE(shipcountry,shipcity), ROLLUP(shipcountry, shipcity)

Can't wrap my head arount it. The thing is, there are a lot of duplicate rows in the result set. If we add DISTINCT to the SELECT clause the result set will be the same even without ROLLUP(shipcountry, shipcity). What's the point?

noj0wjuj

noj0wjuj1#

I think the utility comes when you specify different arguments to the cube and rollup clauses. That's even evident in whatever source you're quoting. Each of cube and rollup is just a shortcut for longer list of grouping sets. In your example, the cube defines the following grouping sets

  1. shipcountry, shipcity
  2. shipcountry
  3. shipcity
  4. (null)

Whereas the rollup specifies these sets:

  1. shipcountry, shipcity
  2. shipcountry
  3. (null)

When you specify both in the same group by clause, you're getting each set from the first paired with each set from the second (which is what the multiplicative effect from your pull quote implies). So you get (using the nomenclature "(x) + (y)" to mean "item x from the first set and item y from the second):

  1. (1) + (1) → (shipcountry, shipcity) + (shipcountry, shipcity) → (shipcountry, shipcity)
  2. (1) + (2) → (shipcountry, shipcity) + (shipcountry) → (shipcountry, shipcity)
  3. (1) + (3) → (shipcountry, shipcity) + ((null)) → (shipcountry, shipcity)
  4. (2) + (1) → (shipcountry) + (shipcountry, shipcity) → (shipcountry, shipcity)
  5. (2) + (2) → (shipcountry) + (shipcountry) → (shipcountry)
  6. (2) + (3) → (shipcountry) + ((null)) → (shipcountry)
  7. (3) + (1) → (shipcity) + (shipcountry, shipcity) → (shipcountry, shipcity)
  8. (3) + (2) → (shipcity) + (shipcountry) → (shipcountry, shipcity)
  9. (3) + (3) → (shipcity) + ((null)) → (shipcity)
  10. (4) + (1) → ((null)) + (shipcountry, shipcity) → (shipcountry, shipcity)
  11. (4) + (2) → ((null)) + (shipcountry) → (shipcountry)
  12. (4) + (3) → ((null)) + ((null)) → ((null))

As you can see, there are a lot of duplicates. For example, (shipcountry, shipcity) shows up seven times in 1, 2, 3, 4, 7, 8, and 10.

If instead you'd specified different arguments to the rollup and cube, you'd get a wholly distinct set of grouping sets.

Lastly, remember what I said above: both rollup and cube are shortcuts for commonly used patterns of grouping sets. If you only want certain grouping sets, specify only those with a grouping sets clause!

相关问题