(Print this page)

Microsoft SQL Server Grouping sets
Published date: Thursday, December 29, 2011
On: Moer and Éric Moreau's web site

There won’t be any Visual Studio programming this month. Instead, I will show you a neat Microsoft SQL Server 2008 feature that many developers (and even some DBAs) are not aware of and which come in handy very often.

Every now and then when working with SQL queries, you need to perform some kind of aggregation like sub-totals and totals at the same time you need the data. Very often I have seen (and did myself) UNION queries to achieve what I will show you here. Not only this method is not optimal, it is also more difficult to read and debug.

Over the years, many techniques have been documented to satisfy the requirements.

Requirements

The only true requirements this month is an access to a Microsoft SQL Server 2008 instance. You don’t even need a database as my script creates a test table.

Creating test data

To simplify this demonstration, I am creating a memory table containing some random test data.

Here is the script to generate data for 10 employees (numbered from 1 to 10), each selling 5 products (numbered 1 to 5), for each month of 2011. Finally, the amount field is field with a random value.

/* Create a table to contain test data */
DECLARE @TestData TABLE (
	EmployeeID	INT, 
	ProductID	INT, 
	[Date]		Date, 
	Amount		MONEY)

/* some variables needed in the loop */
DECLARE @EmployeeID INT = 1
DECLARE @ProductID INT = 1
DECLARE @month INT 
DECLARE @date DATETIME2 = GETDATE()

/* loop 10 times to create test employees */
WHILE @EmployeeID <= 10
BEGIN
	/* loop 5 times to create products */
	SET @ProductID = 1
	WHILE @ProductID <= 5
	BEGIN
		SET @month = 1
		/* loop 12 time to simulate the monthly sales by products */
		WHILE @month <= 12
		BEGIN 
			/* insert test data*/
			INSERT INTO @TestData 
			VALUES (
				@EmployeeID, 
				@ProductID,
				CAST('2011-' + CAST(@month AS VARCHAR) + '-01' AS date), 
				RAND(DATEPART(ns,@date)) * 100000
			)
			/* because @date is used in the RAND function, we need to increment it a bit */
			SET @date = DATEADD(ns, 1000, @date)
			SET @month = @month + 1
		END 
		SET @ProductID = @ProductID + 1
	END 
	SET @EmployeeID = @EmployeeID + 1
END

Testing the test table

The first query does nothing special. It simply lists the data:

/* Simple select to see what we have (remember that the Amount column is a RANDOM value) */
SELECT  EmployeeID, ProductID, Date, Amount, 'Simple Select' AS Comments 
FROM @TestData

This query should return 600 rows (10 employees * 5 products * 12 months).

Remember, because I use a memory table, the full script (including table declaration and fill) needs to run every time you want to query the table.

The old UNION method

Say for example that we want to list all the 600 detail rows and add to that some grouping totals:

  • A sub-total by Employee + Product
  • A total by Employee
  • A grand total

I am pretty sure that most of you thinking of solving this problem using 4 different queries that you UNION together in order to have all the grouping totals listed above. The query would like this:

SELECT *, 'Old-fashion query' AS Comments
FROM (
	SELECT '1 - Detail' AS RowKind, * 
	FROM @TestData
	
	UNION ALL 
	
	SELECT '2 - Sub-total by Employee + Product', EmployeeID, ProductID, NULL, SUM(Amount) 
	FROM @TestData 
	GROUP BY EmployeeID, ProductID
	
	UNION ALL 
	
	SELECT '3 - Total by Employee', EmployeeID, NULL, NULL, SUM(Amount) 
	FROM @TestData 
	GROUP BY EmployeeID
	
	UNION ALL 
	
	SELECT '4 - Grand Total', NULL, NULL, NULL, SUM(Amount) 
	FROM @TestData 
) AS A
ORDER BY ISNULL(EmployeeID, 2147483647), ISNULL(ProductID, 2147483647), ISNULL(Date, '9999-12-31')

I have added some fields (RowKind and Comments) to clearly see what we are looking at.

This query now returns 661 rows (600 details + 50 Employee-Product sub-totals + 10 Employee total + 1 grand total).

The Order By clause might seem a bit complex but it is only formatted this way to have the rows displayed in a logical order.

This query is far from optimal because the @TestData table is scanned 4 times.

The RollUp method

But there is a better/shorter syntax that was around for many years now. This other syntax requires the ROLLUP statement.

The newer query has now been shortened to this:

SELECT  EmployeeID, ProductID, Date, SUM(Amount) AS Amount
, CASE 
	WHEN Date IS NULL AND ProductID IS NULL AND EmployeeID IS NULL THEN '4 - Grand Total'
	WHEN Date IS NULL AND ProductID IS NULL THEN '3 - Total by Employee'
	WHEN Date IS NULL THEN '2 - Sub-total by Employee + Product'
	ELSE '1 - Detail'
END AS RowKind
, 'With RollUp (old syntax)' AS Comments 
FROM @TestData
GROUP BY EmployeeID, ProductID, Date WITH ROLLUP

Once again, you will get 661 rows in your result and notice that the rows are automatically sorted in a logical order. All this was done simply by adding the SUM function to the SELECT clause and by adding the GROUP BY statement (listing the 3 detail fields) and, more importantly, the WITH ROLLUP clause which will automatically create all the levels of aggregation using the fields listed in the GROUP BY clause.

If your database is set a compatibility level of 100 (or more), you could replace the last row of the previous query with this newer syntax giving you exactly the same results:

GROUP BY ROLLUP(EmployeeID, ProductID, Date)

The new Grouping Sets syntax

You can obtain exactly the same results as above using the newer Grouping Sets syntax with this query:

SELECT  EmployeeID, ProductID, Date, SUM(Amount) AS SumAmount
, CASE 
	WHEN Date IS NULL AND ProductID IS NULL AND EmployeeID IS NULL THEN '4 - Grand Total'
	WHEN Date IS NULL AND ProductID IS NULL THEN '3 - Total by Employee'
	WHEN Date IS NULL THEN '2 - Sub-total by Employee + Product'
	ELSE '1 - Detail'
END AS RowKind
, 'The new Grouping Sets syntax' AS Comments
FROM @TestData
GROUP BY 
	GROUPING SETS (
		(EmployeeID, ProductID, Date),
		(EmployeeID, ProductID),
		(EmployeeID),
		()
	)

This query will again return 661 rows exactly as the previous ones. There is no magic here. The only thing that changed is that the GROUP BY statement is now more explicit on what we want in the results.

The real power of specifying the sets is that you can have more than just the logical one you saw before. Consider this query:

SELECT  EmployeeID, ProductID, Date, SUM(Amount) AS SumAmount
, CASE 
	WHEN Date IS NULL AND ProductID IS NULL AND EmployeeID IS NULL THEN '4 - Grand Total => Grouping Sets ()'
	WHEN Date IS NULL AND ProductID IS NULL THEN '3 - Total by Employee => Grouping Sets (EmployeeID)'
	WHEN Date IS NULL AND EmployeeID IS NULL THEN '3 - Sub-total by Product  => Grouping Sets (ProductID)'
	WHEN Date IS NULL THEN '2 - Sub-total by Employee + Product  => Grouping Sets (EmployeeID, ProductID)'
	WHEN EmployeeID IS NULL THEN '2 - Sub-total by Product + Date => Grouping Sets (ProductID, Date)'
	ELSE '1 - Detail  => Grouping Sets (EmployeeID, ProductID, Date)'
END AS RowKind
, 'The new Grouping Sets syntax' AS Comments
FROM @TestData
GROUP BY 
	GROUPING SETS (
		(ProductID)
		, (ProductID, Date)
		, (EmployeeID, ProductID, Date)
		, (EmployeeID, ProductID)
		, (EmployeeID)
		, ()
	)	

This latest one now returns 726 rows:

  • 600 details
  • 50 Employee-Product sub-totals
  • 60 Product – Date sub-totals
  • 10 Employee total
  • 5 Product total
  • 1 grand total

This means that this single query returned almost all the grouping we could never imagine.

The Grouping Sets can also be used to limit the number of levels returned. Look at this query:

SELECT  ProductID, SUM(Amount) AS SumAmount
, CASE 
	WHEN ProductID IS NULL THEN '2 - Grand Total'
	ELSE '1 - Sub-Total by Product'
END AS RowKind
, 'The new Grouping Sets syntax' AS Comments
FROM @TestData
GROUP BY 
	GROUPING SETS (
		(ProductID)
		, ()
	)

This query now returns only 6 rows (5 product sub-totals + 1 grand total).

By mixing and matching the sets, chances are that you can obtain exactly the result you want and all this into a single query.

Other readings

You might want to have a look at GROUPING SETS Equivalents which list the equivalence between some Group By, Roll Up, Cube, and Grouping By syntax.

Conclusion

I agree that using the newer Grouping Sets syntax requires a few more keystrokes (compared to the RollUp syntax) but at least you can get exactly what you are looking for.

Remember that feature the next time you need to include sub-totals and totals into your queries.


(Print this page)