There won’t be any Visual Studio programming this month. Instead, I will show you a neat Microsoft SQL Server 2008 feature that many developers (and even some DBAs) are not aware of and which come in handy very often.
Every now and then when working with SQL queries, you need to perform some kind of aggregation like sub-totals and totals at the same time you need the data. Very often I have seen (and did myself) UNION queries to achieve what I will show you here. Not only this method is not optimal, it is also more difficult to read and debug.
Over the years, many techniques have been documented to satisfy the requirements.
Requirements
The only true requirements this month is an access to a Microsoft SQL Server 2008 instance. You don’t even need a database as my script creates a test table.
Creating test data
To simplify this demonstration, I am creating a memory table containing some random test data.
Here is the script to generate data for 10 employees (numbered from 1 to 10), each selling 5 products (numbered 1 to 5), for each month of 2011. Finally, the amount field is field with a random value.
/* Create a table to contain test data */ DECLARE @TestData TABLE ( EmployeeID INT, ProductID INT, [Date] Date, Amount MONEY) /* some variables needed in the loop */ DECLARE @EmployeeID INT = 1 DECLARE @ProductID INT = 1 DECLARE @month INT DECLARE @date DATETIME2 = GETDATE() /* loop 10 times to create test employees */ WHILE @EmployeeID <= 10 BEGIN /* loop 5 times to create products */ SET @ProductID = 1 WHILE @ProductID <= 5 BEGIN SET @month = 1 /* loop 12 time to simulate the monthly sales by products */ WHILE @month <= 12 BEGIN /* insert test data*/ INSERT INTO @TestData VALUES ( @EmployeeID, @ProductID, CAST('2011-' + CAST(@month AS VARCHAR) + '-01' AS date), RAND(DATEPART(ns,@date)) * 100000 ) /* because @date is used in the RAND function, we need to increment it a bit */ SET @date = DATEADD(ns, 1000, @date) SET @month = @month + 1 END SET @ProductID = @ProductID + 1 END SET @EmployeeID = @EmployeeID + 1 END
Testing the test table
The first query does nothing special. It simply lists the data:
/* Simple select to see what we have (remember that the Amount column is a RANDOM value) */ SELECT EmployeeID, ProductID, Date, Amount, 'Simple Select' AS Comments FROM @TestData
This query should return 600 rows (10 employees * 5 products * 12 months).
Remember, because I use a memory table, the full script (including table declaration and fill) needs to run every time you want to query the table.
The old UNION method
Say for example that we want to list all the 600 detail rows and add to that some grouping totals:
I am pretty sure that most of you thinking of solving this problem using 4 different queries that you UNION together in order to have all the grouping totals listed above. The query would like this:
SELECT *, 'Old-fashion query' AS Comments FROM ( SELECT '1 - Detail' AS RowKind, * FROM @TestData UNION ALL SELECT '2 - Sub-total by Employee + Product', EmployeeID, ProductID, NULL, SUM(Amount) FROM @TestData GROUP BY EmployeeID, ProductID UNION ALL SELECT '3 - Total by Employee', EmployeeID, NULL, NULL, SUM(Amount) FROM @TestData GROUP BY EmployeeID UNION ALL SELECT '4 - Grand Total', NULL, NULL, NULL, SUM(Amount) FROM @TestData ) AS A ORDER BY ISNULL(EmployeeID, 2147483647), ISNULL(ProductID, 2147483647), ISNULL(Date, '9999-12-31')
I have added some fields (RowKind and Comments) to clearly see what we are looking at.
This query now returns 661 rows (600 details + 50 Employee-Product sub-totals + 10 Employee total + 1 grand total).
The Order By clause might seem a bit complex but it is only formatted this way to have the rows displayed in a logical order.
This query is far from optimal because the @TestData table is scanned 4 times.
The RollUp method
But there is a better/shorter syntax that was around for many years now. This other syntax requires the ROLLUP statement.
The newer query has now been shortened to this:
SELECT EmployeeID, ProductID, Date, SUM(Amount) AS Amount , CASE WHEN Date IS NULL AND ProductID IS NULL AND EmployeeID IS NULL THEN '4 - Grand Total' WHEN Date IS NULL AND ProductID IS NULL THEN '3 - Total by Employee' WHEN Date IS NULL THEN '2 - Sub-total by Employee + Product' ELSE '1 - Detail' END AS RowKind , 'With RollUp (old syntax)' AS Comments FROM @TestData GROUP BY EmployeeID, ProductID, Date WITH ROLLUP
Once again, you will get 661 rows in your result and notice that the rows are automatically sorted in a logical order. All this was done simply by adding the SUM function to the SELECT clause and by adding the GROUP BY statement (listing the 3 detail fields) and, more importantly, the WITH ROLLUP clause which will automatically create all the levels of aggregation using the fields listed in the GROUP BY clause.
If your database is set a compatibility level of 100 (or more), you could replace the last row of the previous query with this newer syntax giving you exactly the same results:
GROUP BY ROLLUP(EmployeeID, ProductID, Date)
The new Grouping Sets syntax
You can obtain exactly the same results as above using the newer Grouping Sets syntax with this query:
SELECT EmployeeID, ProductID, Date, SUM(Amount) AS SumAmount , CASE WHEN Date IS NULL AND ProductID IS NULL AND EmployeeID IS NULL THEN '4 - Grand Total' WHEN Date IS NULL AND ProductID IS NULL THEN '3 - Total by Employee' WHEN Date IS NULL THEN '2 - Sub-total by Employee + Product' ELSE '1 - Detail' END AS RowKind , 'The new Grouping Sets syntax' AS Comments FROM @TestData GROUP BY GROUPING SETS ( (EmployeeID, ProductID, Date), (EmployeeID, ProductID), (EmployeeID), () )
This query will again return 661 rows exactly as the previous ones. There is no magic here. The only thing that changed is that the GROUP BY statement is now more explicit on what we want in the results.
The real power of specifying the sets is that you can have more than just the logical one you saw before. Consider this query:
SELECT EmployeeID, ProductID, Date, SUM(Amount) AS SumAmount , CASE WHEN Date IS NULL AND ProductID IS NULL AND EmployeeID IS NULL THEN '4 - Grand Total => Grouping Sets ()' WHEN Date IS NULL AND ProductID IS NULL THEN '3 - Total by Employee => Grouping Sets (EmployeeID)' WHEN Date IS NULL AND EmployeeID IS NULL THEN '3 - Sub-total by Product => Grouping Sets (ProductID)' WHEN Date IS NULL THEN '2 - Sub-total by Employee + Product => Grouping Sets (EmployeeID, ProductID)' WHEN EmployeeID IS NULL THEN '2 - Sub-total by Product + Date => Grouping Sets (ProductID, Date)' ELSE '1 - Detail => Grouping Sets (EmployeeID, ProductID, Date)' END AS RowKind , 'The new Grouping Sets syntax' AS Comments FROM @TestData GROUP BY GROUPING SETS ( (ProductID) , (ProductID, Date) , (EmployeeID, ProductID, Date) , (EmployeeID, ProductID) , (EmployeeID) , () )
This latest one now returns 726 rows:
This means that this single query returned almost all the grouping we could never imagine.
The Grouping Sets can also be used to limit the number of levels returned. Look at this query:
SELECT ProductID, SUM(Amount) AS SumAmount , CASE WHEN ProductID IS NULL THEN '2 - Grand Total' ELSE '1 - Sub-Total by Product' END AS RowKind , 'The new Grouping Sets syntax' AS Comments FROM @TestData GROUP BY GROUPING SETS ( (ProductID) , () )
This query now returns only 6 rows (5 product sub-totals + 1 grand total).
By mixing and matching the sets, chances are that you can obtain exactly the result you want and all this into a single query.
Other readings
You might want to have a look at GROUPING SETS Equivalents which list the equivalence between some Group By, Roll Up, Cube, and Grouping By syntax.
Conclusion
I agree that using the newer Grouping Sets syntax requires a few more keystrokes (compared to the RollUp syntax) but at least you can get exactly what you are looking for.
Remember that feature the next time you need to include sub-totals and totals into your queries.